---
license: apache-2.0
library_name: tfhub
tags:
- vision
- image-classification
- mobilenet
- tensorflow
datasets:
- imagenet-1k
metrics:
- accuracy
---

## Model name: mobilenet_v3_small_100_224
## Description adapted from [TFHub](https://tfhub.dev/google/imagenet/mobilenet_v3_small_100_224/classification/5)

# Overview

MobileNet V3 is a family of neural network architectures for efficient on-device image classification and related tasks, originally published by

- Andrew Howard, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang, Yukun Zhu, Ruoming Pang, Vijay Vasudevan, Quoc V. Le, Hartwig Adam: ["Searching for MobileNetV3"](https://arxiv.org/abs/1905.02244), 2019.

Like other MobileNets, MobileNet V3 uses a multiplier for the depth (number of features) in the convolutional layers to tune the accuracy vs. latency tradeoff. In addition, MobileNet V3 comes in two sizes, small and large, to adapt the network to low- or high-resource use cases. Although V3 networks can be built with custom input resolutions, just like other MobileNets, all pre-trained checkpoints were published with the same 224x224 input resolution.

For a quick comparison between these variants, please refer to the following table:

|Size|Depth multiplier|Top-1 accuracy (%)|Pixel 1 latency (ms)|Pixel 2 latency (ms)|Pixel 3 latency (ms)|
|----|----------------|------------------|--------------------|--------------------|--------------------|
|Large|1.0|75.2|51.2|61|44|
|Large|0.75|73.3|39.8|48|32|
|Small|1.0|67.5|15.8|19.4|14.4|
|Small|0.75|65.4|12.8|15.9|11.6|
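
To make the depth multiplier in the table concrete, here is a minimal sketch of how a multiplier can be applied to a layer's channel count. It assumes the rounding convention commonly used in MobileNet implementations (round to a multiple of 8, never dropping more than about 10% below the scaled value). The function names and the example channel counts are illustrative and are not taken from this model card.

```python
def make_divisible(value, divisor=8, min_value=None):
    """Round `value` to a nearby multiple of `divisor` (assumed convention)."""
    if min_value is None:
        min_value = divisor
    rounded = max(min_value, int(value + divisor / 2) // divisor * divisor)
    # Avoid shrinking the layer by more than ~10% through rounding down.
    if rounded < 0.9 * value:
        rounded += divisor
    return rounded


def scaled_channels(base_channels, depth_multiplier):
    """Channel count of a layer after applying the depth multiplier."""
    return make_divisible(base_channels * depth_multiplier)


# Hypothetical base channel counts, scaled by the two multipliers in the table.
for base in (16, 24, 96):
    print(base, scaled_channels(base, 1.0), scaled_channels(base, 0.75))
```

A smaller multiplier shrinks most layers, which is what trades accuracy for lower latency in the rows above.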

This model uses the TF-Slim implementation of [`mobilenet_v3`](https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet_v3.py) as a small network with a depth multiplier of 1.0.

The model contains a trained instance of the network, packaged to do the [image classification](https://www.tensorflow.org/hub/common_signatures/images#classification) that the network was trained on. If you merely want to transform images into feature vectors, use [`google/imagenet/mobilenet_v3_small_100_224/feature_vector/5`](https://tfhub.dev/google/imagenet/mobilenet_v3_small_100_224/feature_vector/5) instead, and save the space occupied by the classification layer.


# Training

The checkpoint exported into this model was `v3-small_224_1.0_float/ema/model-388500`, downloaded from the [MobileNet V3 pre-trained](https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/README.md) models. Its weights were originally obtained by training on the ILSVRC-2012-CLS dataset for image classification ("ImageNet").

# Usage

This model can be used with `hub.KerasLayer` as follows. It cannot be used with the `hub.Module` API for TensorFlow 1.

### Using TF Hub and HF Hub
```
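import numpy as np
from huggingface_hub import snapshot_download
from tensorflow_hub import KerasLayer

# Download the model files from the Hugging Face Hub and wrap them as a Keras layer.
model_path = snapshot_download(repo_id="Dimitre/mobilenet_v3_small")
model = KerasLayer(handle=model_path)

# Random float32 input in [0, 1]: (batch_size, height, width, num_channels)
img = np.random.rand(1, 224, 224, 3).astype(np.float32)
model(img) # output shape (1, 1001)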
```

### Using [TF Hub fork](https://github.com/dimitreOliveira/hub)
```
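import numpy as np
# Assumption: the fork linked above exposes `pull_from_hub` at the package top level.
from tensorflow_hub import pull_from_hub

model = pull_from_hub(repo_id="Dimitre/mobilenet_v3_small")

# Random float32 input in [0, 1]: (batch_size, height, width, num_channels)
img = np.random.rand(1, 224, 224, 3).astype(np.float32)
model(img) # output shape (1, 1001)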
```

The output is a batch of logits vectors. Each index into a logits vector corresponds to one of the `num_classes` = 1001 classes from the original training (see above). The mapping from indices to class labels can be found in the file at [download.tensorflow.org/data/ImageNetLabels.txt](https://storage.googleapis.com/download.tensorflow.org/data/ImageNetLabels.txt) (with class 0 for "background", followed by the 1000 actual ImageNet classes).
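
As an illustration of reading these logits, the sketch below converts one logits vector into probabilities with a softmax and returns the top-k labels. It uses a four-entry toy label list and made-up logits so that it runs without downloading anything; in practice `labels` would be the 1001 lines of the labels file and `logits` one row of the model output. The helper name `top_k_predictions` is hypothetical.

```python
import numpy as np


def top_k_predictions(logits, labels, k=5):
    """Return the k most likely (label, probability) pairs for one logits vector."""
    logits = np.asarray(logits, dtype=np.float64)
    # Numerically stable softmax: subtract the maximum before exponentiating.
    exp = np.exp(logits - logits.max())
    probs = exp / exp.sum()
    # Indices sorted by descending probability, truncated to k entries.
    top = np.argsort(probs)[::-1][:k]
    return [(labels[i], float(probs[i])) for i in top]


# Toy stand-ins (not real model output): index 0 plays the "background" class.
toy_labels = ["background", "tench", "goldfish", "great white shark"]
toy_logits = [0.1, 2.0, 4.0, 1.0]
print(top_k_predictions(toy_logits, toy_labels, k=2))
```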

The input images are expected to have color values in the range [0,1], following the [common image input](https://www.tensorflow.org/hub/common_signatures/images#input) conventions. For this model, the size of the input images is fixed to `height` x `width` = 224 x 224 pixels.
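
The input convention can be sketched as a small preprocessing helper. This is a minimal example that assumes the image has already been decoded and resized to 224 x 224 as an 8-bit RGB array; it only rescales the values to [0,1] and adds the batch dimension. The helper name `to_model_input` is hypothetical.

```python
import numpy as np


def to_model_input(image_uint8):
    """Convert one 224x224 RGB uint8 image into a float32 batch in [0, 1]."""
    image = np.asarray(image_uint8)
    if image.shape != (224, 224, 3):
        raise ValueError(f"expected shape (224, 224, 3), got {image.shape}")
    # Rescale 0..255 integer values to floats in [0, 1].
    scaled = image.astype(np.float32) / 255.0
    # Add the leading batch dimension: (1, 224, 224, 3).
    return scaled[np.newaxis, ...]


# Random pixels standing in for a decoded, already-resized image.
dummy = np.random.randint(0, 256, size=(224, 224, 3), dtype=np.uint8)
batch = to_model_input(dummy)
print(batch.shape, batch.dtype)
```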

# Fine-tuning

In principle, consumers of this model can [fine-tune](https://www.tensorflow.org/hub/tf2_saved_model#fine-tuning) it by passing `trainable=True` to `hub.KerasLayer`.

However, fine-tuning through a large classification layer may be prone to overfitting.

The momentum (a.k.a. decay coefficient) of batch norm's exponential moving averages defaults to 0.99 for this model, in order to accelerate training on small datasets (or with huge batch sizes).
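
To see why a lower momentum speeds up adaptation, the sketch below simulates the standard exponential-moving-average update used for batch norm statistics, `moving = momentum * moving + (1 - momentum) * batch_value`, and counts how many updates it takes to get within 5% of a constant batch statistic. The comparison value 0.999 and the 5% tolerance are arbitrary choices for illustration and are not taken from this model.

```python
def update_moving_average(moving, batch_value, momentum=0.99):
    """One exponential-moving-average update of a batch norm statistic."""
    return momentum * moving + (1.0 - momentum) * batch_value


def steps_to_converge(momentum, tolerance=0.05):
    """Updates needed for an average starting at 0 to come within
    `tolerance` of a constant batch statistic of 1."""
    moving, steps = 0.0, 0
    while abs(1.0 - moving) > tolerance:
        moving = update_moving_average(moving, 1.0, momentum)
        steps += 1
    return steps


# A lower momentum tracks new data in far fewer updates.
print(steps_to_converge(0.99), steps_to_converge(0.999))
```

With few training steps, as on a small dataset, a slowly moving average may not reach the statistics of the new data before training ends, which is the situation the 0.99 default is meant to help with.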

### Using TF Hub and HF Hub
```
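import numpy as np
from huggingface_hub import snapshot_download
from tensorflow_hub import KerasLayer

# Download the model files and wrap them as a trainable Keras layer.
model_path = snapshot_download(repo_id="Dimitre/mobilenet_v3_small")
model = KerasLayer(handle=model_path, trainable=True)

# Random float32 input in [0, 1]: (batch_size, height, width, num_channels)
img = np.random.rand(1, 224, 224, 3).astype(np.float32)
model(img) # output shape (1, 1001)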
```

### Using [TF Hub fork](https://github.com/dimitreOliveira/hub)
```
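import numpy as np
# Assumption: the fork linked above exposes `pull_from_hub` at the package top level.
from tensorflow_hub import pull_from_hub

model = pull_from_hub(repo_id="Dimitre/mobilenet_v3_small", trainable=True)

# Random float32 input in [0, 1]: (batch_size, height, width, num_channels)
img = np.random.rand(1, 224, 224, 3).astype(np.float32)
model(img) # output shape (1, 1001)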
```