	---
	license: apache-2.0
	pipeline_tag: mask-generation
	---

	# NanoSAM: Accelerated Segment Anything Model for Edge deployment

	- [GitHub](https://github.com/binh234/nanosam)
	- [Demo](https://huggingface.co/spaces/dragonSwing/nanosam)

	## Pretrained Models

Latency of NanoSAM variants on CPU, NVIDIA Jetson Xavier NX, and an NVIDIA T4 GPU. The Jetson Xavier NX and T4 measurements use TensorRT in fp16. Data transfer time is included.

	<table style="border-top: solid 1px; border-left: solid 1px; border-right: solid 1px; border-bottom: solid 1px">
	<thead>
	<tr>
	<th rowspan=2 style="text-align: center; border-right: solid 1px">Model †</th>
	<th colspan=2 style="text-align: center; border-right: solid 1px">:stopwatch: CPU (ms)</th>
	<th colspan=2 style="text-align: center; border-right: solid 1px">:stopwatch: Jetson Xavier NX (ms)</th>
	<th colspan=2 style="text-align: center; border-right: solid 1px">:stopwatch: T4 (ms)</th>
	<th rowspan=2 style="text-align: center; border-right: solid 1px">Model Size</th>
	<th rowspan=2 style="text-align: center; border-right: solid 1px">Link</th>
	</tr>
	<tr>
	<th style="text-align: center; border-right: solid 1px">Image Encoder</th>
	<th style="text-align: center; border-right: solid 1px">Full Pipeline</th>
	<th style="text-align: center; border-right: solid 1px">Image Encoder</th>
	<th style="text-align: center; border-right: solid 1px">Full Pipeline</th>
	<th style="text-align: center; border-right: solid 1px">Image Encoder</th>
	<th style="text-align: center; border-right: solid 1px">Full Pipeline</th>
	</tr>
	</thead>
	<tbody>
	<tr>
	<td style="text-align: center; border-right: solid 1px">PPHGV2-SAM-B1</td>
	<td style="text-align: center; border-right: solid 1px">110ms</td>
	<td style="text-align: center; border-right: solid 1px">180ms</td>
	<td style="text-align: center; border-right: solid 1px">9.6ms</td>
	<td style="text-align: center; border-right: solid 1px">17ms</td>
	<td style="text-align: center; border-right: solid 1px">2.4ms</td>
	<td style="text-align: center; border-right: solid 1px">5.8ms</td>
	<td style="text-align: center; border-right: solid 1px">12.1MB</td>
	<td style="text-align: center; border-right: solid 1px"><a href="https://huggingface.co/dragonSwing/nanosam/resolve/main/sam_hgv2_b1_ln_nonorm_image_encoder.onnx">Link</a></td>
	</tr>
	<tr>
	<td style="text-align: center; border-right: solid 1px">PPHGV2-SAM-B2</td>
	<td style="text-align: center; border-right: solid 1px">200ms</td>
	<td style="text-align: center; border-right: solid 1px">270ms</td>
	<td style="text-align: center; border-right: solid 1px">12.4ms</td>
	<td style="text-align: center; border-right: solid 1px">19.8ms</td>
	<td style="text-align: center; border-right: solid 1px">3.2ms</td>
	<td style="text-align: center; border-right: solid 1px">6.4ms</td>
	<td style="text-align: center; border-right: solid 1px">28.1MB</td>
	<td style="text-align: center; border-right: solid 1px"><a href="https://huggingface.co/dragonSwing/nanosam/resolve/main/sam_hgv2_b2_ln_nonorm_image_encoder.onnx">Link</a></td>
	</tr>
	<tr>
	<td style="text-align: center; border-right: solid 1px">PPHGV2-SAM-B4</td>
	<td style="text-align: center; border-right: solid 1px">300ms</td>
	<td style="text-align: center; border-right: solid 1px">370ms</td>
	<td style="text-align: center; border-right: solid 1px">17.3ms</td>
	<td style="text-align: center; border-right: solid 1px">24.7ms</td>
	<td style="text-align: center; border-right: solid 1px">4.1ms</td>
	<td style="text-align: center; border-right: solid 1px">7.5ms</td>
	<td style="text-align: center; border-right: solid 1px">58.6MB</td>
	<td style="text-align: center; border-right: solid 1px"><a href="https://huggingface.co/dragonSwing/nanosam/resolve/main/sam_hgv2_b4_ln_nonorm_image_encoder.onnx">Link</a></td>
	</tr>
	<tr>
	<td style="text-align: center; border-right: solid 1px">NanoSAM (ResNet18)</td>
	<td style="text-align: center; border-right: solid 1px">500ms</td>
	<td style="text-align: center; border-right: solid 1px">570ms</td>
	<td style="text-align: center; border-right: solid 1px">22.4ms</td>
	<td style="text-align: center; border-right: solid 1px">29.8ms</td>
	<td style="text-align: center; border-right: solid 1px">5.8ms</td>
	<td style="text-align: center; border-right: solid 1px">9.2ms</td>
	<td style="text-align: center; border-right: solid 1px">60.4MB</td>
	<td style="text-align: center; border-right: solid 1px"><a href="https://drive.google.com/file/d/14-SsvoaTl-esC3JOzomHDnI9OGgdO2OR/view?usp=drive_link">Link</a></td>
	</tr>
	<tr>
	<td style="text-align: center; border-right: solid 1px">EfficientViT-SAM-L0</td>
	<td style="text-align: center; border-right: solid 1px">1s</td>
	<td style="text-align: center; border-right: solid 1px">1.07s</td>
	<td style="text-align: center; border-right: solid 1px">31.6ms</td>
	<td style="text-align: center; border-right: solid 1px">38ms</td>
	<td style="text-align: center; border-right: solid 1px">6ms</td>
	<td style="text-align: center; border-right: solid 1px">9.4ms</td>
	<td style="text-align: center; border-right: solid 1px">117.4MB</td>
	<td style="text-align: center; border-right: solid 1px"></td>
	</tr>
	</tbody>
	</table>
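Because each device has both an "Image Encoder" and a "Full Pipeline" column, the table also shows how much latency comes from everything after the image encoder. The sketch below copies the GPU numbers from the table and subtracts one column from the other. Treating that difference as the cost of prompt encoding, mask decoding, and data transfer is an assumption, since the table does not break the pipeline down further. The helper names are hypothetical.

```python
# GPU latencies copied from the table above, in milliseconds:
# model -> device -> (image encoder, full pipeline).
LATENCY_MS = {
    "PPHGV2-SAM-B1": {"Jetson Xavier NX": (9.6, 17.0), "T4": (2.4, 5.8)},
    "PPHGV2-SAM-B2": {"Jetson Xavier NX": (12.4, 19.8), "T4": (3.2, 6.4)},
    "PPHGV2-SAM-B4": {"Jetson Xavier NX": (17.3, 24.7), "T4": (4.1, 7.5)},
    "NanoSAM (ResNet18)": {"Jetson Xavier NX": (22.4, 29.8), "T4": (5.8, 9.2)},
    "EfficientViT-SAM-L0": {"Jetson Xavier NX": (31.6, 38.0), "T4": (6.0, 9.4)},
}


def non_encoder_ms(model: str, device: str) -> float:
    """Full-pipeline latency minus image-encoder latency.

    Assumption: the difference is everything that runs after the encoder.
    """
    encoder, full = LATENCY_MS[model][device]
    return round(full - encoder, 1)


def speedup(model: str, baseline: str, device: str) -> float:
    """How many times faster `model` is than `baseline` end to end."""
    return round(LATENCY_MS[baseline][device][1] / LATENCY_MS[model][device][1], 2)


for name in LATENCY_MS:
    print(f"{name:22s} Jetson: {non_encoder_ms(name, 'Jetson Xavier NX'):4.1f} ms  "
          f"T4: {non_encoder_ms(name, 'T4'):4.1f} ms")
```

With these numbers, the post-encoder cost is 7.4 ms on the Jetson for every model except EfficientViT-SAM-L0 (6.4 ms), and 3.2 to 3.4 ms on the T4. Nearly all of the latency difference between variants therefore comes from the image encoder.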

Zero-shot instance segmentation on the COCO 2017 validation set:

| Image Encoder   | mAP<sup>mask</sup><sub>50-95</sub> | mIoU (all) | mIoU (large) | mIoU (medium) | mIoU (small) |
| --------------- | :--------------------------------: | :--------: | :----------: | :-----------: | :----------: |
| ResNet18        |                 -                  |    70.6    |     79.6     |     73.8      |     62.4     |
| MobileSAM       |                 -                  |    72.8    |     80.4     |     75.9      |     65.8     |
| PPHGV2-B1       |                41.2                |    75.6    |     81.2     |     77.4      |     70.8     |
| PPHGV2-B2       |                42.6                |    76.5    |     82.2     |     78.5      |     71.5     |
| PPHGV2-B4       |                44.0                |    77.3    |     83.0     |     79.7      |     72.1     |
| EfficientViT-L0 |                45.6                |    78.6    |     83.7     |     81.0      |     73.3     |
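The mIoU columns average the intersection-over-union (IoU) between each predicted mask and its ground-truth mask. The following minimal NumPy sketch shows that computation, grouping instances by the standard COCO area thresholds (small below 32² pixels, medium from 32² to 96², large above). Whether the reported numbers use exactly these buckets, and measure area from the ground-truth mask rather than COCO's annotated `area` field, is an assumption. The function names are hypothetical.

```python
import numpy as np


def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """IoU of two binary masks with the same shape."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as a perfect match
    return float(np.logical_and(pred, gt).sum() / union)


def size_bucket(gt: np.ndarray) -> str:
    """Assumed COCO-style size bucket, based on ground-truth mask area."""
    area = int(gt.astype(bool).sum())
    if area < 32 ** 2:
        return "small"
    if area < 96 ** 2:
        return "medium"
    return "large"


def miou_by_size(pairs):
    """pairs: iterable of (predicted_mask, ground_truth_mask).

    Returns the mean IoU over all instances and per size bucket
    (NaN for a bucket with no instances).
    """
    scores = {"all": [], "large": [], "medium": [], "small": []}
    for pred, gt in pairs:
        iou = mask_iou(pred, gt)
        scores["all"].append(iou)
        scores[size_bucket(gt)].append(iou)
    return {k: float(np.mean(v)) if v else float("nan") for k, v in scores.items()}
```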

	## Usage

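```python
import numpy as np
import PIL.Image

from nanosam.utils.predictor import Predictor

# Image encoder: PPHGV2-B4 exported to ONNX, run on CPU.
image_encoder_cfg = {
    "path": "data/sam_hgv2_b4_ln_nonorm_image_encoder.onnx",
    "name": "OnnxModel",
    "provider": "cpu",
    "normalize_input": False,
}
# Mask decoder: EfficientViT-SAM-L0 decoder exported to ONNX.
mask_decoder_cfg = {
    "path": "data/efficientvit_l0_mask_decoder.onnx",
    "name": "OnnxModel",
    "provider": "cpu",
}
predictor = Predictor(image_encoder_cfg, mask_decoder_cfg)

image = PIL.Image.open("assets/dogs.jpg")

# Run the image encoder once; prompts can then be decoded repeatedly.
predictor.set_image(image)

# One foreground point (label 1). The image centre is only an example
# location; replace (x, y) with your own pixel coordinates.
x, y = image.width // 2, image.height // 2
mask, _, _ = predictor.predict(np.array([[x, y]]), np.array([1]))
```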

The point labels can take the following values:

| Point Label | Description               |
| :---------: | ------------------------- |
|      0      | Background point          |
|      1      | Foreground point          |
|      2      | Bounding box top-left     |
|      3      | Bounding box bottom-right |
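Because a bounding box is encoded as two extra points with labels 2 and 3, clicks and a box can be packed into the same `(points, labels)` pair that `predictor.predict` receives in the usage example. The helper below is a hypothetical convenience function, not part of the `nanosam` package. The `(N, 2)` points and `(N,)` labels layout follows the usage example; the float32/int32 dtypes are an assumption.

```python
from typing import Optional, Sequence, Tuple

import numpy as np

# Label codes from the table above.
BACKGROUND, FOREGROUND, BOX_TOP_LEFT, BOX_BOTTOM_RIGHT = 0, 1, 2, 3


def build_prompt(
    foreground: Sequence[Tuple[float, float]] = (),
    background: Sequence[Tuple[float, float]] = (),
    box: Optional[Tuple[float, float, float, float]] = None,
) -> Tuple[np.ndarray, np.ndarray]:
    """Pack clicks and an optional (x0, y0, x1, y1) box into (points, labels).

    Hypothetical helper: points are an (N, 2) float32 array of (x, y)
    pixel coordinates, labels an (N,) int32 array of label codes.
    """
    points, labels = [], []
    for xy in foreground:
        points.append(xy)
        labels.append(FOREGROUND)
    for xy in background:
        points.append(xy)
        labels.append(BACKGROUND)
    if box is not None:
        x0, y0, x1, y1 = box
        # Normalise corner order so label 2 is always top-left, 3 bottom-right.
        points += [(min(x0, x1), min(y0, y1)), (max(x0, x1), max(y0, y1))]
        labels += [BOX_TOP_LEFT, BOX_BOTTOM_RIGHT]
    if not points:
        raise ValueError("at least one point or a box is required")
    return (np.asarray(points, dtype=np.float32).reshape(-1, 2),
            np.asarray(labels, dtype=np.int32))
```

For example, `points, labels = build_prompt(box=(100, 80, 500, 400))` produces the two box corners labelled 2 and 3, which could then be passed as `predictor.predict(points, labels)`. The coordinates here are illustrative.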