{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## MobileSAM Adapter" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### A Quick Overview\n", "\n", "\n", "What makes SegAny slow for [SAM](https://github.com/facebookresearch/segment-anything) is its heavyweight image encoder, which has been addressed by [MobileSAM](https://github.com/ChaoningZhang/MobileSAM) via decoupled knowledge distillation. The efficiency bottleneck of SegEvery with SAM, however, lies in its mask decoder because it needs to first generate numerous masks with redundant grid-search prompts and then perform filtering to obtain the final valid masks. MobileSAM propose object-aware box prompts to replace the default grid-search point prompts, which significantly increases its speed while achieving overall superior performance.\n", "\n", "Our goal is to integrate medical specific domain knowledge into the lightweight MobileSAM model through adaptation technique. Therefore, we only utilize the pre-trained MobileSAM weights. Like our original [Medical SAM Adapter](https://arxiv.org/abs/2304.12620), we achieve efficient migration from the original SAM to medical images by adding simple Adapters in the network." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Training\n", "We have unified the interfaces of MobileSAM, and training MobileSAM Adapter can be done using the same command as the SAM adapter:\n", "\n", "``python train.py -net mobile_sam -data_path data/isic -sam_ckpt checkpoint/mobile_sam/mobile_sam.pt -image_size 1024 -vis 100 -mod sam_adpt -encoder tiny_vit``\n", "\n", "You can use the ``-encode`` option to specify the encoders supported by MobileSAM, such as Vit, Tiny_Vit, and Efficient_Vit(wwhich doe not support adapter now).\n", "The pretrained weight of MobileSAM can be downloaded [here](https://drive.google.com/file/d/1dE-YAG-1mFCBmao2rHDp0n-PP4eH7SjE/view).\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Performance VS SAM \n", "**Setup**: Using a single Nvidia RTX 3090 GPU, the batch_size was set to 2. The resolution of input image is 1024.\n", "\n", "#### ISIC\n", "| Baseline | Backbone | DICE | mIOU | Memory |\n", "| ------------ | --------- | ------ | ---- | ------- |\n", "| SAM | VIT-Base | 0.9225 | 0.8646 | 22427 M |\n", "| EfficientSAM | VIT-Small | 0.9091 | 0.8463 | 21275 M |\n", "| EfficientSAM | VIT-Tiny | 0.9095 | 0.8437 | 15713 M |\n", "| MobileSAM | Tiny-Vit | 0.9225 | 0.8651 | 10255M |\n", "\n", "#### REFUGE\n", "| Baseline | Backbone | DICE | mIOU | Memory |\n", "| ------------ | --------- | ------ | ---- | ------- |\n", "| SAM | VIT-Base | 0.9085 | 0.8423 | 22427 M |\n", "| EfficientSAM | VIT-Small | 0.8691 | 0.7915 | 21275 M |\n", "| EfficientSAM | VIT-Tiny | 0.7999 | 0.6949 | 15713 M |\n", "| MobileSAM | Tiny-Vit | 0.9330 | 0.8812 | 10255M |\n", "\n", "In the case of training for only 100 epochs, MobileSAM which takes tiny_vit as the ImageEncoder even **outperforms** SAM and EfficientSAM on REFUGE and ISIC datasets, while consuming **less** GPU memory." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Performance Under different resolution \n", "**Setup**: Using a single Nvidia RTX 3090 GPU, the batch_size was set to 2. 
{ "cell_type": "markdown", "metadata": {}, "source": [ "### Performance under different resolutions\n", "**Setup**: a single Nvidia RTX 3090 GPU, batch size 2, and Tiny-ViT as the backbone.\n", "\n", "#### ISIC\n", "| Baseline | Resolution | DICE | mIOU | Memory |\n", "| --------- | ---------- | ------ | ------ | ------- |\n", "| MobileSAM | 1024 | 0.9225 | 0.8651 | 10255 M |\n", "| MobileSAM | 512 | 0.9177 | 0.8579 | 4075 M |\n", "| MobileSAM | 256 | 0.9145 | 0.8508 | 2517 M |\n", "\n", "#### REFUGE\n", "| Baseline | Resolution | DICE | mIOU | Memory |\n", "| --------- | ---------- | ------ | ------ | ------- |\n", "| MobileSAM | 1024 | 0.9225 | 0.8651 | 10255 M |\n", "| MobileSAM | 512 | 0.9200 | 0.8562 | 4075 M |\n", "| MobileSAM | 256 | 0.8294 | 0.7297 | 2517 M |" ] },
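{ "cell_type": "markdown", "metadata": {}, "source": [ "The rows above differ only in the ``-image_size`` flag of the training command from the Training section. As a sketch that reuses only the options documented above (adjust the data and checkpoint paths to your setup), the 512-resolution setting corresponds to:\n", "\n", "``python train.py -net mobile_sam -data_path data/isic -sam_ckpt checkpoint/mobile_sam/mobile_sam.pt -image_size 512 -vis 100 -mod sam_adpt -encoder tiny_vit``" ] },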
\n", " \n", "         \n", " \n", "         \n", "

\n", "\n", "### REFUGE\n", "

\n", " \n", "         \n", " \n", "         \n", "

" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.7.16 ('general')", "language": "python", "name": "python3" }, "language_info": { "name": "python", "version": "3.7.16" }, "orig_nbformat": 4, "vscode": { "interpreter": { "hash": "e7f99538a81e8449c1b1a4a7141984025c678b5d9c33981aa2a3c129d8e1c90d" } } }, "nbformat": 4, "nbformat_minor": 2 }