NimaBoscarino committed
Commit 2a03f49 · 1 Parent(s): ae54f3c

Update README.md

Files changed (1):
  1. README.md +19 -64
README.md CHANGED
@@ -1,22 +1,16 @@
  ---
- language:
- - en
  license: mit
  tags:
  - object-detection
  - object-tracking
  - video
  - video-object-segmentation
- datasets:
- - imagenet-1k
- metrics:
- - accuracy
  ---

- # Unicorn (TO DO)

  ## Table of Contents
- - [EfficientFormer-L3](#-model_id--defaultmymodelname-true)
  - [Table of Contents](#table-of-contents)
  - [Model Details](#model-details)
  - [How to Get Started with the Model](#how-to-get-started-with-the-model)
@@ -37,76 +31,37 @@ metrics:

  ## Model Details

- <!-- Give an overview of your model, the relevant research paper, who trained it, etc. -->

- EfficientFormer-L3, developed by [Snap Research](https://github.com/snap-research), is one of three EfficientFormer models. The EfficientFormer models were released as part of an effort to prove that properly designed transformers can reach extremely low latency on mobile devices while maintaining high performance.
-
- This checkpoint of EfficientFormer-L3 was trained for 300 epochs.
-
- - Developed by: Yanyu Li, Geng Yuan, Yang Wen, Eric Hu, Georgios Evangelidis, Sergey Tulyakov, Yanzhi Wang, Jian Ren
- - Language(s): English
  - License: This model is licensed under the apache-2.0 license
  - Resources for more information:
- - [Research Paper](https://arxiv.org/abs/2206.01191)
- - [GitHub Repo](https://github.com/snap-research/EfficientFormer/)

  </model_details>

- <how_to_start>
-
- ## How to Get Started with the Model
-
- Use the code below to get started with the model.
-
- ```python
- # A nice code snippet here that describes how to use the model...
- ```
- </how_to_start>
-
  <uses>

  ## Uses

  #### Direct Use

- This model can be used for image classification and semantic segmentation. On mobile devices (the model was tested on iPhone 12), the CoreML checkpoints will perform these tasks with low latency.
-
- <Limitations_and_Biases>
-
- ## Limitations and Biases
-
- Though most designs in EfficientFormer are general-purposed, e.g., dimension-consistent design and 4D block with CONV-BN fusion, the actual speed of EfficientFormer may vary on other platforms. For instance, if GeLU is not well supported while HardSwish is efficiently implemented on specific hardware and compiler, the operator may need to be modified accordingly. The proposed latency-driven slimming is simple and fast. However, better results may be achieved if search cost is not a concern and an enumeration-based brute search is performed.

- Since the model was trained on Imagenet-1K, the [biases embedded in that dataset](https://huggingface.co/datasets/imagenet-1k#considerations-for-using-the-data) will be reflected in the EfficientFormer models.
-
- </Limitations_and_Biases>
-
- <Training>
-
- ## Training
-
- #### Training Data
-
- This model was trained on ImageNet-1K.
-
- See the [data card](https://huggingface.co/datasets/imagenet-1k) for additional information.
-
- #### Training Procedure
-
- * Parameters: 31.3 M
- * GMACs: 3.9
- * Train. Epochs: 300
-
- Trained on a cluster with NVIDIA A100 and V100 GPUs.
-
- </Training>

  <Eval_Results>

  ## Evaluation Results

- Top-1 Accuracy: 82.4% on ImageNet 10K
- Latency: 3.0 ms

  </Eval_Results>

@@ -115,10 +70,10 @@ Latency: 3.0 ms
  ## Citation Information

  ```bibtex
- @article{li2022efficientformer,
- title={EfficientFormer: Vision Transformers at MobileNet Speed},
- author={Li, Yanyu and Yuan, Geng and Wen, Yang and Hu, Eric and Evangelidis, Georgios and Tulyakov, Sergey and Wang, Yanzhi and Ren, Jian},
- journal={arXiv preprint arXiv:2206.01191},
  year={2022}
  }
  ```
 
  ---
  license: mit
  tags:
  - object-detection
  - object-tracking
  - video
  - video-object-segmentation
  ---

+ # unicorn_track_large_mask

  ## Table of Contents
+ - [unicorn_track_large_mask](#unicorn_track_large_mask)
  - [Table of Contents](#table-of-contents)
  - [Model Details](#model-details)
  - [How to Get Started with the Model](#how-to-get-started-with-the-model)


  ## Model Details

+ Unicorn unifies the network architecture and the learning paradigm for four tracking tasks, and sets new state-of-the-art results on many challenging tracking benchmarks using the same model parameters. This model has an input size of 800x1280.

  - License: This model is licensed under the apache-2.0 license
  - Resources for more information:
+ - [Research Paper](https://arxiv.org/abs/2111.12085)
+ - [GitHub Repo](https://github.com/MasterBin-IIAU/Unicorn)
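
The card states an input size of 800x1280. As a minimal illustration (not taken from the Unicorn codebase), the sketch below prepares a single video frame at that resolution, assuming the size means height 800 by width 1280 and using common ImageNet normalization statistics:

```python
import torch
from PIL import Image
from torchvision import transforms

# Hypothetical preprocessing sketch: 800x1280 is assumed to mean
# height=800, width=1280; the normalization values are the usual
# ImageNet statistics, not values documented for Unicorn.
preprocess = transforms.Compose([
    transforms.Resize((800, 1280)),   # (height, width)
    transforms.ToTensor(),            # HWC uint8 -> CHW float in [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

frame = Image.open("frame_0001.jpg").convert("RGB")  # placeholder path
batch = preprocess(frame).unsqueeze(0)               # shape: (1, 3, 800, 1280)
print(batch.shape)
```

The exact preprocessing (aspect-ratio handling, padding, normalization) is defined by the Unicorn repository and may differ from this sketch.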

  </model_details>

  <uses>

  ## Uses

  #### Direct Use

+ This model can be used for:

+ * Single Object Tracking (SOT)
+ * Multiple Object Tracking (MOT)
+ * Video Object Segmentation (VOS)
+ * Multi-Object Tracking and Segmentation (MOTS)
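
Assuming the repository hosts the trained weights as a PyTorch checkpoint, a minimal, hypothetical sketch of downloading and inspecting them with `huggingface_hub` is shown below. The repo id and filename are placeholders, and end-to-end tracking for any of the four tasks requires the model definition and inference scripts from the Unicorn GitHub repository.

```python
import torch
from huggingface_hub import hf_hub_download

# Placeholder identifiers -- replace with the actual Hub repo id and checkpoint
# filename for this model; neither is documented in this card.
ckpt_path = hf_hub_download(
    repo_id="<namespace>/unicorn_track_large_mask",
    filename="unicorn_track_large_mask.pth",
)

# Load on CPU and inspect what the checkpoint contains (often a raw state_dict
# or a {"model": state_dict, ...} wrapper produced by the training script).
state = torch.load(ckpt_path, map_location="cpu")
if isinstance(state, dict):
    print(list(state.keys())[:10])
```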

  <Eval_Results>

  ## Evaluation Results

+ LaSOT AUC (%): 68.5
+ BDD100K mMOTA (%): 41.2
+ DAVIS17 J&F (%): 69.2
+ BDD100K MOTS mMOTSA (%): 29.6
+

  </Eval_Results>


  ## Citation Information

  ```bibtex
+ @inproceedings{unicorn,
+ title={Towards Grand Unification of Object Tracking},
+ author={Yan, Bin and Jiang, Yi and Sun, Peize and Wang, Dong and Yuan, Zehuan and Luo, Ping and Lu, Huchuan},
+ booktitle={ECCV},
  year={2022}
  }
  ```