Alara Dirik committed on
Commit e81946e • 1 Parent(s): b16489b
Update README.md
README.md CHANGED
@@ -1,6 +1,8 @@
 ---
+license: apache-2.0
 tags:
 - vision
+- object-detection
 ---
 
 # Model Card: OWL-ViT
@@ -77,3 +79,14 @@ We primarily imagine the model will be used by researchers to better understand
 ## Data
 
 The CLIP backbone of the model was trained on publicly available image-caption data. This was done through a combination of crawling a handful of websites and using commonly-used pre-existing image datasets such as [YFCC100M](http://projects.dfki.uni-kl.de/yfcc100m/). A large portion of the data comes from our crawling of the internet. This means that the data is more representative of people and societies most connected to the internet. The prediction heads of OWL-ViT, along with the CLIP backbone, are fine-tuned on publicly available object detection datasets such as [COCO](https://cocodataset.org/#home) and [OpenImages](https://storage.googleapis.com/openimages/web/index.html).
+
+### BibTeX entry and citation info
+
+```bibtex
+@article{minderer2022simple,
+title={Simple Open-Vocabulary Object Detection with Vision Transformers},
+author={Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, Neil Houlsby},
+journal={arXiv preprint arXiv:2205.06230},
+year={2022},
+}
+```
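One detail in the BibTeX entry added by this commit may be worth checking. BibTeX expects the names in an `author` field to be separated by the keyword `and`. With commas alone, it treats the whole list as a single name and typically warns about too many commas, so the author list is likely to render incorrectly. The sketch below shows the same entry with `and` separators. Every other field is copied unchanged from the commit, and the indentation is a stylistic choice rather than something taken from the original README:

```bibtex
@article{minderer2022simple,
  title={Simple Open-Vocabulary Object Detection with Vision Transformers},
  author={Matthias Minderer and Alexey Gritsenko and Austin Stone and Maxim Neumann and Dirk Weissenborn and Alexey Dosovitskiy and Aravindh Mahendran and Anurag Arnab and Mostafa Dehghani and Zhuoran Shen and Xiao Wang and Xiaohua Zhai and Thomas Kipf and Neil Houlsby},
  journal={arXiv preprint arXiv:2205.06230},
  year={2022},
}
```

Writing each name as "First Last" is valid BibTeX. The "Last, First" form also works, as long as the names are still joined by `and`.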