Vishakaraj's picture
Upload folder using huggingface_hub
c709b60
# Dataset preparation
## COCO Dataset
- Download the coco 2017 dataset from the [official website](https://cocodataset.org/#download).
Dataset strcture should look like:
~~~
${GRiT_ROOT}
|-- datasets
`-- |-- coco
|-- |-- train2017/
|-- |-- val2017/
|-- |-- test2017/
|-- |-- annotations/
|-- |-- |-- instances_train2017.json
|-- |-- |-- instances_val2017.json
|-- |-- |-- image_info_test-dev2017.json
~~~
## VG Dataset
- Download images from [official website](https://visualgenome.org/api/v0/api_home.html)
- Download our pre-processed annotations:
[train.json](https://datarelease.blob.core.windows.net/grit/VG_preprocessed_annotations/train.json) and
[test.json](https://datarelease.blob.core.windows.net/grit/VG_preprocessed_annotations/test.json)
Dataset strcture should look like:
~~~
${GRiT_ROOT}
|-- datasets
`-- |-- vg
|-- |-- images/
|-- |-- annotations/
|-- |-- |-- train.json
|-- |-- |-- test.json
~~~
## References
Please cite the corresponding references if you use the datasets.
~~~
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
@article{krishna2017visual,
title={Visual genome: Connecting language and vision using crowdsourced dense image annotations},
author={Krishna, Ranjay and Zhu, Yuke and Groth, Oliver and Johnson, Justin and Hata, Kenji and Kravitz, Joshua and Chen, Stephanie and Kalantidis, Yannis and Li, Li-Jia and Shamma, David A and others},
journal={International journal of computer vision},
volume={123},
number={1},
pages={32--73},
year={2017},
publisher={Springer}
}
~~~