# Prepare Datasets for Mask2Former
A dataset can be used by accessing [DatasetCatalog](https://detectron2.readthedocs.io/modules/data.html#detectron2.data.DatasetCatalog)
for its data, or [MetadataCatalog](https://detectron2.readthedocs.io/modules/data.html#detectron2.data.MetadataCatalog) for its metadata (class names, etc).
This document explains how to set up the builtin datasets so they can be used by the above APIs.
[Use Custom Datasets](https://detectron2.readthedocs.io/tutorials/datasets.html) gives a deeper dive on how to use `DatasetCatalog` and `MetadataCatalog`,
and how to add new datasets to them.
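As a quick sanity check after setup, you can query both catalogs for one of the builtin datasets. A minimal sketch, assuming COCO has been prepared as described below (`coco_2017_val` is the standard detectron2 registration name):
```bash
python -c "
from detectron2.data import DatasetCatalog, MetadataCatalog
dicts = DatasetCatalog.get('coco_2017_val')  # loads the per-image annotation dicts
meta = MetadataCatalog.get('coco_2017_val')  # class names, colors, etc.
print(len(dicts), 'images,', len(meta.thing_classes), 'thing classes')
"
```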
Mask2Former has builtin support for a few datasets.
The datasets are assumed to exist in a directory specified by the environment variable
`DETECTRON2_DATASETS`.
Under this directory, detectron2 will look for datasets in the structure described below, if needed.
```
$DETECTRON2_DATASETS/
  ADEChallengeData2016/
  coco/
  cityscapes/
  mapillary_vistas/
```
You can set the location for builtin datasets by `export DETECTRON2_DATASETS=/path/to/datasets`.
If left unset, the default is `./datasets` relative to your current working directory.
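For example (a sketch; the path is only illustrative):
```bash
export DETECTRON2_DATASETS=/path/to/datasets
mkdir -p $DETECTRON2_DATASETS/{ADEChallengeData2016,coco,cityscapes,mapillary_vistas}
```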
The [model zoo](https://github.com/facebookresearch/MaskFormer/blob/master/MODEL_ZOO.md)
contains configs and models that use these builtin datasets.
## Expected dataset structure for [COCO](https://cocodataset.org/#download):
```
coco/
  annotations/
    instances_{train,val}2017.json
    panoptic_{train,val}2017.json
  {train,val}2017/
    # image files that are mentioned in the corresponding json
  panoptic_{train,val}2017/  # png annotations
  panoptic_semseg_{train,val}2017/  # generated by the script mentioned below
```
Install panopticapi by:
```
pip install git+https://github.com/cocodataset/panopticapi.git
```
Then, run `python datasets/prepare_coco_semantic_annos_from_panoptic_annos.py` to extract semantic annotations from panoptic annotations (only used for evaluation).
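For example, from the repository root (a sketch; assumes the script reads `DETECTRON2_DATASETS` to locate the data):
```bash
export DETECTRON2_DATASETS=/path/to/datasets
python datasets/prepare_coco_semantic_annos_from_panoptic_annos.py
```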
## Expected dataset structure for [cityscapes](https://www.cityscapes-dataset.com/downloads/):
```
cityscapes/
  gtFine/
    train/
      aachen/
        color.png, instanceIds.png, labelIds.png, polygons.json,
        labelTrainIds.png
      ...
    val/
    test/
    # below are generated Cityscapes panoptic annotations
    cityscapes_panoptic_train.json
    cityscapes_panoptic_train/
    cityscapes_panoptic_val.json
    cityscapes_panoptic_val/
    cityscapes_panoptic_test.json
    cityscapes_panoptic_test/
  leftImg8bit/
    train/
    val/
    test/
```
Install cityscapesScripts by:
```
pip install git+https://github.com/mcordts/cityscapesScripts.git
```
Note: to create labelTrainIds.png, first prepare the above structure, then run cityscapesScripts with:
```
CITYSCAPES_DATASET=/path/to/abovementioned/cityscapes python cityscapesscripts/preparation/createTrainIdLabelImgs.py
```
These files are not needed for instance segmentation.
Note: to generate the Cityscapes panoptic dataset, run cityscapesScripts with:
```
CITYSCAPES_DATASET=/path/to/abovementioned/cityscapes python cityscapesscripts/preparation/createPanopticImgs.py
```
These files are not needed for semantic and instance segmentation.
## Expected dataset structure for [ADE20k](http://sceneparsing.csail.mit.edu/):
```
ADEChallengeData2016/
  images/
  annotations/
  objectInfo150.txt
  # download instance annotation
  annotations_instance/
  # generated by prepare_ade20k_sem_seg.py
  annotations_detectron2/
  # below are generated by prepare_ade20k_pan_seg.py
  ade20k_panoptic_{train,val}.json
  ade20k_panoptic_{train,val}/
  # below are generated by prepare_ade20k_ins_seg.py
  ade20k_instance_{train,val}.json
```
The directory `annotations_detectron2` is generated by running `python datasets/prepare_ade20k_sem_seg.py`.
Install panopticapi by:
```bash
pip install git+https://github.com/cocodataset/panopticapi.git
```
Download the instance annotations from http://sceneparsing.csail.mit.edu/:
```bash
wget http://sceneparsing.csail.mit.edu/data/ChallengeData2017/annotations_instance.tar
```
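The tarball then needs to be unpacked so that `annotations_instance/` ends up inside `ADEChallengeData2016/` as shown above. A sketch, assuming the archive extracts to a top-level `annotations_instance/` folder:
```bash
tar -xf annotations_instance.tar -C $DETECTRON2_DATASETS/ADEChallengeData2016/
```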
Then run `python datasets/prepare_ade20k_pan_seg.py` to combine semantic and instance annotations into panoptic annotations, and run `python datasets/prepare_ade20k_ins_seg.py` to extract instance annotations in COCO format.
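Putting the ADE20k steps together, the full preparation might look like this sketch (assumes you run from the repository root and that each script reads `DETECTRON2_DATASETS` to locate the data):
```bash
export DETECTRON2_DATASETS=/path/to/datasets
python datasets/prepare_ade20k_sem_seg.py  # generates annotations_detectron2/
python datasets/prepare_ade20k_pan_seg.py  # generates ade20k_panoptic_{train,val}.json and folders
python datasets/prepare_ade20k_ins_seg.py  # generates ade20k_instance_{train,val}.json
```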
## Expected dataset structure for [Mapillary Vistas](https://www.mapillary.com/dataset/vistas):
```
mapillary_vistas/
  training/
    images/
    instances/
    labels/
    panoptic/
  validation/
    images/
    instances/
    labels/
    panoptic/
  mapillary_vistas_instance_{train,val}.json  # generated by the script mentioned below
```
No preprocessing is needed for semantic and panoptic segmentation on Mapillary Vistas.
If you want to evaluate instance segmentation on Mapillary Vistas, run `python datasets/prepare_mapillary_vistas_ins_seg.py` to generate COCO-style instance annotations.
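For example, from the repository root (a sketch, assuming the script locates the data via `DETECTRON2_DATASETS`):
```bash
export DETECTRON2_DATASETS=/path/to/datasets
python datasets/prepare_mapillary_vistas_ins_seg.py
```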
## Expected dataset structure for [YouTubeVIS 2019](https://competitions.codalab.org/competitions/20128):
```
ytvis_2019/
  {train,valid,test}.json
  {train,valid,test}/
    Annotations/
    JPEGImages/
```
## Expected dataset structure for [YouTubeVIS 2021](https://competitions.codalab.org/competitions/28988):
```
ytvis_2021/
  {train,valid,test}.json
  {train,valid,test}/
    Annotations/
    JPEGImages/
```