MLR-Copilot / benchmarks /fathomnet /env /data_description.txt
Lim0011's picture
Upload 251 files
85e3d20 verified
raw
history blame
3.15 kB
Dataset Description
The training and test images for the competition were all collected in the Monterey Bay Area between the surface and 1300 meters depth. The images contain bounding box annotations of 290 categories of bottom dwelling animals. The training and evaluation data are split across an 800 meter depth threshold: all training data is collected from 0-800 meters, evaluation data comes from the whole 0-1300 meter range. Since an organisms' habitat range is partially a function of depth, the species distributions in the two regions are overlapping but not identical. Test images are drawn from the same region but may come from above or below the depth horizon.
The competition goal is to label the animals present in a given image (i.e. multi-label classification) and determine whether the image is out-of-sample.
Data format
The training dataset are provide in two different formats: multi-label classification and object detection. The different formats live in the corresponding named directories. The datasets in these directories are identical aside from how they are organized.
Multi-label classification
Each line of the csv files indicates an image by its id and a list of corresponding categories present in the frame.
id, categories
4a7f2199-772d-486d-b8e2-b651246316b5, [1.0]
3bddedf6-4ff8-4e81-876a-564d2b03b364, "[1.0, 9.0, 11.0, 88.0]"
3f735021-f5de-4168-b139-74bf2859d12a, "[1.0, 37.0, 51.0, 119.0]"
130e185f-09c5-490c-8d08-641c4cbf6e54, "[1.0, 51.0, 119.0]"
The ids correspond those in the object detection files. The categories are the set of all unique annotations found in the associated image.
Object detection
The datasets are formatted to adhere to the COCO Object Detection standard. Every training image contains at least one annotation corresponding to a category_id ranging from 1 to 290 and a supercategory from 1 to 20. The fine-grained annotations are taxonomic thought not always at the same level of the taxonomic tree.
Supercategories
Each category also belongs to one of 20 semantic supercategories as indicated in category_key.csv:
['Worm', 'Feather star', 'Sea cucumber', 'Squat lobster', 'Fish', 'Soft coral', 'Urchin', 'Sea star', 'Sea fan', 'Sea pen', 'Barnacle', 'Eel', 'Glass sponge', 'Shrimp', 'Black coral', 'Anemone', 'Sea spider', 'Gastropod', 'Crab', 'Stony coral']
These supercategories might be useful for certain training procedures. The supercategories are represented in both the training and validation set. But please note that submissions must be made identifying the 290 fine grained categories.
Files
multilabel_classification/train.csv - csv list of training images and categories
object_detection/train.json - the training images, annotations, and categories in COCO formatted json
object_detection/eval.json - the evaluation images in COCO formatted json
sample_submission.csv - a sample submission file in the correct format
category_key.csv - key mapping numerical index to category and supercategory name
metric.py - a Python implementation of the metric described in the evaluation_details.txt. This is the same metric used to calculate the leaderboard score.