AI & ML interests

Earth Observation Datasets



Major TOM: Expandable Datasets for Earth Observation


A standard for curating, sharing and combining large-scale EO datasets.

This organisation provides a platform for making further contributions to the project with the aim of building large, open and compatible Earth observation datasets.


💬 Discord

Major TOM coordination takes place in dedicated channels on the satellite-image-deep-learning Discord server: https://discord.gg/SsUJCDcrcu

🎞 Official Image Datasets

| Dataset | Modality | Number of Patches | Sensing Type | Comments |
|---|---|---|---|---|
| Core-S2L2A | Sentinel-2 Level 2A | 2,245,886 | Multi-Spectral | General-Purpose Global (about 23 TB) |
| Core-S2L1C | Sentinel-2 Level 1C | 2,245,886 | Multi-Spectral | General-Purpose Global (about 23 TB) |
| Core-S1RTC | Sentinel-1 RTC | 1,469,955 | SAR | General-Purpose Global (about 16 TB) |
| Core-DEM | Copernicus DEM 30 | 1,837,843 | Digital Surface Model (DSM) | General-Purpose Global (about 1 TB) |
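For planning a partial download, the patch counts and approximate totals in the table give a rough mean size per patch. The sketch below only does that arithmetic. It assumes decimal terabytes and treats the "about N TB" figures as exact, so the results are estimates; the function names are hypothetical.

```python
# (number of patches, approximate total size in TB), copied from the table.
image_datasets = {
    "Core-S2L2A": (2_245_886, 23),
    "Core-S2L1C": (2_245_886, 23),
    "Core-S1RTC": (1_469_955, 16),
    "Core-DEM": (1_837_843, 1),
}

BYTES_PER_TB = 10**12  # assumption: the table uses decimal terabytes
BYTES_PER_MB = 10**6


def mean_patch_size_mb(name):
    """Rough mean size of one patch in MB, from the table totals."""
    patches, size_tb = image_datasets[name]
    return size_tb * BYTES_PER_TB / patches / BYTES_PER_MB


def subset_size_gb(name, n_patches):
    """Rough size in GB of a subset with n_patches patches."""
    return mean_patch_size_mb(name) * n_patches / 1000


for name in image_datasets:
    print(f"{name}: roughly {mean_patch_size_mb(name):.1f} MB per patch")
```

Calling `subset_size_gb("Core-S2L2A", 10_000)` then gives a ballpark figure for a ten-thousand-patch sample before any data is fetched.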

📊 Official Embedding Datasets

| Dataset | Modality | Number of Embeddings | Sensing Type | Source Dataset | Source Model | Size |
|---|---|---|---|---|---|---|
| Core-S2L1C-SSL4EO | Sentinel-2 Level 1C | 56,147,150 | Multi-Spectral | Core-S2L1C | SSL4EO-ResNet50-DINO | 252.9 GB |
| Core-S1RTC-SSL4EO | Sentinel-1 RTC | 36,748,875 | SAR | Core-S1RTC | SSL4EO-ResNet50-MOCO | 332.5 GB |
| Core-S2RGB-DINOv2 | Sentinel-2 Level 2A (RGB) | 56,147,150 | True Colour | Core-S2L2A | DINOv2 | 223.1 GB |
| Core-S2RGB-SigLIP | Sentinel-2 Level 2A (RGB) | 20,212,974 | True Colour | Core-S2L2A | SigLIP-SO400M-384 | 41.3 GB |
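The embedding counts relate to the patch counts of their source datasets by whole-number ratios. The short sketch below checks this using only the numbers from the two tables. Reading the ratio as "each patch is split into a fixed number of fragments before embedding" is an inference from the counts, not something stated on this page; the function name is hypothetical.

```python
# Patch counts of the source image datasets, copied from the image table.
patches = {
    "Core-S2L2A": 2_245_886,
    "Core-S2L1C": 2_245_886,
    "Core-S1RTC": 1_469_955,
}

# Embedding dataset -> (number of embeddings, source image dataset).
embeddings = {
    "Core-S2L1C-SSL4EO": (56_147_150, "Core-S2L1C"),
    "Core-S1RTC-SSL4EO": (36_748_875, "Core-S1RTC"),
    "Core-S2RGB-DINOv2": (56_147_150, "Core-S2L2A"),
    "Core-S2RGB-SigLIP": (20_212_974, "Core-S2L2A"),
}


def embeddings_per_patch(name):
    """Return (quotient, remainder) of embeddings divided by source patches."""
    count, source = embeddings[name]
    return divmod(count, patches[source])


for name in embeddings:
    whole, remainder = embeddings_per_patch(name)
    print(f"{name}: {whole} embeddings per patch (remainder {remainder})")
```

All four divisions come out exact: 25 embeddings per patch for the two SSL4EO datasets and DINOv2, and 9 for SigLIP.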

📌 Open Access Manuscript

This project is outlined in https://arxiv.org/abs/2402.12095/.

Abstract

Deep learning models are increasingly data-hungry, requiring significant resources to collect and compile the datasets needed to train them, with Earth Observation (EO) models being no exception. However, the landscape of datasets in EO is relatively atomised, with interoperability made difficult by diverse formats and data structures. If ever larger datasets are to be built, and duplication of effort minimised, then a shared framework that allows users to combine and access multiple datasets is needed. Here, Major TOM (Terrestrial Observation Metaset) is proposed as this extensible framework. Primarily, it consists of a geographical indexing system based on a set of grid points and a metadata structure that allows multiple datasets with different sources to be merged. Besides the specification of Major TOM as a framework, this work also presents a large, open-access dataset, MajorTOM-Core, which covers the vast majority of the Earth's land surface. This dataset provides the community with both an immediately useful resource, as well as acting as a template for future additions to the Major TOM ecosystem.
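To make the idea of a shared geographical index more concrete, here is a minimal sketch of a grid with roughly constant spacing, plus a merge of records from different sources by grid cell. It is an illustration only, not the Major TOM reference implementation: the 10 km default spacing, the `(row, col)` cell identifier, and the names `grid_cell` and `merge_by_cell` are all assumptions made for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def grid_cell(lat, lon, spacing_km=10.0):
    """Return a hypothetical (row, col) index for the cell containing (lat, lon)."""
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("lat/lon out of range")
    # Rows are latitude bands about spacing_km tall.
    n_rows = math.ceil(math.pi * EARTH_RADIUS_KM / spacing_km)
    lat_step = 180.0 / n_rows
    row = min(int((lat + 90.0) / lat_step), n_rows - 1)
    # Fewer columns towards the poles, so cells keep a similar width in km.
    centre_lat = -90.0 + (row + 0.5) * lat_step
    circumference = (
        2.0 * math.pi * EARTH_RADIUS_KM * math.cos(math.radians(centre_lat))
    )
    n_cols = max(1, math.ceil(circumference / spacing_km))
    lon_step = 360.0 / n_cols
    col = min(int((lon + 180.0) / lon_step), n_cols - 1)
    return row, col


def merge_by_cell(*sources):
    """Group (lat, lon, payload) records from named sources by grid cell."""
    merged = {}
    for source_name, records in sources:
        for lat, lon, payload in records:
            merged.setdefault(grid_cell(lat, lon), {})[source_name] = payload
    return merged


# Two hypothetical sources sampled at the same location end up in one entry.
optical = ("optical", [(52.2297, 21.0122, "s2_patch.tif")])
radar = ("radar", [(52.2297, 21.0122, "s1_patch.tif"), (0.0, 0.0, "s1_other.tif")])
for cell, payloads in merge_by_cell(optical, radar).items():
    print(cell, payloads)
```

Because every source is keyed by the same cell index, combining datasets reduces to a dictionary lookup rather than a spatial join, which is the property the abstract attributes to a common grid and metadata structure.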


If you find this useful for your research, please cite it as:

@inproceedings{Major_TOM_2024,
  author={Francis, Alistair and Czerkawski, Mikolaj},
  booktitle={IGARSS 2024 - 2024 IEEE International Geoscience and Remote Sensing Symposium}, 
  title={Major TOM: Expandable Datasets for Earth Observation}, 
  year={2024},
  pages={2935-2940},
  doi={10.1109/IGARSS53475.2024.10640760}
}

Powered by Φ-lab, European Space Agency (ESA) 🛰️


FAQ

Is Major TOM just another EO dataset?

Not quite. Major TOM is not a dataset but a project that aims to standardise some future EO datasets. As an example of what such a dataset can look like, MajorTOM-Core is released as a nearly global dataset of Sentinel-2 data.

Scroll up to the 🎞 Official Image Datasets section of this file to see the list of current datasets.

Who is going to contribute to upcoming Major TOM datasets?

Anyone can contribute. The original authors of the Major TOM paper are already working on a few other datasets that will join the Major TOM initiative.

Can I join the Major TOM organisation on Hugging Face?

Anyone can join the organisation with read access. To gain contributor rights, contact one of the admins to verify who you are and how you would like to contribute. Any dataset that follows the Major TOM standard should be eligible.
