metadata

license: apache-2.0
tags:
  - Pytorch
  - Geospatial
  - Temporal ViT
  - Vit

Model and Inputs

Prithvi is a first-of-its-kind temporal Vision Transformer (ViT) pre-trained by the IBM and NASA team on Harmonised Landsat Sentinel 2 (HLS) data from the contiguous US. Specifically, the model uses a self-supervised encoder built on a ViT architecture and trained with a Masked AutoEncoder (MAE) learning strategy and an MSE loss function. The model applies spatial attention across multiple patches and temporal attention for each patch.

The model expects remote sensing data in a video format (B, C, T, H, W), that is, batch, channels, time steps, height, and width. The temporal dimension T is central to the design and is absent from most other remote sensing models. Handling a time series of remote sensing images can benefit a variety of downstream tasks. Static images are also supported: feed them to the model with T=1.
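As a rough illustration of the expected layout, the sketch below stacks per-date images into a (B, C, T, H, W) array with NumPy. The helper name `stack_time_series`, the 224 x 224 image size, and the random data are assumptions made for this example and are not taken from the model code.

```python
import numpy as np

def stack_time_series(frames):
    """Stack T frames of shape (C, H, W) into one batch of shape (1, C, T, H, W)."""
    video = np.stack(frames, axis=1)   # (C, T, H, W): time becomes the second axis
    return video[np.newaxis, ...]      # add the batch axis -> (1, C, T, H, W)

# Hypothetical sizes: 6 bands, 224 x 224 pixels, random values as stand-in data.
C, H, W = 6, 224, 224
frames = [np.random.rand(C, H, W).astype(np.float32) for _ in range(3)]

series = stack_time_series(frames)      # three time steps
static = stack_time_series(frames[:1])  # a single static image, T=1

print(series.shape)  # (1, 6, 3, 224, 224)
print(static.shape)  # (1, 6, 1, 224, 224)
```

A static image keeps the time axis with length 1, so the same five-dimensional layout is used in both cases.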

Pre-training

The model was pre-trained with NASA's HLS V2 L30 product (30 m spatial resolution) from the contiguous United States. The following bands were used:

Blue
Green
Red
Narrow NIR
SWIR 1
SWIR 2
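When working with six-band arrays it can help to name the channels. The sketch below assumes that the channel order of an input array follows the order of the list above; that ordering, and the names `BAND_INDEX` and `to_rgb`, are assumptions made for illustration.

```python
import numpy as np

# Assumed channel order: the order in which the bands are listed above.
BANDS = ["Blue", "Green", "Red", "Narrow NIR", "SWIR 1", "SWIR 2"]
BAND_INDEX = {name: i for i, name in enumerate(BANDS)}

def to_rgb(image):
    """Select Red, Green, Blue from a (C, H, W) array and return (H, W, 3) for display."""
    idx = [BAND_INDEX["Red"], BAND_INDEX["Green"], BAND_INDEX["Blue"]]
    return np.transpose(image[idx], (1, 2, 0))

# Tiny stand-in image: 6 bands, 2 x 2 pixels.
image = np.arange(6 * 2 * 2, dtype=np.float32).reshape(6, 2, 2)
rgb = to_rgb(image)
print(rgb.shape)  # (2, 2, 3)
```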

Code

The model follows the original MAE repo, with the following modifications:

replace the 2D patch embedding with a 3D patch embedding;
replace the 2D positional embedding with a 3D positional embedding;
replace 2D patchify and unpatchify with 3D versions;
add infrared bands in addition to RGB.
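To make the 3D patchify and unpatchify step concrete, here is a minimal NumPy sketch. It is not the repository's implementation: the function names, the ordering of values inside each token, and the 1 x 16 x 16 patch size are assumptions chosen for illustration.

```python
import numpy as np

def patchify_3d(x, pt, ph, pw):
    """(B, C, T, H, W) -> (B, N, pt*ph*pw*C) with N = (T/pt) * (H/ph) * (W/pw)."""
    B, C, T, H, W = x.shape
    assert T % pt == 0 and H % ph == 0 and W % pw == 0
    t, h, w = T // pt, H // ph, W // pw
    x = x.reshape(B, C, t, pt, h, ph, w, pw)
    x = x.transpose(0, 2, 4, 6, 3, 5, 7, 1)  # (B, t, h, w, pt, ph, pw, C)
    return x.reshape(B, t * h * w, pt * ph * pw * C)

def unpatchify_3d(tokens, shape, pt, ph, pw):
    """Inverse of patchify_3d; `shape` is the original (B, C, T, H, W)."""
    B, C, T, H, W = shape
    t, h, w = T // pt, H // ph, W // pw
    x = tokens.reshape(B, t, h, w, pt, ph, pw, C)
    x = x.transpose(0, 7, 1, 4, 2, 5, 3, 6)  # (B, C, t, pt, h, ph, w, pw)
    return x.reshape(B, C, T, H, W)

# Hypothetical input: 2 samples, 6 bands, 3 time steps, 32 x 32 pixels.
x = np.random.rand(2, 6, 3, 32, 32)
tokens = patchify_3d(x, pt=1, ph=16, pw=16)
restored = unpatchify_3d(tokens, x.shape, pt=1, ph=16, pw=16)

print(tokens.shape)                # (2, 12, 1536)
print(np.array_equal(x, restored)) # True
```

Compared with the 2D case, the only change is that the time axis is split into blocks alongside height and width, so each token covers a small space-time volume.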

Inference and demo

The inference script (Prithvi_run_inference.py) runs image reconstruction on a set of three HLS images (see the example command below). The images must be in GeoTIFF format and contain the channels described above (Blue, Green, Red, Narrow NIR, SWIR 1, SWIR 2) in reflectance units. A demo built on the same code is also available.

python Prithvi_run_inference.py --data_files t1.tif t2.tif t3.tif --yaml_file_path /path/to/yaml/Prithvi_100.yaml --checkpoint /path/to/checkpoint/Prithvi_100.pth --output_dir /path/to/out/dir/ --mask_ratio 0.5
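The `--mask_ratio` value sets the fraction of patch tokens hidden from the encoder before reconstruction. The sketch below shows MAE-style random masking in NumPy. The function name `random_mask` and the token count (3 time steps x 14 x 14 patches) are assumptions for illustration, and the inference script may select masked tokens differently.

```python
import numpy as np

def random_mask(num_tokens, mask_ratio, rng):
    """Return a boolean array in which True marks a masked (hidden) token."""
    num_keep = int(num_tokens * (1 - mask_ratio))  # tokens left visible
    order = rng.permutation(num_tokens)            # random token order
    mask = np.ones(num_tokens, dtype=bool)         # start with everything masked
    mask[order[:num_keep]] = False                 # reveal a random subset
    return mask

rng = np.random.default_rng(0)
# Hypothetical token count: 3 time steps x 14 x 14 spatial patches.
mask = random_mask(3 * 14 * 14, mask_ratio=0.5, rng=rng)
print(mask.sum(), "of", mask.size, "tokens masked")  # 294 of 588 tokens masked
```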

Finetuning examples

Examples of finetuning the model for image segmentation with the mmsegmentation library are available through Hugging Face (e.g. burn scar detection and multi-temporal crop classification), and the code used for the experiments is available on GitHub. The GitHub repository also contains instructions for finetuning the model for flood detection on the popular open-access Sen1Floods11 dataset.

Citation

If this model helped your research, please cite Prithvi-100M in your publications. Here is an example BibTeX entry:

@misc{Prithvi-100M,
    author = {Roy, Sujit and Ankur, Kumar and Phillips, Christopher and Ramasubramanian, Muthukumaran and Gurung, Iksha and Ramachandran, Rahul and Maskey, Manil and Olofossen, Pontus and Lee, Elizabeth and Murphy, Kevin and Duffy, Dan and Little, Mike and Jakubik, Johannes and Chu, Linsong and Fraccaro, Paolo and Das, Ranjini,Kamal and Kimura, Daiki and Simumba, Naomi and Szwarcman, Daniela and Michal, Michal and Weldemariam, Kommy and Zadrozny, Bianca and Ganti, Raghu and Costa, Carlos and Alemohammad, Hamed and Cecil, Michael and Li, Steve and Khallaghi, Sam and Godwin, Denys and Ahmadi, Maryam and Kordi, Fatemeh and Saux, Bertrand and Pastick, Neal and Doucette, Peter and Fleckenstein, Rylie and Luanga, Dalton and Corvin, Alex and Granger, Erwan},
    doi    = {https://huggingface.co/ibm-nasa-geospatial/Prithvi-100M},
    month  = aug,
    title  = {{Prithvi-100M}},
    url    = {https://github.com/nasa-impact/Prithvi-100M},
    year   = {2023}
}