Commit 6072bd9 by Muthukumaran (parent: 4b6024a): impact-comms-updates

Incorporated updates from the IMPACT O&C team.

README.md (changed):
### Model and Inputs
Prithvi is a first-of-its-kind temporal Vision Transformer (ViT) model pre-trained by the IBM and NASA team on contiguous US Harmonised Landsat and Sentinel 2 (HLS) data. The model adopts a self-supervised encoder developed with a ViT architecture and Masked Autoencoder (MAE) learning strategy, with an L1 loss function. The model includes spatial attention across multiple patches and temporal attention for each patch.
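
To make the pre-training objective concrete, here is a minimal sketch of an MAE-style reconstruction loss computed only on masked patches, using L1 instead of the usual MSE. It is illustrative only; the tensor names and shapes are assumptions, not the repository's actual implementation.

```python
import torch

def mae_l1_loss(pred: torch.Tensor, target: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """pred, target: (B, N, D) patch sequences; mask: (B, N) with 1 for masked patches."""
    per_patch = (pred - target).abs().mean(dim=-1)  # L1 error per patch
    return (per_patch * mask).sum() / mask.sum()    # average over masked patches only
```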
![](GFM.png)

The model accepts remote sensing data in a video format (B, C, T, H, W). Note that the temporal dimension (T) is very important in this application and not present in most other work on remote sensing modeling. The ability to handle a time series of remote sensing images benefits a variety of downstream tasks (e.g. burn scar segmentation, flood segmentation, land cover classification). The model can also handle static imagery, which can be fed in with T=1.
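
As a minimal illustration of this input layout (the spatial size and the `model` call are assumptions, not values taken from the repository):

```python
import torch

B, C, T, H, W = 1, 6, 3, 224, 224           # batch, 6 HLS bands, 3 time steps
x_timeseries = torch.randn(B, C, T, H, W)   # multi-temporal input
x_static = torch.randn(B, C, 1, H, W)       # a single image is simply T=1

# features = model(x_timeseries)            # hypothetical forward pass
```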
### Pre-training

The model was pre-trained with NASA's HLS V2 L30 product (30m granularity) from the contiguous United States. The following bands were used (a band-code mapping sketch follows the list):
1. Blue
2. Green
3. Red
4. Narrow NIR
5. SWIR 1
6. SWIR 2
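
The mapping below between these band names and HLS V2 L30 band codes is an assumption for illustration; verify it against the HLS product documentation and the model configuration before relying on it.

```python
# Hypothetical band-name to HLS V2 L30 band-code mapping, in the model's channel order.
PRETRAINING_BANDS = {
    "Blue":       "B02",
    "Green":      "B03",
    "Red":        "B04",
    "Narrow NIR": "B05",
    "SWIR 1":     "B06",
    "SWIR 2":     "B07",
}
CHANNEL_ORDER = list(PRETRAINING_BANDS)  # indices 0..5 correspond to the C dimension
```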
### Code

The model follows the [original MAE repo](https://github.com/facebookresearch/mae) with the following modifications:
1. 2D patch embed replaced with 3D patch embed (see the sketch after this list);
2. 2D positional embed replaced with 3D positional embed;
3. 2D patchify and unpatchify replaced with 3D versions;
4. infrared bands added in addition to RGB.
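
The snippet below sketches what the first modification looks like in PyTorch: a 3D convolution that turns a (B, C, T, H, W) input into a token sequence. The class name, tubelet size, and embedding dimension are assumptions for illustration, not the repository's actual module.

```python
import torch
import torch.nn as nn

class PatchEmbed3D(nn.Module):
    """Splits a (B, C, T, H, W) video into tubelet tokens with a 3D convolution."""
    def __init__(self, in_chans=6, embed_dim=768, tubelet=(1, 16, 16)):
        super().__init__()
        self.proj = nn.Conv3d(in_chans, embed_dim, kernel_size=tubelet, stride=tubelet)

    def forward(self, x):                    # x: (B, C, T, H, W)
        x = self.proj(x)                     # (B, D, T', H', W')
        return x.flatten(2).transpose(1, 2)  # (B, N, D) token sequence

tokens = PatchEmbed3D()(torch.randn(1, 6, 3, 224, 224))  # -> shape (1, 588, 768)
```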
### Inference and demo

There is an inference script (`Prithvi_run_inference.py`) that runs the image reconstruction on a set of three HLS images (see the example below). These images have to be in GeoTIFF format and include the channels described above (Blue, Green, Red, Narrow NIR, SWIR 1, SWIR 2) in reflectance units. There is also a **demo** that leverages the same code [here](https://huggingface.co/spaces/ibm-nasa-geospatial/Prithvi-100M-demo).
```
python Prithvi_run_inference.py --data_files t1.tif t2.tif t3.tif --yaml_file_path /path/to/yaml/Prithvi_100.yaml --checkpoint /path/to/checkpoint/Prithvi_100.pth --output_dir /path/to/out/dir/ --mask_ratio 0.5
```
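
Before running the script, it can help to check that each GeoTIFF has the six expected bands. This is an optional, illustrative check (not part of the repository); the file names match the example command above and `rasterio` is assumed to be installed.

```python
import rasterio

for path in ["t1.tif", "t2.tif", "t3.tif"]:
    with rasterio.open(path) as src:
        assert src.count == 6, f"{path}: expected 6 bands, found {src.count}"
        reflectance = src.read()  # (bands, H, W) array, expected in reflectance units
        print(path, reflectance.shape, reflectance.dtype)
```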
### Finetuning Examples

Examples of fine-tuning the model for image segmentation using the mmsegmentation library are available through Hugging Face (e.g. [burn scars detection](https://huggingface.co/ibm-nasa-geospatial/Prithvi-100M-burn-scar) and [multi temporal crop classification](https://huggingface.co/ibm-nasa-geospatial/Prithvi-100M-multi-temporal-crop-classification)), with the code used for the experiments available on [GitHub](https://github.com/NASA-IMPACT/hls-foundation-os/tree/main/fine-tuning-examples). That repository also contains instructions for fine-tuning the model for flood detection on the popular open-access [sen1floods11 dataset](https://github.com/cloudtostreet/Sen1Floods11).
## Citation