---
title: >-
  Satellite extractor: Towards a Smart Eco-epidemiological Model of Dengue in
  Colombia using Satellite Imagery
emoji: 🌍
colorFrom: yellow
colorTo: green
sdk: static
pinned: false
---

# Sentinelhub grant: Sponsoring request ID 1c081a - Towards a Smart Eco-epidemiological Model of Dengue in Colombia using Satellite in Collaboration with MIT Critical Data Colombia.
Project supported by ESA Network of Resources Initiative

Project proposal ([READ MORE](https://eo4society.esa.int/wp-content/uploads/2023/06/towards-a-smart-eco-epidemiological-model-of-dengue-in-colombia-using-satellite.pdf))

**Project Organisation**: MIT Critical Data Colombia.


## Model Description

- **Repository:** [Code](https://github.com/sebasmos/satellite.extractor)
- **Dataset format:** the proposed dataset format is shared in [MetaDengue](https://github.com/sebasmos/MetaDengue)
- **ESA project:** Sponsoring request ID 1c081a - Towards a Smart Eco-epidemiological Model of Dengue in Colombia using Satellite in Collaboration with MIT Critical Data Colombia
- **Point of Contact:** [Sebastian A. Cajas Ordóñez](mailto:sebasmos@mit.edu)


## Summary 


The dataset versions and their descriptions are listed below.


*Baseline method from satellite extractor*: raw Sentinel-2 L1C data, processed with recursive artifact removal, cloud removal based on the LeastCC (least cloud coverage) ordering, and nearest-neighbor interpolation to bring all bands to a common spatial resolution.

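Nearest-neighbor interpolation brings bands of different native resolution onto one grid without inventing new pixel values. The minimal NumPy sketch below illustrates the idea; it is not the satellite extractor's own code. It assumes 10 m and 20 m Sentinel-2 bands that cover exactly the same footprint, so upsampling by an integer factor of 2 copies each 20 m pixel into a 2×2 block. The helpers `upsample_nearest` and `stack_on_10m_grid` and the toy arrays are hypothetical.

```python
import numpy as np


def upsample_nearest(band: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbor upsampling of a 2-D band by an integer factor.

    Each source pixel is copied into a factor x factor block, so the output
    only contains values already present in the input.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    return np.repeat(np.repeat(band, factor, axis=0), factor, axis=1)


def stack_on_10m_grid(bands_10m, bands_20m) -> np.ndarray:
    """Stack 10 m bands with 20 m bands upsampled onto the 10 m grid.

    Hypothetical helper: assumes every 20 m band has exactly half as many
    pixels per side as the 10 m bands (same footprint).
    """
    upsampled = [upsample_nearest(b, 2) for b in bands_20m]
    return np.stack(list(bands_10m) + upsampled, axis=0)


# Toy example: two 4x4 "10 m" bands and one 2x2 "20 m" band.
b04 = np.arange(16, dtype=np.uint16).reshape(4, 4)
b08 = np.ones((4, 4), dtype=np.uint16)
b11 = np.array([[1, 2], [3, 4]], dtype=np.uint16)

cube = stack_on_10m_grid([b04, b08], [b11])
print(cube.shape)  # (3, 4, 4)
```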
* **SAT1_dataset_5_best_cities**: Top 5 municipalities based on Baseline method from satellite extractor

* **SAT2_dataset_10_best_cities**: Top 10 municipalities based on Baseline method from satellite extractor
  
  *RGB-Version*: [[Link](https://huggingface.co/datasets/MITCriticalData/10_municipalities_RGB)] 

* **SAT3_FULL_COLOMBIA**: Top 81 municipalities based on Baseline method from satellite extractor
  
  *DATASET_81_CITIES_v1.0*: [[link](https://huggingface.co/datasets/MITCriticalData/DATASET_81_CITIES_v1.0)]
  
  *DATASET_81_CITIES_v2.0*: [[link](https://huggingface.co/datasets/MITCriticalData/DATASET_81_CITIES_v2.0)]

* **Creating Cloud-Cloudless Paired Dataset**: This dataset, derived from imagery of five Colombian municipalities, contains 1,640 images (820 pairs): each of the 164 images per municipality is paired with a previously identified optimal cloudless image. The `Cloud2CloudlesDataset` class organizes these pairs into a new folder (`DATASET`) and renames the images to indicate ground truth and cloud presence. The class is initialized with source and destination paths and includes tests that verify the image count and confirm that the folders exist.

* **Dataset on Rio de Janeiro, 2016-2023**: [[link](https://huggingface.co/MITCriticalData/dataset_rio_de_janeiro_2018_2023)] The datasets `DATASET_rio_de_janeiro.zip` and `DATASET_rio_de_janeiro_forward_backwardv2.zip` cover central Rio de Janeiro from 2016 to 2023, each containing 416 weekly images (one per epidemiological week). `DATASET_rio_de_janeiro.zip` uses single-forward artifact removal, which can leave some images black, while `DATASET_rio_de_janeiro_forward_backwardv2.zip` applies forward-backward artifact removal, which replaces those black images. See https://github.com/sebasmos/satellite.extractor/tree/main/satellite_extractor/PART_1_satellite-augmentation for details.

* **Landsat Colombia 2008-2016**: [[Link](https://huggingface.co/datasets/MITCriticalData/L7_Dataset_2008_2015)]

* **MODIS 2 2008-2016**: [[link](https://huggingface.co/datasets/MITCriticalData/dataset_modis_2_2008_2015)]

* **SAT4_dataset_10_best_cities_augmented_v1**: Augmented data with aligned metadata. Data were extracted using recursive artifact removal, cloud removal based on LeastCC, and nearest-neighbor interpolation for spatial resolution. Augmentations, implemented [here](https://github.com/sebasmos/satellite.extractor/blob/main/notebooks/satellite_imagery_augmentation.ipynb), are applied to the RGB channels only; the other satellite channels are left unchanged.

* **SAT5_dataset_10_best_cities_augmented_v2**: An improved version of SAT4 in which near-black images are removed using a recursive [forward-backward artefact removal algorithm with inter-band data augmentation on satellite imagery](https://github.com/sebasmos/satellite.extractor/tree/main/src/PART_2_satellite-augmentation). It contains augmented data with aligned metadata and uses Albumentations wrapper modules to generate additional augmented samples. Data were extracted using recursive artifact removal, cloud removal based on LeastCC, and nearest-neighbor interpolation for spatial resolution. Augmentations, implemented in this [notebook](https://github.com/sebasmos/satellite.extractor/blob/main/notebooks/PART_2_satellite_imagery_augmentation.ipynb), are applied to the RGB channels only; the other satellite channels are left unchanged.
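
For the Cloud-Cloudless paired dataset, the pairing step can be pictured with the following minimal sketch. It is not the repository's `Cloud2CloudlesDataset` class: the function `build_pairs`, the folder layout `source/<municipality>/*.tiff`, and the `_cloud`/`_gt` file-name suffixes are assumptions made for illustration. Each acquisition is copied alongside a copy of its municipality's pre-selected cloudless image, with folder-existence and image-count checks similar in spirit to the original tests.

```python
import shutil
import tempfile
from pathlib import Path


def build_pairs(source: Path, destination: Path, cloudless: dict[str, str]) -> int:
    """Pair every image of each municipality with its cloudless reference.

    Hypothetical sketch: assumes `source/<municipality>/` holds one .tiff per
    acquisition and `cloudless` maps each municipality to the file name of
    its pre-selected cloudless image. Returns the number of pairs written.
    """
    # Folder-existence checks.
    if not source.is_dir():
        raise FileNotFoundError(f"source folder not found: {source}")
    destination.mkdir(parents=True, exist_ok=True)

    n_pairs = 0
    for municipality, reference_name in sorted(cloudless.items()):
        folder = source / municipality
        reference = folder / reference_name
        if not reference.is_file():
            raise FileNotFoundError(f"cloudless reference missing: {reference}")
        for idx, image in enumerate(sorted(folder.glob("*.tiff"))):
            stem = f"{municipality}_{idx:03d}"
            shutil.copy(image, destination / f"{stem}_cloud.tiff")  # model input
            shutil.copy(reference, destination / f"{stem}_gt.tiff")  # ground truth
            n_pairs += 1
    return n_pairs


# Toy run: 2 municipalities x 3 images -> 6 pairs (12 files).
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "source"
    for town in ("town_a", "town_b"):
        (src / town).mkdir(parents=True)
        for week in range(3):
            (src / town / f"week_{week}.tiff").write_bytes(b"")
    dst = Path(tmp) / "DATASET"
    n = build_pairs(src, dst, {"town_a": "week_0.tiff", "town_b": "week_2.tiff"})
    # Image-count check: two files per pair.
    assert len(list(dst.glob("*.tiff"))) == 2 * n
print(n)  # 6
```

With 164 images per municipality across five municipalities, the same logic yields 820 pairs, i.e. 1,640 files, matching the counts given for this dataset.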

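The difference between single-forward and forward-backward artifact removal (Rio de Janeiro datasets, SAT5) can be illustrated on a weekly time series. The sketch below is an assumption about the general idea, not the repository's algorithm: `is_near_black`, its 90% zero-pixel threshold, and `forward_backward_fill` are hypothetical. A forward pass copies the last valid image into each black slot; a backward pass then fills black slots at the start of the series, which have no earlier image to borrow from.

```python
import numpy as np


def is_near_black(image: np.ndarray, max_dark_fraction: float = 0.9) -> bool:
    """Flag an image whose pixels are almost all zero (hypothetical threshold)."""
    return float(np.mean(image == 0)) >= max_dark_fraction


def forward_backward_fill(images: list[np.ndarray]) -> list[np.ndarray]:
    """Replace near-black images in a chronologically ordered series."""
    filled = list(images)

    # Forward pass: reuse the most recent valid image.
    last_valid = None
    for t, image in enumerate(filled):
        if is_near_black(image):
            if last_valid is not None:
                filled[t] = last_valid
        else:
            last_valid = image

    # Backward pass: leading black images take the next valid image.
    next_valid = None
    for t in range(len(filled) - 1, -1, -1):
        if is_near_black(filled[t]):
            if next_valid is not None:
                filled[t] = next_valid
        else:
            next_valid = filled[t]
    return filled


black = np.zeros((2, 2), dtype=np.uint8)
a = np.full((2, 2), 10, dtype=np.uint8)
b = np.full((2, 2), 20, dtype=np.uint8)

out = forward_backward_fill([black, a, black, b, black])
print([int(img[0, 0]) for img in out])  # [10, 10, 10, 20, 20]
```

Running only the forward loop on the same series would leave the first image black, which is the kind of leftover black image that single-forward removal can produce.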

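SAT4 and SAT5 apply augmentations to the RGB channels only. The following NumPy sketch shows that pattern with a random brightness/contrast jitter; it stands in for, and does not reproduce, the Albumentations-based pipeline in the notebooks. The `(H, W, C)` layout, the `rgb_idx` band positions, and reflectance values in [0, 1] are hypothetical assumptions. A photometric transform is used because a geometric one (flip, rotation) applied to RGB alone would misalign the RGB bands with the untouched channels.

```python
import numpy as np


def augment_rgb_only(
    cube: np.ndarray,
    rgb_idx: tuple[int, int, int],
    rng: np.random.Generator,
    max_brightness: float = 0.2,
    max_contrast: float = 0.2,
) -> np.ndarray:
    """Random brightness/contrast jitter on the RGB bands of an (H, W, C) cube.

    All other bands are copied through unchanged. Assumes values in [0, 1];
    augmented RGB values are clipped back to that range.
    """
    out = cube.astype(np.float32, copy=True)
    rgb = out[..., list(rgb_idx)]
    alpha = 1.0 + rng.uniform(-max_contrast, max_contrast)  # contrast factor
    beta = rng.uniform(-max_brightness, max_brightness)  # brightness offset
    mean = rgb.mean()
    rgb = np.clip((rgb - mean) * alpha + mean + beta, 0.0, 1.0)
    out[..., list(rgb_idx)] = rgb
    return out


rng = np.random.default_rng(0)
cube = np.random.default_rng(1).random((8, 8, 5), dtype=np.float32)
aug = augment_rgb_only(cube, rgb_idx=(2, 1, 0), rng=rng)
print(np.array_equal(aug[..., 3:], cube[..., 3:]))  # True
```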
## Reading data

For example, the data can be loaded with the Hugging Face `datasets` library:

```python
from datasets import load_dataset

dataset = load_dataset("MITCriticalData/Unlabeled_top_10_cities_forward_backward_alg")

```

With `datasets<1.11.0`, reading `.tiff` and `.json` files is not supported. In that case, we recommend downloading and unpacking the data manually:

```bash
wget path_to_data/images.zip && unzip images.zip -d images/

wget path_to_data/annotations.zip && unzip annotations.zip -d annotations/
```
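After unzipping, images and annotations have to be matched by hand. The minimal standard-library sketch below assumes, hypothetically, that each `images/<name>.tiff` has an `annotations/<name>.json` with the same file stem; the helper name, the layout, and the toy annotation content are illustrative only. The TIFF files themselves can then be opened with a raster library such as `tifffile` or `rasterio`.

```python
import json
import tempfile
from pathlib import Path
from typing import Iterator


def pair_images_with_annotations(
    images_dir: Path, annotations_dir: Path
) -> Iterator[tuple[Path, dict]]:
    """Yield (tiff_path, annotation) pairs matched by file stem.

    Hypothetical layout: images/**/<name>.tiff and annotations/**/<name>.json.
    Images without a matching annotation file are skipped.
    """
    annotations = {p.stem: p for p in annotations_dir.rglob("*.json")}
    for image_path in sorted(images_dir.rglob("*.tiff")):
        ann_path = annotations.get(image_path.stem)
        if ann_path is None:
            continue
        with ann_path.open(encoding="utf-8") as fh:
            yield image_path, json.load(fh)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "images").mkdir()
    (root / "annotations").mkdir()
    (root / "images" / "week_001.tiff").write_bytes(b"")
    (root / "images" / "week_002.tiff").write_bytes(b"")  # no annotation
    (root / "annotations" / "week_001.json").write_text('{"id": 1}', encoding="utf-8")
    pairs = [
        (path.name, ann)
        for path, ann in pair_images_with_annotations(root / "images", root / "annotations")
    ]
print(pairs)  # [('week_001.tiff', {'id': 1})]
```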

## Sponsors

* Project supported by ESA Network of Resources Initiative.

* Oracle for Research Cloud Credits: the “Towards a Smart Eco-epidemiological Model of Dengue in Colombia using Satellite Images” project is supported by the Oracle for Research Program.

## Licensing Information
The dataset is released under the MIT License. By using it, you are also bound by the Terms of Use and License of the original data sources.

## Author & Maintainer

[Sebastián Cajas Ordóñez](https://sebasmos.github.io/)

## Contributors 

MIT Critical Data Colombia Team: Sebastian A. Cajas, David Restrepo, Kuan-Ting Kuo, Dana Moukheiber, Atika Rahman Paddo, Saptarshi Purkayastha, Leo Anthony Celi, Po-Chih Kuo, Juan Sebastián Osorio-Valencia, Braiam Escobar, Diego M. López, Cheng Che Tsai, Wilson Arbey Diaz, Luis Jesús Martínez, Alessa Álvarez, Siyi Tang, Amara Tariq, Imon Banerjee, Aakanksha Rana, Maria Patricia Arbelaez-Montoya, Laura Sofía Daza Rosero, Jhon Fredy Romero Núñez, Saketh Sundar, Ivan Darío Velez.

## Citation

Please cite our work if you find the resources in this repository useful:

Satellite extractor, [Source code] GitHub. https://github.com/sebasmos/satellite.extractor

```
@article{cajasmulti,
  title={A Multi-Modal Satellite Imagery Dataset for Public Health Analysis in Colombia},
  author={Cajas, Sebastian A and Restrepo, David and Moukheiber, Dana and Kuo, Kuan Ting and Wu, Chenwei and Chicangana, David Santiago Garcia and Paddo, Atika Rahman and Moukheiber, Mira and Moukheiber, Lama and Moukheiber, Sulaiman and others}
}

@article{kuo2024denguenet,
  title={DengueNet: Dengue Prediction using Spatiotemporal Satellite Imagery for Resource-Limited Countries},
  author={Kuo, Kuan-Ting and Moukheiber, Dana and Ordonez, Sebastian Cajas and Restrepo, David and Paddo, Atika Rahman and Chen, Tsung-Yu and Moukheiber, Lama and Moukheiber, Mira and Moukheiber, Sulaiman and Purkayastha, Saptarshi and others},
  journal={arXiv preprint arXiv:2401.11114},
  year={2024}
}
```