metadata

license: apache-2.0
datasets:
  - isp-uv-es/opensr-test
language:
  - en
pipeline_tag: image-to-image
tags:
  - Sentinel-2
  - sentinel2
  - S2
  - super-resolution

SUPERXI (Draft)

Introduction

Super-resolution (SR) techniques are becoming more popular for improving the spatial resolution of freely available satellite imagery, such as Sentinel-2 and Landsat. Advocates claim that SR as a preprocessing step can significantly improve the accuracy of various remote sensing downstream tasks, including road detection, crop delineation, and object recognition. However, some researchers contend that the benefits of SR are primarily aesthetic, suggesting that its main value lies in creating more visually appealing basemaps or aiding visual interpretation by non-experts.

Another criticism of SR is that it can degrade the original input data, potentially leading to incorrect conclusions. However, some SR methods appear more conservative than others in preserving reflectance integrity. Given this, a reliable benchmark is essential for providing quantitative assessments of the current state-of-the-art. Without such benchmarks, it remains difficult to conclusively determine the true impact of SR techniques on remote sensing data.

To establish a reliable framework, we propose the creation of a dedicated working group aimed at intercomparing super-resolution algorithms for Sentinel-2 data (SUPERXI). SR algorithms developed by teams from universities, research centers, industries, and space agencies are encouraged to participate in SUPERXI. This initiative will use opensr-test datasets and propose metrics to evaluate the consistency with the original input data and the reliability of the high-frequency details introduced by the SR models.

Datasets

For the high-resolution (HR) reference, we are considering the following datasets:

naip: A set of 62 orthophotos mainly from agricultural and forest regions in the USA.
spot: A set of 10 SPOT images obtained from WorldStrat.
spain_urban: A set of 20 orthophotos, primarily from urban areas of Spain, including roads.
spain_crops: A set of 20 orthophotos, primarily taken from agricultural areas near cities in Spain.
venus: A set of 60 VENµS images obtained from SEN2VENµS.

Each HR reference is paired with Sentinel-2 imagery processed at Level-1C (L1C) and Level-2A (L2A). Any of the datasets above can be loaded as follows:

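import opensr_test

# Load one dataset by name; the result is indexed by product key
dataset = opensr_test.load("naip")
# Sentinel-2 L2A serves as the LR input, "HRharm" as the HR reference
lr, hr = dataset["L2A"], dataset["HRharm"]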

Metrics

We propose the following metrics to assess the consistency of SR models:

Reflectance: This metric evaluates how SR affects the reflectance norm of the LR image, utilizing the Mean Absolute Error (MAE) distance by default. Lower values indicate better reflectance consistency. The SR image is downsampled to LR resolution using a triangular anti-aliasing filter and downsampling by a factor of 2.
Spectral: This metric measures how SR impacts the spectral signature of the LR image, employing the Spectral Angle Mapper (SAM) distance by default. Lower values indicate better spectral consistency, with angles measured in degrees. The SR image is downsampled to LR resolution using a triangular anti-aliasing filter and downsampling by a factor of 2.
Spatial: This metric assesses the spatial alignment between SR and LR images, utilizing the Phase Correlation Coefficient (PCC) by default. Some SR models introduce spatial shifts, which this metric detects. The SR image is downsampled to LR resolution using a triangular anti-aliasing filter and downsampling by a factor of 2.
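The three consistency checks above can be sketched with NumPy alone. This is an illustrative sketch, not the opensr-test implementation: the [1, 2, 1] / 4 kernel is an assumed form of the triangular filter, the function names are hypothetical, and the shift estimate uses plain phase correlation as a stand-in for the spatial metric. Images are assumed to be arrays of shape (bands, height, width).

```python
import numpy as np


def downsample_x2(img):
    """Blur a (bands, H, W) image with a triangle kernel, then keep every 2nd pixel."""
    k = np.array([1.0, 2.0, 1.0]) / 4.0  # assumed triangular kernel for a factor of 2
    p = np.pad(img, ((0, 0), (1, 1), (1, 1)), mode="edge")
    p = k[0] * p[:, :, :-2] + k[1] * p[:, :, 1:-1] + k[2] * p[:, :, 2:]  # along width
    p = k[0] * p[:, :-2, :] + k[1] * p[:, 1:-1, :] + k[2] * p[:, 2:, :]  # along height
    return p[:, ::2, ::2]


def reflectance_mae(sr_down, lr):
    """Mean absolute reflectance difference; lower is more consistent."""
    return float(np.mean(np.abs(sr_down - lr)))


def spectral_angle_deg(sr_down, lr, eps=1e-12):
    """Mean per-pixel angle (degrees) between the spectra of the two images."""
    dot = np.sum(sr_down * lr, axis=0)
    norms = np.linalg.norm(sr_down, axis=0) * np.linalg.norm(lr, axis=0)
    cos = np.clip(dot / (norms + eps), -1.0, 1.0)
    return float(np.degrees(np.arccos(cos)).mean())


def estimate_shift(ref, moved, eps=1e-12):
    """Integer (row, col) shift of `moved` relative to `ref` for 2D single-band arrays."""
    cross = np.fft.fft2(moved) * np.conj(np.fft.fft2(ref))
    corr = np.fft.ifft2(cross / (np.abs(cross) + eps)).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Peaks past the midpoint correspond to negative shifts (circular wrap-around)
    return tuple(int(p - n) if p > n // 2 else int(p) for p, n in zip(peak, corr.shape))
```

Note that the spectral angle ignores a uniform brightness change while the MAE does not, which is why the two metrics are reported separately.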

We propose three metrics to evaluate the high-frequency details introduced by SR models. The sum of these metrics always equals 1:

im_score: This metric quantifies the distance between the SR and HR images. A value closer to 1 indicates that the SR model closely corresponds to the HR image.
om_score: This metric measures the distance between the SR and LR images. A value closer to 1 suggests that the SR model closely resembles the LR image upsampled with bilinear interpolation.
ha_score: This metric evaluates the distance from the SR image to both the HR and LR images. A value closer to 1 indicates that the SR model deviates significantly from both references.

Experiment

We are planning two experiments for both x4 and x2 scale factors. Participants are encouraged to submit their SR models for both scales. Additionally, models designed solely for the x4 scale will be assessed at the x2 scale by downsampling the SR image by a factor of 2.

In each experiment, we will employ two distinct approaches to evaluate the high-frequency details introduced by SR models. The first approach uses the Mean Absolute Error (MAE) as the distance metric. The second approach uses the Learned Perceptual Image Patch Similarity (LPIPS). While MAE is sensitive to the intensity of high-frequency details, LPIPS is more sensitive to their structure. Contrasting the outcomes of these two metrics gives a more complete picture of the high-frequency details introduced by SR models. LPIPS is always computed on 32x32 patches of the HR image, while MAE is computed on 2x2 patches for x2 scale and 4x4 patches for x4 scale evaluations.
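To make the patch sizes concrete, here is a minimal NumPy sketch of a patch-wise MAE. It assumes non-overlapping patches and arrays of shape (bands, height, width); the function name is hypothetical and the actual opensr-test aggregation may differ.

```python
import numpy as np


def patchwise_mae(sr, hr, patch):
    """Mean absolute error per non-overlapping patch for (bands, H, W) arrays."""
    bands, height, width = sr.shape
    rows, cols = height // patch, width // patch
    # Crop so that both spatial dimensions are exact multiples of the patch size
    diff = np.abs(sr - hr)[:, : rows * patch, : cols * patch]
    blocks = diff.reshape(bands, rows, patch, cols, patch)
    # Average over bands and over the pixels inside each patch
    return blocks.mean(axis=(0, 2, 4))
```

Under these assumptions, `patchwise_mae(sr, hr, 2)` would correspond to the x2 evaluation and `patchwise_mae(sr, hr, 4)` to the x4 evaluation, each returning one value per patch.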

Teams

TODO

Work plan

Each team will submit its SR models by the deadline, which is set to TODO.
We will have two different types of models: open-source and closed-source. To be considered open-source, the code must be available in this repository within a folder named after the model. Keep the code as simple as possible. See examples using the torch, diffusers, and tensorflow libraries here, here, and here. Closed-source models are only required to provide the results in GeoTIFF format. See an example here.
The submission will be made through a pull request to this repository. The pull request MUST include the metadata.json file and the results in GeoTIFF format. The results must be in the same resolution as the HR image. We expect the following information in the metadata.json file:

{
  "name": "model_name",
  "authors": ["author1", "author2"],
  "affiliations": ["affiliation1", "affiliation2"],
  "description": "A brief description of the model",
  "code": "open-source",
  "scale": "x4",
  "url": "[OPTIONAL] URL to the model repository if it is open-source",
  "license": "license of the model"
}

The "code" field must be either "open-source" or "closed-source", and the "scale" field must be either "x2" or "x4".
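Before opening a pull request, participants may want to check their metadata.json locally. The following is a hypothetical helper, not a tool provided by the working group; it only encodes the fields and allowed values listed above and treats "url" as optional.

```python
import json

REQUIRED_FIELDS = ("name", "authors", "affiliations", "description", "code", "scale", "license")
ALLOWED_VALUES = {"code": ("open-source", "closed-source"), "scale": ("x2", "x4")}


def check_metadata(metadata):
    """Return a list of problems found in a metadata dictionary (empty if none)."""
    problems = [f"missing field: {key}" for key in REQUIRED_FIELDS if key not in metadata]
    for key, allowed in ALLOWED_VALUES.items():
        if key in metadata and metadata[key] not in allowed:
            problems.append(f"{key} must be one of {list(allowed)}")
    for key in ("authors", "affiliations"):
        if key in metadata and not isinstance(metadata[key], list):
            problems.append(f"{key} must be a list")
    return problems


# Example metadata with placeholder values
example = json.loads("""
{
  "name": "model_name",
  "authors": ["author1", "author2"],
  "affiliations": ["affiliation1", "affiliation2"],
  "description": "A brief description of the model",
  "code": "open-source",
  "scale": "x4",
  "license": "license of the model"
}
""")
print(check_metadata(example))
```

To check a real submission, the dictionary could instead be read with `json.load(open("metadata.json"))` and passed to the same function.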

The SUPERXI working group will evaluate the SR models after the deadline using the metrics discussed above.
After the metric estimation, we will first contact each team independently to share its results. If there are any issues with the submission, we will ask for clarification, and the team will have up to two weeks to provide the necessary corrections.
Questions and discussions will be held in the discussion section of this repository. **We will not run online meetings.** We will inform you of the progress of the SUPERXI working group through the discussion section and by email.
After all the participants have provided the necessary corrections, we will publish the results in the discussion section of this repository. Then, we will prepare a dedicated website to present the results and submit a paper to a remote sensing journal.
The paper will be prepared in Overleaf, and all participants will be invited to contribute to it.