---
license: cc-by-4.0
---

# Model Card for Model ID

This model card aims to be a base template for new models. It has been generated using [this raw template](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/templates/modelcard_template.md?plain=1).

## Model Details

### Model Description

- **Developed by:** [More Information Needed]
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Model type:** [More Information Needed]
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** [More Information Needed]

### Model Sources [optional]

- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]

---

Table 1: Linear probing results on six classification tasks. All models are trained for 50 epochs. Reported numbers are top-1 overall accuracy (OA). Missing values indicate that a model could not be adapted to that domain.

| Method | Backbone | m-bigearthnet | m-forestnet | m-brick-kiln | m-pv4ger | m-so2sat | m-eurosat |
|--------------------------|------------|---------------|-------------|--------------|----------|----------|-----------|
| **Fully Trained**        | ViT-S      | 66.0 | 53.8 | 98.1 | 97.6 | 57.5 | 97.3 |
| **Fully Trained**        | SwinV2-T   | 70.0 | 58.0 | 98.7 | 98.0 | 56.1 | 97.4 |
| **Fully Trained**        | ConvNext-B | 69.1 | 56.8 | 98.9 | 98.0 | 58.1 | 97.7 |
| **rand. init.**          | ViT-B      | 52.9 | 41.5 | 84.5 | 91.3 | 38.3 | 85.7 |
| **MAE_Single [44]**      | ViT-B      | 63.6 | -    | 88.9 | 92.2 | 50.0 | 88.9 |
| **OFA-Net [43]**         | ViT-B      | 65.0 | -    | 94.7 | 93.2 | 49.4 | 91.9 |
| **SatMAE [25]**          | ViT-B      | 62.1 | -    | 93.9 | -    | 46.9 | 86.4 |
| **Scale-MAE [22]**       | ViT-L      | -    | -    | -    | 96.9 | -    | -    |
| **GFM [21]**             | Swin-B     | -    | -    | -    | 96.8 | -    | -    |
| **Cross-Scale MAE [23]** | ViT-B      | -    | -    | -    | 93.1 | -    | -    |
| **FG-MAE [24]**          | ViT-B      | 63.0 | -    | 94.7 | -    | 51.4 | 87.0 |
| **CROMA [27]**           | ViT-B      | 67.4 | -    | 91.0 | -    | 49.2 | 90.1 |
| **DOFA**                 | ViT-B      | 65.7 | 50.9 | 95.8 | 96.9 | 55.1 | 93.9 |
| **DOFA**                 | ViT-L      | 67.5 | 54.6 | 96.9 | 97.3 | 60.1 | 97.1 |

Table 2: Partial fine-tuning results on six segmentation tasks. All models are trained with a frozen backbone for 20 epochs. Reported numbers are mean intersection over union (mIoU). Missing values indicate that a model could not be adapted to that domain.

| Method | Backbone | m-pv4ger-seg | m-nz-cattle | m-NeonTree | m-cashew-plant | m-SA-crop | m-chesapeake |
|--------------------------|-----------|--------------|-------------|------------|----------------|-----------|--------------|
| **DeepLabv3**            | ResNet101 | 93.4 | 67.6 | 53.9 | 48.6     | 30.4     | 62.1 |
| **U-Net**                | ResNet101 | 94.1 | 80.5 | 56.6 | 46.6     | 29.9     | 70.8 |
| **rand. init.**          | ViT-B     | 81.7 | 74.1 | 51.7 | 32.4     | 29.0     | 47.1 |
| **MAE_Single [44]**      | ViT-B     | 88.4 | 76.4 | 53.0 | 40.7     | 30.7     | 51.9 |
| **OFA-Net [43]**         | ViT-B     | 89.4 | 77.6 | 53.3 | 47.9     | 31.9     | 54.5 |
| **Scale-MAE [22]**       | ViT-L     | 83.5 | 76.5 | 51.0 | -        | -        | 61.0 |
| **GFM [21]**             | Swin-B    | 92.0 | 75.0 | 51.1 | -        | -        | 63.8 |
| **Cross-Scale MAE [23]** | ViT-B     | 83.2 | 77.9 | 52.1 | -        | -        | 52.3 |
| **CROMA [27]**           | ViT-B     | -    | -    | -    | 30.1     | 31.4     | -    |
| **FG-MAE [24]**          | ViT-B     | -    | -    | -    | 40.8     | **33.0** | 65.3 |
| **DOFA**                 | ViT-B     | 94.5 | 81.4 | 58.8 | 51.5     | 30.6     | -    |
| **DOFA**                 | ViT-L     | 95.0 | 81.8 | 59.4 | **56.9** | **32.1** | 66.3 |

---

## Uses