---
license: cc-by-4.0
---

# Model Card for Model ID

This model card aims to be a base template for new models. It has been generated using [this raw template](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/templates/modelcard_template.md?plain=1).

## Model Details

### Model Description

- **Developed by:** [More Information Needed]
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Model type:** [More Information Needed]
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** [More Information Needed]

### Model Sources [optional]

- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]

---

Table 1: Linear probing results on six classification tasks. All models are trained for 50 epochs. Reported numbers are top-1 overall accuracy (OA). Missing values indicate that a model could not be adapted to that domain.

| Method | Backbone | m-bigearthnet | m-forestnet | m-brick-kiln | m-pv4ger | m-so2sat | m-eurosat |
|--------------------------|------------|---------------|-------------|--------------|----------|----------|-----------|
| **Fully Trained**        | ViT-S      | 66.0 | 53.8 | 98.1 | 97.6 | 57.5 | 97.3 |
| **Fully Trained**        | SwinV2-T   | 70.0 | 58.0 | 98.7 | 98.0 | 56.1 | 97.4 |
| **Fully Trained**        | ConvNext-B | 69.1 | 56.8 | 98.9 | 98.0 | 58.1 | 97.7 |
| **rand. init.**          | ViT-B      | 52.9 | 41.5 | 84.5 | 91.3 | 38.3 | 85.7 |
| **MAE_Single [44]**      | ViT-B      | 63.6 | -    | 88.9 | 92.2 | 50.0 | 88.9 |
| **OFA-Net [43]**         | ViT-B      | 65.0 | -    | 94.7 | 93.2 | 49.4 | 91.9 |
| **SatMAE [25]**          | ViT-B      | 62.1 | -    | 93.9 | -    | 46.9 | 86.4 |
| **Scale-MAE [22]**       | ViT-L      | -    | -    | -    | 96.9 | -    | -    |
| **GFM [21]**             | Swin-B     | -    | -    | -    | 96.8 | -    | -    |
| **Cross-Scale MAE [23]** | ViT-B      | -    | -    | -    | 93.1 | -    | -    |
| **FG-MAE [24]**          | ViT-B      | 63.0 | -    | 94.7 | -    | 51.4 | 87.0 |
| **CROMA [27]**           | ViT-B      | 67.4 | -    | 91.0 | -    | 49.2 | 90.1 |
| **DOFA**                 | ViT-B      | 65.7 | 50.9 | 95.8 | 96.9 | 55.1 | 93.9 |
| **DOFA**                 | ViT-L      | 67.5 | 54.6 | 96.9 | 97.3 | 60.1 | 97.1 |

Table 2: Partial fine-tuning results on six segmentation tasks. All models are trained with a frozen backbone for 20 epochs. Reported numbers are mean intersection over union (mIoU). Missing values indicate that a model could not be adapted to that domain.

| Method | Backbone | m-pv4ger-seg | m-nz-cattle | m-NeonTree | m-cashew-plant | m-SA-crop | m-chesapeake |
|--------------------------|-----------|--------------|-------------|------------|----------------|-----------|--------------|
| **DeepLabv3**            | ResNet101 | 93.4 | 67.6 | 53.9 | 48.6     | 30.4     | 62.1 |
| **U-Net**                | ResNet101 | 94.1 | 80.5 | 56.6 | 46.6     | 29.9     | 70.8 |
| **rand. init.**          | ViT-B     | 81.7 | 74.1 | 51.7 | 32.4     | 29.0     | 47.1 |
| **MAE_Single [44]**      | ViT-B     | 88.4 | 76.4 | 53.0 | 40.7     | 30.7     | 51.9 |
| **OFA-Net [43]**         | ViT-B     | 89.4 | 77.6 | 53.3 | 47.9     | 31.9     | 54.5 |
| **Scale-MAE [22]**       | ViT-L     | 83.5 | 76.5 | 51.0 | -        | -        | 61.0 |
| **GFM [21]**             | Swin-B    | 92.0 | 75.0 | 51.1 | -        | -        | 63.8 |
| **Cross-Scale MAE [23]** | ViT-B     | 83.2 | 77.9 | 52.1 | -        | -        | 52.3 |
| **CROMA [27]**           | ViT-B     | -    | -    | -    | 30.1     | 31.4     | -    |
| **FG-MAE [24]**          | ViT-B     | -    | -    | -    | 40.8     | **33.0** | 65.3 |
| **DOFA**                 | ViT-B     | 94.5 | 81.4 | 58.8 | 51.5     | 30.6     | -    |
| **DOFA**                 | ViT-L     | 95.0 | 81.8 | 59.4 | **56.9** | **32.1** | 66.3 |

---

## Uses