---
tags:
- image-classification
- pytorch
metrics:
- accuracy
model-index:
- name: SDO_VT1
  results:
  - task:
      name: Image Classification
      type: image-classification
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.8695651888847351
---

# NASA Solar Dynamics Observatory Vision Transformer v.1 (SDO_VT1)

## Authors: Frank Soboczenski, PhD (King's College London)

This Vision Transformer model has been fine-tuned on NASA Solar Dynamics Observatory (SDO) data as a first fine-tuning stage.

Transformer models have become the go-to standard in natural language processing (NLP). Their performance is often unmatched in tasks such as question answering, classification, summarization, and language translation. Recently, their characteristic sequence-to-sequence architecture and attention mechanism have also proven successful in other domains such as computer vision, achieving strong performance on various vision tasks. In contrast to, for example, Convolutional Neural Networks (CNNs), Transformers achieve higher representational power due to their ability to exploit their large receptive field. However, Vision Transformers also come with increased complexity and computational cost, which may deter scientists from choosing such a model. We demonstrate the applicability of a Vision Transformer model (SDOVIS) on SDO data in an active region classification task, as well as the benefits of utilizing the HuggingFace libraries, data and model repositories, and deployment strategies for inference. We aim to highlight the ease of use of the HuggingFace platform, integration with popular deep learning frameworks such as PyTorch, TensorFlow, or JAX, performance monitoring with Weights and Biases, and the ability to effortlessly utilize pre-trained, large-scale Transformer models for targeted fine-tuning purposes.

The authors gratefully acknowledge the entire NASA Solar Dynamics Observatory team. Additionally, the data used were provided courtesy of NASA/SDO and the AIA, EVE, and HMI science teams.

## Example Images

Drag one of the images below into the inference API field on the upper right. Additional images for testing can be found at the [Solar Dynamics Observatory Gallery](https://sdo.gsfc.nasa.gov/gallery/main/search).

You can use the following tags to further select images for testing: "coronal holes", "loops", or "flares". You can also choose "active regions" to get a general pool for testing.

### NASA_SDO_Coronal_Hole
![NASA_SDO_Coronal_Hole](images/NASA_SDO_Coronal_Hole2.jpg)

### NASA_SDO_Coronal_Loop
![NASA_SDO_Coronal_Loop](images/NASA_SDO_Coronal_Loop.jpg)

### NASA_SDO_Solar_Flare
![NASA_SDO_Solar_Flare](images/NASA_SDO_Solar_Flare.jpg)

## Training data

The base ViT model was pretrained on a dataset consisting of 14 million images and 21k classes ([ImageNet-21k](http://www.image-net.org/)). More information on the base model used can be found at [google/vit-base-patch16-224-in21k](https://huggingface.co/google/vit-base-patch16-224-in21k).

## How to use this Model
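Below is a minimal inference sketch using the HuggingFace Transformers library. The repository id (`MODEL_ID`) is a placeholder for this model's actual Hub id, and the image path points to one of the example images above; adjust both to your setup.

```python
from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image
import torch

# Placeholder: replace with this model's repository id on the HuggingFace Hub.
MODEL_ID = "<hub-namespace>/SDO_VT1"

# Any of the example SDO images above can be used here.
image = Image.open("images/NASA_SDO_Coronal_Hole2.jpg").convert("RGB")

processor = AutoImageProcessor.from_pretrained(MODEL_ID)
model = AutoModelForImageClassification.from_pretrained(MODEL_ID)

# Preprocess the image and run a forward pass without tracking gradients.
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Map the highest-scoring logit back to its class label.
predicted_class = logits.argmax(-1).item()
print(model.config.id2label[predicted_class])
```

Alternatively, the high-level `pipeline("image-classification", model=MODEL_ID)` helper performs the preprocessing and label mapping in a single call.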
## References

A publication on this work is currently in preparation. In the meantime, please refer to this model by using the following citation:

```bibtex
@misc{sdovt2022,
  author  = {Frank Soboczenski and Paul J Wright},
  title   = {SDOVT: A Vision Transformer Model for Solar Dynamics Observatory (SDO) Data},
  url     = {http://github.com/h21k/sdovis},
  version = {1.0},
  year    = {2022},
}
```

For the base ViT model used please refer to:

```bibtex
@misc{wu2020visual,
  title         = {Visual Transformers: Token-based Image Representation and Processing for Computer Vision},
  author        = {Bichen Wu and Chenfeng Xu and Xiaoliang Dai and Alvin Wan and Peizhao Zhang and Zhicheng Yan and Masayoshi Tomizuka and Joseph Gonzalez and Kurt Keutzer and Peter Vajda},
  year          = {2020},
  eprint        = {2006.03677},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV}
}
```