jpxkqx committed e670aa6 (parent: b6d9f46): Update README.md

Files changed (1): README.md (+24 -12)
README.md CHANGED
@@ -41,10 +41,11 @@ and direct observational data.
- **License:** CC BY-SA 4.0


- ### Model Sources [optional]

<!-- Provide the basic links for the model. -->

- **Repository:** [Anemoi](https://anemoi-docs.readthedocs.io/en/latest/index.html)
- **Paper:** https://arxiv.org/pdf/2406.01465

@@ -103,30 +104,43 @@ The full list of input and output fields is shown below:
| Total precipitation, convective precipitation | Surface | Output |
| Land-sea mask, orography, standard deviation of sub-grid orography, slope of sub-scale orography, insolation, latitude/longitude, time of day/day of year | Surface | Input |

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

- - Pre-training was performed on ERA5 for the years 1979 to 2020 with a cosine learning rate (LR) schedule and a total
of 260,000 steps. The LR is increased from 0 to \\(10^{-4}\\) during the first 1000 steps, then it is annealed to a minimum
of \\(3 × 10^{-7}\\).
- - The pre-training is then followed by rollout on ERA5 for the years 1979 to 2018, this time with an LR
of \\(6 × 10^{-7}\\). As in [Lam et al. [2023]](https://doi.org/10.48550/arXiv.2212.12794), we increase the
rollout every 1000 training steps up to a maximum of 72 h (12 auto-regressive steps).
- - Finally, to further improve forecast performance, we fine-tune the model on operational real-time IFS NWP
analyses. This is done via another round of rollout training, this time using IFS operational analysis data
from 2019 and 2020.


#### Training Hyperparameters

- - **Training regime:** {{ training_regime | default("[More Information Needed]", true)}} <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

- #### Speeds, Sizes, Times [optional]

<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

- {{ speeds_sizes_times | default("[More Information Needed]", true)}}

## Evaluation

@@ -192,6 +206,9 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]
192
 
193
  {{ hardware_requirements | default("[More Information Needed]", true)}}
194
 
 
 
 
195
  #### Software
196
 
197
  {{ software | default("[More Information Needed]", true)}}
@@ -218,12 +235,7 @@ If you use this model in your work, please cite it as follows:
```
Lang, S., Alexe, M., Chantry, M., Dramsch, J., Pinault, F., Raoult, B., ... & Rabier, F. (2024). AIFS-ECMWF's data-driven forecasting system. arXiv preprint arXiv:2406.01465.
```
-
- ## Glossary [optional]
-
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

- {{ glossary | default("[More Information Needed]", true)}}

## More Information

 
- **License:** CC BY-SA 4.0


+ ### Model Sources

<!-- Provide the basic links for the model. -->

+
- **Repository:** [Anemoi](https://anemoi-docs.readthedocs.io/en/latest/index.html)
- **Paper:** https://arxiv.org/pdf/2406.01465

 
| Total precipitation, convective precipitation | Surface | Output |
| Land-sea mask, orography, standard deviation of sub-grid orography, slope of sub-scale orography, insolation, latitude/longitude, time of day/day of year | Surface | Input |

+ Input and output states are normalised to unit variance and zero mean for each level. Some of
+ the forcing variables, like orography, are min-max normalised.
+
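To make the two normalisation schemes concrete, here is a minimal sketch; the function names and the idea of precomputed per-level statistics are illustrative assumptions, not the Anemoi API.

```python
import numpy as np

def standardise(field: np.ndarray, mean: float, std: float) -> np.ndarray:
    """Zero-mean, unit-variance normalisation, applied per level with
    statistics precomputed over the training period (assumed)."""
    return (field - mean) / std

def min_max(field: np.ndarray, lo: float, hi: float) -> np.ndarray:
    """Min-max normalisation onto [0, 1] for static forcings such as orography."""
    return (field - lo) / (hi - lo)

# Hypothetical usage with training-set statistics:
# z500_n = standardise(z500, z500_mean, z500_std)
# orog_n = min_max(orography, orography.min(), orography.max())
```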
### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

+ - **Pre-training**: Performed on ERA5 for the years 1979 to 2020 with a cosine learning rate (LR) schedule (sketched after this list) and a total
of 260,000 steps. The LR is increased from 0 to \\(10^{-4}\\) during the first 1000 steps, then it is annealed to a minimum
of \\(3 × 10^{-7}\\).
+ - **Fine-tuning I**: The pre-training is then followed by rollout training on ERA5 for the years 1979 to 2018 (a rollout sketch also follows the list), this time with an LR
of \\(6 × 10^{-7}\\). As in [Lam et al. [2023]](https://doi.org/10.48550/arXiv.2212.12794), we increase the
rollout every 1000 training steps up to a maximum of 72 h (12 auto-regressive steps).
+ - **Fine-tuning II**: Finally, to further improve forecast performance, we fine-tune the model on operational real-time IFS NWP
analyses. This is done via another round of rollout training, this time using IFS operational analysis data
from 2019 and 2020.
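The pre-training bullet pins down a complete LR curve: a linear warm-up from 0 to \\(10^{-4}\\) over the first 1000 steps, then cosine annealing to \\(3 × 10^{-7}\\) over the remainder of the 260,000 steps. A minimal sketch of that schedule, illustrative rather than the actual training code:

```python
import math

MAX_LR, MIN_LR = 1e-4, 3e-7
WARMUP_STEPS, TOTAL_STEPS = 1_000, 260_000

def learning_rate(step: int) -> float:
    """Cosine LR schedule with linear warm-up, as described above."""
    if step < WARMUP_STEPS:
        # Linear warm-up from 0 to the peak LR over the first 1000 steps.
        return MAX_LR * step / WARMUP_STEPS
    # Cosine annealing from the peak LR down to the minimum LR.
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return MIN_LR + 0.5 * (MAX_LR - MIN_LR) * (1 + math.cos(math.pi * progress))
```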
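Likewise, the rollout fine-tuning can be pictured as a loop in which the model is fed its own output and the loss is accumulated along the trajectory. This is a schematic sketch: the 6 h step length follows from 72 h over 12 steps, while `model`, `loss_fn`, the trajectory targets, and the averaging over steps are stand-ins, not the Anemoi training loop.

```python
import torch

def rollout_length(step: int, max_steps: int = 12) -> int:
    """Rollout grows by one auto-regressive step every 1000 training steps,
    capped at 12 steps (72 h at 6 h per step)."""
    return min(1 + step // 1000, max_steps)

def rollout_loss(model, x0, targets, n_rollout, loss_fn):
    """Accumulate the loss over an n_rollout-step auto-regressive trajectory."""
    x, total = x0, torch.zeros(())
    for t in range(n_rollout):
        x = model(x)                      # model predicts the next 6 h state
        total = total + loss_fn(x, targets[t])
    return total / n_rollout              # averaging over steps is an assumption
```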
 
#### Training Hyperparameters

+ - **Optimizer:** We use *AdamW* (Loshchilov and Hutter [2019]) with the \\(β\\)-coefficients set to 0.9 and 0.95.
+
+ - **Loss function:** The loss function is an area-weighted mean squared error (MSE) between the target atmospheric state
+ and the prediction.
+
+ - **Loss scaling:** A loss scaling is applied for each output variable. The scaling was chosen empirically such that
+ all prognostic variables have roughly equal contributions to the loss, with the exception of the vertical velocities,
+ for which the weight was reduced. The loss weights also decrease linearly with height, so that levels in
+ the upper atmosphere (e.g., 50 hPa) contribute relatively little to the total loss value. A sketch combining the
+ optimiser and the scaled loss follows this list.
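Combining the optimiser and loss bullets, a minimal PyTorch sketch could look as follows; the stand-in `model` and the contents of the weight vectors (grid-cell area weights and the empirically chosen per-variable scalings) are assumptions for illustration only.

```python
import torch

model = torch.nn.Linear(8, 8)  # stand-in for the actual graph network

# AdamW with the beta coefficients given above.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, betas=(0.9, 0.95))

def weighted_mse(pred, target, area_w, var_w):
    """Area-weighted MSE with per-variable loss scaling.

    pred, target: (batch, grid_points, variables)
    area_w:       (grid_points,) weights proportional to grid-cell area
    var_w:        (variables,) empirical scalings: reduced for vertical
                  velocity, decreasing linearly with height (assumed form)
    """
    sq_err = (pred - target) ** 2
    return (sq_err * area_w[None, :, None] * var_w[None, None, :]).mean()
```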

+ #### Speeds, Sizes, Times

<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

+ Data parallelism is used for training, with a batch size of 16. One model instance is split across four 40GB A100
+ GPUs within one node. Training is done using mixed precision (Micikevicius et al. [2018]), and the entire process
+ takes about one week using 64 GPUs in total.
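As an illustration of the mixed-precision part, here is a schematic single training step using PyTorch AMP. The data-parallel gradient all-reduce and the four-way model split are handled elsewhere (e.g. by DDP and a sharding wrapper) and are not shown; all names are stand-ins.

```python
import torch

model = torch.nn.Linear(8, 8).cuda()      # stand-in for one sharded model instance
optimizer = torch.optim.AdamW(model.parameters(), betas=(0.9, 0.95))
scaler = torch.cuda.amp.GradScaler()      # gradient scaling for fp16 stability

def train_step(batch, target, loss_fn):
    optimizer.zero_grad()
    with torch.autocast(device_type="cuda"):   # forward pass in mixed precision
        loss = loss_fn(model(batch), target)
    scaler.scale(loss).backward()              # scaled backward pass
    scaler.step(optimizer)
    scaler.update()
    return loss.item()
```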

## Evaluation

 
{{ hardware_requirements | default("[More Information Needed]", true)}}

+ We acknowledge PRACE for awarding us access to Leonardo, CINECA, Italy.

#### Software

{{ software | default("[More Information Needed]", true)}}
 
```
Lang, S., Alexe, M., Chantry, M., Dramsch, J., Pinault, F., Raoult, B., ... & Rabier, F. (2024). AIFS-ECMWF's data-driven forecasting system. arXiv preprint arXiv:2406.01465.
```

## More Information