jpxkqx committed e670aa6 (parent: b6d9f46): Update README.md

Files changed (1): README.md (+24 -12)
README.md CHANGED
@@ -41,10 +41,11 @@ and direct observational data.
- **License:** CC BY-SA 4.0


- ### Model Sources [optional]

<!-- Provide the basic links for the model. -->

- **Repository:** [Anemoi](https://anemoi-docs.readthedocs.io/en/latest/index.html)
- **Paper:** https://arxiv.org/pdf/2406.01465

@@ -103,30 +104,43 @@ The full list of input and output fields is shown below:
| Total precipitation, convective precipitation | Surface | Output |
| Land-sea mask, orography, standard deviation of sub-grid orography, slope of sub-scale orography, insolation, latitude/longitude, time of day/day of year | Surface | Input |

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

- - Pre-training was performed on ERA5 for the years 1979 to 2020 with a cosine learning rate (LR) schedule and a total
of 260,000 steps. The LR is increased from 0 to \\(10^{-4}\\) during the first 1000 steps, then it is annealed to a minimum
of \\(3 × 10^{-7}\\).
- - The pre-training is then followed by rollout on ERA5 for the years 1979 to 2018, this time with an LR
of \\(6 × 10^{-7}\\). As in [Lam et al. [2023]](https://doi.org/10.48550/arXiv.2212.12794), we increase the
rollout every 1000 training steps up to a maximum of 72 h (12 auto-regressive steps).
- - Finally, to further improve forecast performance, we fine-tune the model on operational real-time IFS NWP
analyses. This is done via another round of rollout training, this time using IFS operational analysis data
from 2019 and 2020.


#### Training Hyperparameters

- - **Training regime:** {{ training_regime | default("[More Information Needed]", true)}} <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

- #### Speeds, Sizes, Times [optional]

<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

- {{ speeds_sizes_times | default("[More Information Needed]", true)}}

## Evaluation

@@ -192,6 +206,9 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]
192
 
193
  {{ hardware_requirements | default("[More Information Needed]", true)}}
194
 
 
 
 
195
  #### Software
196
 
197
  {{ software | default("[More Information Needed]", true)}}
@@ -218,12 +235,7 @@ If you use this model in your work, please cite it as follows:
```
Lang, S., Alexe, M., Chantry, M., Dramsch, J., Pinault, F., Raoult, B., ... & Rabier, F. (2024). AIFS-ECMWF's data-driven forecasting system. arXiv preprint arXiv:2406.01465.
```
-
- ## Glossary [optional]
-
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

- {{ glossary | default("[More Information Needed]", true)}}

## More Information

 
- **License:** CC BY-SA 4.0


+ ### Model Sources

<!-- Provide the basic links for the model. -->

+
- **Repository:** [Anemoi](https://anemoi-docs.readthedocs.io/en/latest/index.html)
- **Paper:** https://arxiv.org/pdf/2406.01465

 
| Total precipitation, convective precipitation | Surface | Output |
| Land-sea mask, orography, standard deviation of sub-grid orography, slope of sub-scale orography, insolation, latitude/longitude, time of day/day of year | Surface | Input |

+ Input and output states are normalised to unit variance and zero mean for each level. Some of
+ the forcing variables, like orography, are min-max normalised.
+
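To make the two normalisation schemes concrete, here is a minimal sketch; the function names and the idea of precomputed per-level statistics are illustrative assumptions, not the Anemoi API.

```python
import numpy as np

def standardise(field: np.ndarray, mean: float, std: float) -> np.ndarray:
    """Zero-mean, unit-variance normalisation, applied per level with
    statistics precomputed over the training period (assumed)."""
    return (field - mean) / std

def min_max(field: np.ndarray, lo: float, hi: float) -> np.ndarray:
    """Min-max normalisation onto [0, 1] for static forcings such as orography."""
    return (field - lo) / (hi - lo)

# Hypothetical usage with training-set statistics:
# z500_n = standardise(z500, z500_mean, z500_std)
# orog_n = min_max(orography, orography.min(), orography.max())
```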
### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

+ - **Pre-training**: Performed on ERA5 for the years 1979 to 2020 with a cosine learning rate (LR) schedule (sketched after this list) and a total
of 260,000 steps. The LR is increased from 0 to \\(10^{-4}\\) during the first 1000 steps, then it is annealed to a minimum
of \\(3 × 10^{-7}\\).
+ - **Fine-tuning I**: The pre-training is then followed by rollout training on ERA5 for the years 1979 to 2018 (a rollout sketch also follows the list), this time with an LR
of \\(6 × 10^{-7}\\). As in [Lam et al. [2023]](https://doi.org/10.48550/arXiv.2212.12794), we increase the
rollout every 1000 training steps up to a maximum of 72 h (12 auto-regressive steps).
+ - **Fine-tuning II**: Finally, to further improve forecast performance, we fine-tune the model on operational real-time IFS NWP
analyses. This is done via another round of rollout training, this time using IFS operational analysis data
from 2019 and 2020.
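The pre-training bullet pins down a complete LR curve: a linear warm-up from 0 to \\(10^{-4}\\) over the first 1000 steps, then cosine annealing to \\(3 × 10^{-7}\\) over the remainder of the 260,000 steps. A minimal sketch of that schedule, illustrative rather than the actual training code:

```python
import math

MAX_LR, MIN_LR = 1e-4, 3e-7
WARMUP_STEPS, TOTAL_STEPS = 1_000, 260_000

def learning_rate(step: int) -> float:
    """Cosine LR schedule with linear warm-up, as described above."""
    if step < WARMUP_STEPS:
        # Linear warm-up from 0 to the peak LR over the first 1000 steps.
        return MAX_LR * step / WARMUP_STEPS
    # Cosine annealing from the peak LR down to the minimum LR.
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return MIN_LR + 0.5 * (MAX_LR - MIN_LR) * (1 + math.cos(math.pi * progress))
```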
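Likewise, the rollout fine-tuning can be pictured as a loop in which the model is fed its own output and the loss is accumulated along the trajectory. This is a schematic sketch: the 6 h step length follows from 72 h over 12 steps, while `model`, `loss_fn`, the trajectory targets, and the averaging over steps are stand-ins, not the Anemoi training loop.

```python
import torch

def rollout_length(step: int, max_steps: int = 12) -> int:
    """Rollout grows by one auto-regressive step every 1000 training steps,
    capped at 12 steps (72 h at 6 h per step)."""
    return min(1 + step // 1000, max_steps)

def rollout_loss(model, x0, targets, n_rollout, loss_fn):
    """Accumulate the loss over an n_rollout-step auto-regressive trajectory."""
    x, total = x0, torch.zeros(())
    for t in range(n_rollout):
        x = model(x)                      # model predicts the next 6 h state
        total = total + loss_fn(x, targets[t])
    return total / n_rollout              # averaging over steps is an assumption
```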
 
#### Training Hyperparameters

+ - **Optimizer:** We use *AdamW* (Loshchilov and Hutter [2019]) with the \\(β\\)-coefficients set to 0.9 and 0.95.
+
+ - **Loss function:** The loss function is an area-weighted mean squared error (MSE) between the target atmospheric state
+ and the prediction.
+
+ - **Loss scaling:** A loss scaling is applied for each output variable. The scaling was chosen empirically such that
+ all prognostic variables have roughly equal contributions to the loss, with the exception of the vertical velocities,
+ for which the weight was reduced. The loss weights also decrease linearly with height, so that levels in
+ the upper atmosphere (e.g., 50 hPa) contribute relatively little to the total loss value. A sketch combining the
+ optimiser and the scaled loss follows this list.
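Combining the optimiser and loss bullets, a minimal PyTorch sketch could look as follows; the stand-in `model` and the contents of the weight vectors (grid-cell area weights and the empirically chosen per-variable scalings) are assumptions for illustration only.

```python
import torch

model = torch.nn.Linear(8, 8)  # stand-in for the actual graph network

# AdamW with the beta coefficients given above.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, betas=(0.9, 0.95))

def weighted_mse(pred, target, area_w, var_w):
    """Area-weighted MSE with per-variable loss scaling.

    pred, target: (batch, grid_points, variables)
    area_w:       (grid_points,) weights proportional to grid-cell area
    var_w:        (variables,) empirical scalings: reduced for vertical
                  velocity, decreasing linearly with height (assumed form)
    """
    sq_err = (pred - target) ** 2
    return (sq_err * area_w[None, :, None] * var_w[None, None, :]).mean()
```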

+ #### Speeds, Sizes, Times

<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

+ Data parallelism is used for training, with a batch size of 16. One model instance is split across four 40GB A100
+ GPUs within one node. Training is done using mixed precision (Micikevicius et al. [2018]), and the entire process
+ takes about one week using 64 GPUs in total.
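As an illustration of the mixed-precision part, here is a schematic single training step using PyTorch AMP. The data-parallel gradient all-reduce and the four-way model split are handled elsewhere (e.g. by DDP and a sharding wrapper) and are not shown; all names are stand-ins.

```python
import torch

model = torch.nn.Linear(8, 8).cuda()      # stand-in for one sharded model instance
optimizer = torch.optim.AdamW(model.parameters(), betas=(0.9, 0.95))
scaler = torch.cuda.amp.GradScaler()      # gradient scaling for fp16 stability

def train_step(batch, target, loss_fn):
    optimizer.zero_grad()
    with torch.autocast(device_type="cuda"):   # forward pass in mixed precision
        loss = loss_fn(model(batch), target)
    scaler.scale(loss).backward()              # scaled backward pass
    scaler.step(optimizer)
    scaler.update()
    return loss.item()
```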

## Evaluation

 
{{ hardware_requirements | default("[More Information Needed]", true)}}

+ We acknowledge PRACE for awarding us access to Leonardo, CINECA, Italy.

#### Software

{{ software | default("[More Information Needed]", true)}}
 
```
Lang, S., Alexe, M., Chantry, M., Dramsch, J., Pinault, F., Raoult, B., ... & Rabier, F. (2024). AIFS-ECMWF's data-driven forecasting system. arXiv preprint arXiv:2406.01465.
```

## More Information