Update README.md
README.md
- **License:** CC BY-SA 4.0

### Model Sources

<!-- Provide the basic links for the model. -->

- **Repository:** [Anemoi](https://anemoi-docs.readthedocs.io/en/latest/index.html)
- **Paper:** https://arxiv.org/pdf/2406.01465

The full list of input and output fields is shown below:
| Total precipitation, convective precipitation | Surface | Output |
| Land-sea mask, orography, standard deviation of sub-grid orography, slope of sub-scale orography, insolation, latitude/longitude, time of day/day of year | Surface | Input |

Input and output states are normalised to unit variance and zero mean for each level. Some of the forcing variables, like orography, are min-max normalised.
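
For illustration, the two normalisation modes could look like the sketch below. This is a minimal NumPy sketch with made-up names, not the Anemoi implementation; in practice the statistics would be precomputed over the training data.

```python
import numpy as np

def standardise(field, mean, std):
    """Scale one variable at one level to zero mean and unit variance."""
    return (field - mean) / std

def min_max_normalise(field):
    """Min-max normalise a static forcing field such as orography to [0, 1]."""
    lo, hi = field.min(), field.max()
    return (field - lo) / (hi - lo)

# Dummy example: a synthetic temperature level and a synthetic orography field.
temperature = 250.0 + 30.0 * np.random.rand(1000)   # [K]
orography = 2000.0 * np.random.rand(1000)           # [m]
t_norm = standardise(temperature, temperature.mean(), temperature.std())
o_norm = min_max_normalise(orography)
```
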
### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

- **Pre-training:** Pre-training was performed on ERA5 for the years 1979 to 2020 with a cosine learning rate (LR) schedule and a total of 260,000 steps. The LR is increased from 0 to \\(10^{-4}\\) during the first 1000 steps, then it is annealed to a minimum of \\(3 × 10^{-7}\\) (see the schedule sketch after this list).
- **Fine-tuning I:** Pre-training is then followed by rollout training on ERA5 for the years 1979 to 2018, this time with an LR of \\(6 × 10^{-7}\\). As in [Lam et al. [2023]](https://doi.org/10.48550/arXiv.2212.12794), the rollout is increased every 1000 training steps up to a maximum of 72 h (12 auto-regressive steps).
- **Fine-tuning II:** Finally, to further improve forecast performance, the model is fine-tuned on operational real-time IFS NWP analyses. This is done via another round of rollout training, this time using IFS operational analysis data from 2019 and 2020.
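
Numerically, the pre-training LR schedule and the rollout increase could be written as below. This is an illustrative re-implementation of the description above, not the Anemoi training code; the rollout starting value is an assumption.

```python
import math

def learning_rate(step, warmup=1_000, total=260_000, lr_max=1e-4, lr_min=3e-7):
    """Cosine LR schedule with linear warmup, using the numbers quoted above."""
    if step < warmup:
        return lr_max * step / warmup                 # 0 -> 1e-4 over 1000 steps
    progress = (step - warmup) / (total - warmup)     # in [0, 1]
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))

def rollout_length(step, every=1_000, max_steps=12):
    """Auto-regressive rollout length (in 6 h steps, so 12 steps = 72 h):
    one step longer every 1000 training steps, capped at the maximum.
    The starting value of 1 is an assumption."""
    return min(1 + step // every, max_steps)
```
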
#### Training Hyperparameters

- **Optimizer:** We use *AdamW* (Loshchilov and Hutter [2019]) with the \\(β\\)-coefficients set to 0.9 and 0.95.
- **Loss function:** The loss function is an area-weighted mean squared error (MSE) between the target atmospheric state and the prediction (see the sketch after this list).
- **Loss scaling:** A loss scaling is applied for each output variable. The scaling was chosen empirically such that all prognostic variables have roughly equal contributions to the loss, with the exception of the vertical velocities, for which the weight was reduced. The loss weights also decrease linearly with height, which means that levels in the upper atmosphere (e.g., 50 hPa) contribute relatively little to the total loss value.
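
A minimal PyTorch sketch of such an area- and variable-weighted MSE is shown below; the tensor shapes, names, and example weights are illustrative assumptions, not the Anemoi implementation.

```python
import torch

def area_weighted_mse(pred, target, area_w, var_w):
    """MSE weighted by grid-cell area and per-variable loss scalings.

    pred, target: (batch, grid_points, variables)
    area_w:       (grid_points,) area weights
    var_w:        (variables,)  per-variable scalings, folding in the empirical
                  choices and the linear decrease with height described above
    """
    sq_err = (pred - target) ** 2
    return (sq_err * area_w[None, :, None] * var_w[None, None, :]).mean()

pred, target = torch.randn(2, 16, 4), torch.randn(2, 16, 4)
area_w = torch.ones(16)                      # uniform weights, for the example
var_w = torch.tensor([1.0, 1.0, 1.0, 0.1])   # e.g. reduced vertical-velocity weight
loss = area_weighted_mse(pred, target, area_w, var_w)

# The stated optimizer, assuming a PyTorch model:
# optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, betas=(0.9, 0.95))
```
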
#### Speeds, Sizes, Times

<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

Data parallelism is used for training, with a batch size of 16. One model instance is split across four 40 GB A100 GPUs within one node. Training is done using mixed precision (Micikevicius et al. [2018]), and the entire process takes about one week, with 64 GPUs (that is, 16 four-GPU model instances) in total.
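
In PyTorch terms, the mixed-precision part of such a training step follows the standard AMP pattern sketched below; this is generic boilerplate assuming fp16 autocasting, not the Anemoi code, and the data-parallel and model-sharding setup described above is omitted.

```python
import torch

scaler = torch.cuda.amp.GradScaler()  # dynamic loss scaling for fp16

def train_step(model, optimizer, loss_fn, inputs, targets):
    """One generic mixed-precision optimisation step (illustrative only)."""
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type="cuda"):
        loss = loss_fn(model(inputs), targets)
    scaler.scale(loss).backward()   # scale loss to avoid fp16 underflow
    scaler.step(optimizer)          # unscale gradients, then update weights
    scaler.update()
    return loss.detach()
```
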
## Evaluation

{{ hardware_requirements | default("[More Information Needed]", true)}}

We acknowledge PRACE for awarding us access to Leonardo, CINECA, Italy.

#### Software

{{ software | default("[More Information Needed]", true)}}
If you use this model in your work, please cite it as follows:

```
Lang, S., Alexe, M., Chantry, M., Dramsch, J., Pinault, F., Raoult, B., ... & Rabier, F. (2024). AIFS - ECMWF's data-driven forecasting system. arXiv preprint arXiv:2406.01465.
```

## More Information