Commit 987b9ce by anaprietonem
Parent: b8aa337

Update README.md
README.md CHANGED
@@ -166,7 +166,7 @@ the upper atmosphere (e.g., 50 hPa) contribute relatively little to the total lo
 
 Data parallelism is used for training, with a batch size of 16. One model instance is split across four 40GB A100
 GPUs within one node. Training is done using mixed precision (Micikevicius et al. [2018]), and the entire process
-takes about one week, with 64 GPUs in total. The checkpoint size is 1.19 GB and it does not include the optimizer
+takes about one week, with 64 GPUs in total. The checkpoint size is 1.19 GB and as mentioned above, it does not include the optimizer
 state.
 
 ## Evaluation
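For context, the paragraph touched by this commit describes a mixed-precision, data-parallel training loop whose checkpoint stores model weights only. The sketch below illustrates that setup in PyTorch; it is not the repository's code. The model, loss, optimizer choice, and dataset are placeholders, and the four-way split of one model instance across GPUs within a node is omitted.

```python
# Minimal sketch (assumed, not the repository's code): data-parallel training
# with mixed precision and dynamic loss scaling (Micikevicius et al. 2018),
# plus a weights-only checkpoint as described in the README paragraph above.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def train(model, loader, num_steps: int):
    dist.init_process_group("nccl")        # one process per GPU, e.g. launched via torchrun
    device = torch.device(f"cuda:{dist.get_rank() % torch.cuda.device_count()}")
    model = DDP(model.to(device), device_ids=[device.index])
    optimizer = torch.optim.AdamW(model.parameters())  # optimizer choice is an assumption
    scaler = torch.cuda.amp.GradScaler()               # dynamic loss scaling for fp16

    for step, (x, y) in enumerate(loader):  # loader yields per-rank batches (batch size 16)
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad(set_to_none=True)
        with torch.autocast(device_type="cuda", dtype=torch.float16):
            loss = torch.nn.functional.mse_loss(model(x), y)  # placeholder loss
        scaler.scale(loss).backward()  # backward pass on the scaled loss
        scaler.step(optimizer)         # unscales gradients, skips step on inf/nan
        scaler.update()
        if step + 1 >= num_steps:
            break

    if dist.get_rank() == 0:
        # Save model weights only: leaving out optimizer.state_dict() is what
        # keeps the checkpoint small (1.19 GB per the README).
        torch.save(model.module.state_dict(), "checkpoint.pt")
    dist.destroy_process_group()
```

Saving only `model.module.state_dict()` matches the edited sentence: the quoted 1.19 GB figure excludes the optimizer state, which for an optimizer like AdamW would roughly triple the file size.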