Commit 987b9ce by anaprietonem
Parent: b8aa337

Update README.md
README.md CHANGED
@@ -166,7 +166,7 @@ the upper atmosphere (e.g., 50 hPa) contribute relatively little to the total lo
 
 Data parallelism is used for training, with a batch size of 16. One model instance is split across four 40GB A100
 GPUs within one node. Training is done using mixed precision (Micikevicius et al. [2018]), and the entire process
-takes about one week, with 64 GPUs in total. The checkpoint size is 1.19 GB and it does not include the optimizer
+takes about one week, with 64 GPUs in total. The checkpoint size is 1.19 GB and as mentioned above, it does not include the optimizer
 state.
 
 ## Evaluation
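For context, the paragraph touched by this commit describes a mixed-precision, data-parallel training loop whose checkpoint stores model weights only. The sketch below illustrates that setup in PyTorch; it is not the repository's code. The model, loss, optimizer choice, and dataset are placeholders, and the four-way split of one model instance across GPUs within a node is omitted.

```python
# Minimal sketch (assumed, not the repository's code): data-parallel training
# with mixed precision and dynamic loss scaling (Micikevicius et al. 2018),
# plus a weights-only checkpoint as described in the README paragraph above.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def train(model, loader, num_steps: int):
    dist.init_process_group("nccl")        # one process per GPU, e.g. launched via torchrun
    device = torch.device(f"cuda:{dist.get_rank() % torch.cuda.device_count()}")
    model = DDP(model.to(device), device_ids=[device.index])
    optimizer = torch.optim.AdamW(model.parameters())  # optimizer choice is an assumption
    scaler = torch.cuda.amp.GradScaler()               # dynamic loss scaling for fp16

    for step, (x, y) in enumerate(loader):  # loader yields per-rank batches (batch size 16)
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad(set_to_none=True)
        with torch.autocast(device_type="cuda", dtype=torch.float16):
            loss = torch.nn.functional.mse_loss(model(x), y)  # placeholder loss
        scaler.scale(loss).backward()  # backward pass on the scaled loss
        scaler.step(optimizer)         # unscales gradients, skips step on inf/nan
        scaler.update()
        if step + 1 >= num_steps:
            break

    if dist.get_rank() == 0:
        # Save model weights only: leaving out optimizer.state_dict() is what
        # keeps the checkpoint small (1.19 GB per the README).
        torch.save(model.module.state_dict(), "checkpoint.pt")
    dist.destroy_process_group()
```

Saving only `model.module.state_dict()` matches the edited sentence: the quoted 1.19 GB figure excludes the optimizer state, which for an optimizer like AdamW would roughly triple the file size.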