AnemoI — Graph Machine Learning
anaprietonem committed 987b9ce (1 parent: b8aa337)

Update README.md

Files changed (1): README.md (+1, −1)
@@ -166,7 +166,7 @@ the upper atmosphere (e.g., 50 hPa) contribute relatively little to the total lo
 
 Data parallelism is used for training, with a batch size of 16. One model instance is split across four 40GB A100
 GPUs within one node. Training is done using mixed precision (Micikevicius et al. [2018]), and the entire process
-takes about one week, with 64 GPUs in total. The checkpoint size is 1.19 GB and it does not include the optimizer
+takes about one week, with 64 GPUs in total. The checkpoint size is 1.19 GB and as mentioned above, it does not include the optimizer
 state.
 
 ## Evaluation
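The GPU layout the changed paragraph describes can be sanity-checked with a little arithmetic. This is only a sketch of what the numbers imply; the README does not state whether the batch size of 16 is per data-parallel replica or global, so the second figure below is a labeled assumption:

```python
# Numbers taken from the README text in the diff above.
gpus_total = 64          # "64 GPUs in total"
gpus_per_instance = 4    # one model instance sharded across four 40GB A100s

# Each data-parallel replica occupies one sharded model instance.
replicas = gpus_total // gpus_per_instance
print(f"data-parallel replicas: {replicas}")  # 16

# Assumption (not stated in the README): if the batch size of 16 is
# per replica, the effective global batch size would be:
print(f"global batch if 16 is per replica: {16 * replicas}")  # 256
```

If the batch size of 16 is instead global, each replica would process a single sample per step; the README leaves this ambiguous.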