Update README.md
README.md
CHANGED
@@ -241,7 +241,7 @@ The following hyperparameters were used during training:
 - distributed_type: multi-GPU
 - gradient_accumulation_steps: 16
 - total_train_batch_size: 64
-- optimizer:
+- optimizer: _ADAN_ using lucidrains' `adan-pytorch` with default betas
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_ratio: 0.03
 - num_epochs: 2
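
For readers checking how the listed values relate, the following is a minimal sketch in plain Python, not the training code behind this README. It assumes the usual convention that the total batch size is the per-pass batch (summed over devices) times the gradient accumulation steps, and that a cosine schedule with a warmup ratio means linear warmup followed by cosine decay to zero. The helper names and the `total_steps` value are hypothetical; the optimizer itself is not instantiated here.

```python
import math

# Values copied from the hyperparameter list in the diff.
GRADIENT_ACCUMULATION_STEPS = 16
TOTAL_TRAIN_BATCH_SIZE = 64
WARMUP_RATIO = 0.03


def samples_per_micro_step(total_batch: int, accumulation_steps: int) -> int:
    """Samples processed across all devices in one forward/backward pass."""
    if total_batch % accumulation_steps != 0:
        raise ValueError("total batch must be divisible by accumulation steps")
    return total_batch // accumulation_steps


def lr_multiplier(step: int, total_steps: int,
                  warmup_ratio: float = WARMUP_RATIO) -> float:
    """Factor applied to the base learning rate at a given optimizer step.

    Hypothetical helper: linear warmup over the first `warmup_ratio`
    fraction of steps, then cosine decay from 1.0 down to 0.0.
    """
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # total_steps is an assumed value for illustration only.
    total_steps = 1000
    print(samples_per_micro_step(TOTAL_TRAIN_BATCH_SIZE,
                                 GRADIENT_ACCUMULATION_STEPS))
    for step in (0, 15, 30, 500, 1000):
        print(step, round(lr_multiplier(step, total_steps), 4))
```

With the values above, each forward/backward pass covers 4 samples across all devices, and 16 such passes are accumulated before every optimizer update.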