update readme
README.md
CHANGED
@@ -106,7 +106,7 @@ We then apply the cross entropy loss by comparing with true pairs.
 
 #### Hyper parameters
 
-We trained ou model on a TPU v3-8. We train the model during 920k steps using a batch size of
+We trained ou model on a TPU v3-8. We train the model during 920k steps using a batch size of 512 (64 per TPU core).
 We use a learning rate warm up of 500. The sequence length was limited to 128 tokens. We used the AdamW optimizer with
 a 2e-5 learning rate. The full training script is accessible in this current repository: `train_script.py`.
 
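
The hyperparameters named in this change can be checked with a little arithmetic. The sketch below is not taken from `train_script.py`. The constant and function names are hypothetical, the linear warm-up shape is an assumption (the README only says "warm up of 500"), and holding the rate constant after warm-up is also an assumption, since the README does not describe the schedule beyond that point.

```python
# Values taken from the README text; all names here are hypothetical.
TPU_CORES = 8          # a TPU v3-8 has 8 cores
PER_CORE_BATCH = 64    # "64 per TPU core"
WARMUP_STEPS = 500     # "learning rate warm up of 500"
PEAK_LR = 2e-5         # AdamW learning rate
TOTAL_STEPS = 920_000  # "920k steps"


def global_batch_size(per_core: int, cores: int) -> int:
    """Total examples per optimizer step across all cores."""
    return per_core * cores


def learning_rate(step: int,
                  peak_lr: float = PEAK_LR,
                  warmup_steps: int = WARMUP_STEPS) -> float:
    """Linear warm-up to peak_lr (assumed shape), then constant (assumed)."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    return peak_lr


if __name__ == "__main__":
    print("global batch size:", global_batch_size(PER_CORE_BATCH, TPU_CORES))
    print("examples seen over training:",
          global_batch_size(PER_CORE_BATCH, TPU_CORES) * TOTAL_STEPS)
    for step in (0, 249, 499, 10_000):
        print(step, learning_rate(step))
```

With 64 examples on each of 8 cores, the global batch size is 512, which matches the value added in this change.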