nreimers committed on
Commit
278590c
1 Parent(s): ac8d0d1

update readme

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -106,7 +106,7 @@ We then apply the cross entropy loss by comparing with true pairs.
 
 #### Hyper parameters
 
-We trained ou model on a TPU v3-8. We train the model during 920k steps using a batch size of 1024 (128 per TPU core).
+We trained ou model on a TPU v3-8. We train the model during 920k steps using a batch size of 512 (64 per TPU core).
 We use a learning rate warm up of 500. The sequence length was limited to 128 tokens. We used the AdamW optimizer with
 a 2e-5 learning rate. The full training script is accessible in this current repository: `train_script.py`.
 
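
The corrected figures can be sanity-checked with a little arithmetic. The sketch below is illustrative only and is not taken from `train_script.py`: the `TrainConfig` class and its field names are hypothetical, and the schedule in `lr_at` (linear warm-up, then a constant rate) is an assumption, since the README gives only the warm-up length and the 2e-5 learning rate. The core count of 8 follows from the "v3-8" in the TPU name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the hyperparameters listed in the README."""

    tpu_cores: int = 8              # a TPU v3-8 exposes 8 cores
    per_core_batch_size: int = 64   # value after this commit (was 128)
    train_steps: int = 920_000
    warmup_steps: int = 500
    max_seq_length: int = 128
    learning_rate: float = 2e-5

    @property
    def global_batch_size(self) -> int:
        # The global batch is the per-core batch replicated across all cores.
        return self.tpu_cores * self.per_core_batch_size

    def lr_at(self, step: int) -> float:
        # Assumption: linear warm-up to the peak rate, then constant.
        # The README does not say what happens after the warm-up.
        if step < self.warmup_steps:
            return self.learning_rate * (step + 1) / self.warmup_steps
        return self.learning_rate


config = TrainConfig()
print(config.global_batch_size)                       # 512
print(config.global_batch_size * config.train_steps)  # 471040000
```

With 64 examples per core, the global batch size comes out to 512, which matches the new README line; the old value of 128 per core gives 1024.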