Update README.md
README.md
CHANGED
@@ -23,7 +23,7 @@ GPT-Neo 125M was trained on the Pile, a large scale curated dataset created by EleutherAI

 ## Training procedure

-This model was trained for 572,300 steps
+This model was trained on the Pile for 300 billion tokens over 572,300 steps. It was trained as a masked autoregressive language model, using cross-entropy loss.

 ## Intended Use and Limitations
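For context, the objective the new line describes is standard causal (autoregressive) language modeling trained with cross-entropy loss. Below is a minimal sketch of evaluating that loss with the Hugging Face `transformers` API; the checkpoint name `EleutherAI/gpt-neo-125M` and the sample sentence are illustrative and not part of this commit.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-125M")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-125M")
model.eval()

text = "GPT-Neo 125M was trained on the Pile."
inputs = tokenizer(text, return_tensors="pt")

# Passing the input ids as labels makes the model shift them by one position
# and compute the mean next-token cross-entropy loss over the sequence.
with torch.no_grad():
    outputs = model(**inputs, labels=inputs["input_ids"])

print(outputs.loss.item())  # scalar cross-entropy (equivalently, log-perplexity)
```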