eryk-mazus
commited on
Commit
•
ec5ce3f
1
Parent(s):
a937b2a
Update README.md
Browse files
README.md
CHANGED
@@ -16,7 +16,7 @@ pipeline_tag: text-generation
|
|
16 |
|
17 |
`polka-1.1b` takes the [TinyLlama-1.1B](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T) model and enhances it by continuing pretraining on an additional **5.7 billion Polish tokens**, primarily sourced from the [MADLAD-400](https://arxiv.org/abs/2309.04662) dataset. The tokens were sampled in a 10:1 ratio between Polish and English shards using [DSIR](https://github.com/p-lambda/dsir). Furthermore, Polka extends the TinyLlama tokenizer's vocabulary to 43,882 tokens, improving its efficiency for generating Polish text.
|
18 |
|
19 |
-
The training took 425
|
20 |
|
21 |
## Notes
|
22 |
|
|
|
16 |
|
17 |
`polka-1.1b` takes the [TinyLlama-1.1B](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T) model and enhances it by continuing pretraining on an additional **5.7 billion Polish tokens**, primarily sourced from the [MADLAD-400](https://arxiv.org/abs/2309.04662) dataset. The tokens were sampled in a 10:1 ratio between Polish and English shards using [DSIR](https://github.com/p-lambda/dsir). Furthermore, Polka extends the TinyLlama tokenizer's vocabulary to 43,882 tokens, improving its efficiency for generating Polish text.
|
18 |
|
19 |
+
The training took 425 GPU hours on a single 8 x RTX 4090 machine with DeepSpeed ZeRO-2.
|
20 |
|
21 |
## Notes
|
22 |
|