Update README.md
README.md
CHANGED
@@ -29,7 +29,22 @@ https://github.com/KohakuBlueleaf/z-tipo-extension
|
|
29 |
## Model arch and Training
|
30 |
This model uses the LLaMA architecture with 500M parameters. The training data is a combination of Danbooru2023, GBC10M and Coyo-HD-11M.<br>
|
31 |
The total number of tokens seen is around 30B.<br>
|
32 |
-
For more information please refer to the tech report.
|
33 |
|
34 |
### Evaluation
|
35 |
We have tested TIPO on several metrics:
|
|
|
29 |
## Model arch and Training
|
30 |
This model uses the LLaMA architecture with 500M parameters. The training data is a combination of Danbooru2023, GBC10M and Coyo-HD-11M.<br>
|
31 |
The total number of tokens seen is around 30B.<br>
|
32 |
+
For more information, please refer to the tech report and the following table.
|
33 |
+
|
34 |
+
| | TIPO-200M | TIPO-500M |
|
35 |
+
| ----------------- | ------------------------------------------------------------------------------ | ------------------------------------------------------------------------------ |
|
36 |
+
| Arch | LLaMA | LLaMA |
|
37 |
+
| Max ctx length | 1024 | 1024 |
|
38 |
+
| Batch Size | 2048 | 3584 |
|
39 |
+
| Training dataset  | Danbooru, GBC10M, 5 epochs<br />Danbooru, GBC10M, Coyo11M, 3 epochs | Danbooru, GBC10M, Coyo11M, 5 epochs |
|
40 |
+
| Real Tokens Seen* | 40B tokens | 30B tokens |
|
41 |
+
| Training Hardware | RTX 3090 x 4 | H100 x 8 |
|
42 |
+
| Training Time     | 420 hours` | 100 hours` |
|
43 |
+
| URL | [KBlueLeaf/TIPO-200M · Hugging Face](https://huggingface.co/KBlueLeaf/TIPO-200M) | [KBlueLeaf/TIPO-500M · Hugging Face](https://huggingface.co/KBlueLeaf/TIPO-500M) |
|
44 |
+
|
45 |
+
*: We only count non-padding tokens in "tokens seen", since the training samples vary widely in length.<br/>
|
46 |
+
`: Since the training samples are fairly short, it takes more time to reach the same number of tokens seen than in general LLM pretraining.<br/>
|
47 |
+
As a reference, with a max ctx length of 4096 and almost all of the data reaching that length, you may need only 2 days to reach 10B tokens seen on RTX 3090 x 4 with a 200M model.
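
To make the footnotes concrete, here is a minimal sketch of how "real tokens seen" could be counted and how the table figures translate into average throughput. It is an illustration only, not the actual training code: `PAD_ID`, `count_real_tokens` and `tokens_per_second` are hypothetical names, and the padding id of the real tokenizer may differ.

```python
from typing import Sequence

PAD_ID = 0  # hypothetical padding token id (assumption, not taken from the README)


def count_real_tokens(batch: Sequence[Sequence[int]], pad_id: int = PAD_ID) -> int:
    """Count the tokens in a padded batch, skipping padding positions."""
    return sum(1 for seq in batch for tok in seq if tok != pad_id)


def tokens_per_second(tokens_seen: float, hours: float) -> float:
    """Average throughput implied by a token count and wall-clock hours."""
    return tokens_seen / (hours * 3600)


# Three short samples padded to a common length of 6.
batch = [
    [5, 9, 2, 0, 0, 0],
    [7, 3, 0, 0, 0, 0],
    [4, 8, 6, 1, 2, 0],
]
real = count_real_tokens(batch)          # non-padding tokens only
padded = sum(len(seq) for seq in batch)  # what a naive count would report

print(f"real tokens: {real}, padded positions: {padded}")

# Average throughput implied by the figures quoted above.
print(f"TIPO-200M run: {tokens_per_second(40e9, 420):,.0f} tokens/s")
print(f"TIPO-500M run: {tokens_per_second(30e9, 100):,.0f} tokens/s")
print(f"4096-ctx reference: {tokens_per_second(10e9, 48):,.0f} tokens/s")
```

Comparing token ids against a padding id miscounts if the padding id also appears as a real token; counting from an attention mask avoids that. The throughput numbers are plain averages over the reported wall-clock time and say nothing about per-step speed.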
|
48 |
|
49 |
### Evaluation
|
50 |
We have tested TIPO on several metrics:
|