KBlueLeaf committed on
Commit
4756440
1 Parent(s): 15d597b

Update README.md

  1. README.md +16 -1
README.md CHANGED
@@ -29,7 +29,22 @@ https://github.com/KohakuBlueleaf/z-tipo-extension
  ## Model arch and Training
  This model uses the LLaMA architecture with 500M parameters. The training data is a combined version of Danbooru2023, GBC10M, and Coyo-HD-11M.<br>
  The total number of tokens seen is around 30B.<br>
- For more information please refer to the tech report.
 
  ### Evaluation
  We have tested TIPO on several metrics:
 
  ## Model arch and Training
  This model uses the LLaMA architecture with 500M parameters. The training data is a combined version of Danbooru2023, GBC10M, and Coyo-HD-11M.<br>
  The total number of tokens seen is around 30B.<br>
+ For more information, please refer to the tech report and the following table.
+
+ |                   | TIPO-200M | TIPO-500M |
+ | ----------------- | --------- | --------- |
+ | Arch              | LLaMA | LLaMA |
+ | Max ctx length    | 1024 | 1024 |
+ | Batch size        | 2048 | 3584 |
+ | Training dataset  | Danbooru, GBC10M, 5 epochs<br />Danbooru, GBC10M, Coyo11M, 3 epochs | Danbooru, GBC10M, Coyo11M, 5 epochs |
+ | Real tokens seen* | 40B tokens | 30B tokens |
+ | Training hardware | RTX 3090 x 4 | H100 x 8 |
+ | Training time     | 420 hours† | 100 hours† |
+ | URL               | [KBlueLeaf/TIPO-200M · Hugging Face](https://huggingface.co/KBlueLeaf/TIPO-200M) | [KBlueLeaf/TIPO-500M · Hugging Face](https://huggingface.co/KBlueLeaf/TIPO-500M) |
+
+ *: We only count non-padding tokens in the tokens seen, since the training samples vary widely in length.<br/>
+ †: Since the training samples are quite short, it takes more time to reach the same number of tokens seen than in general LLM pretraining.<br/>
+ As a reference, with a max ctx length of 4096 and almost all samples reaching that length, you may need only 2 days to reach 10B tokens seen on RTX 3090 x 4 with the 200M model.
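
The footnotes above rest on simple bookkeeping: only non-padding tokens count toward tokens seen, so short samples padded to a fixed length contribute few real tokens per step. The following is a minimal sketch of that counting, together with the average throughput implied by the figures in the table and footnote. The pad id `0` and the helper names `real_tokens_seen` and `tokens_per_second` are hypothetical and chosen for illustration; they are not taken from the TIPO training code.

```python
PAD_ID = 0  # hypothetical padding token id, for illustration only


def real_tokens_seen(batch, pad_id=PAD_ID):
    """Count the non-padding tokens in a batch of equal-length sequences."""
    return sum(1 for seq in batch for tok in seq if tok != pad_id)


def tokens_per_second(total_tokens, hours):
    """Average throughput implied by a token count and a wall-clock time."""
    return total_tokens / (hours * 3600)


# Two short samples padded to length 8: 16 slots, but only 7 real tokens.
batch = [
    [5, 9, 2, 7, 0, 0, 0, 0],
    [3, 4, 8, 0, 0, 0, 0, 0],
]
print(real_tokens_seen(batch))

# Figures taken from the table and footnote (2 days = 48 hours).
runs = {
    "TIPO-200M, RTX 3090 x 4": (40e9, 420),
    "TIPO-500M, H100 x 8": (30e9, 100),
    "Reference, 4096 ctx, RTX 3090 x 4": (10e9, 48),
}
for name, (tokens, hours) in runs.items():
    print(f"{name}: {tokens_per_second(tokens, hours):,.0f} tokens/s")
```

With these figures, the 200M run averages roughly 26k real tokens per second, while the full-length reference estimate on the same hardware works out to roughly 58k tokens per second, which is consistent with the point that short, padded samples slow the accumulation of tokens seen.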
 
  ### Evaluation
  We have tested TIPO on several metrics: