- rwkv-v5-stp76-N8.pth: 3B rocm-rwkv model starting from the previous checkpoint, now with 62 epochs of the N8 dataset with --lr_init 7e-6 --lr_final 7e-6. This pth has a loss of 1.780 on N8 at 51.763 GTokens.
- rwkv-v5-stp118-N8.pth: 3B rocm-rwkv model starting from the previous checkpoint, now with 118 epochs of the N8 dataset with --lr_init 7e-6 --lr_final 7e-6. This pth has a loss of 1.750 on N8 at 79.508 GTokens.
- rwkv-v5-stp146-N8.pth: 3B rocm-rwkv model starting from the previous checkpoint, now with 146 epochs of the N8 dataset with --lr_init 7e-6 --lr_final 7e-6. This pth has a loss of 1.758 on N8 at 97.982 GTokens.
- rwkv-v5-final-N8.pth: 3B rocm-rwkv model starting from the previous checkpoint, now with the full N8 dataset epoch with --lr_init 3e-8 --lr_final 1e-8. This pth has a loss of 1.73 for the full N8 dataset at 106.098327552 GTokens.
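
Each pth listed above is an ordinary PyTorch checkpoint (a state dict of tensors), so it can be sanity-checked without any RWKV code. A minimal inspection sketch follows; the file name is taken from the list above, and the key layout (`emb.weight`, `blocks.N.*`) is an assumption based on the standard RWKV-LM convention:

```python
import torch

# Load on CPU; RWKV checkpoints are plain state dicts mapping
# tensor names to weights, so no model class is needed to inspect them.
sd = torch.load("rwkv-v5-final-N8.pth", map_location="cpu")

# Total parameter count (should come out near 3B for these checkpoints).
n_params = sum(t.numel() for t in sd.values())
print(f"params: {n_params / 1e9:.2f}B")

# Depth and width can be read back from the tensor names and shapes,
# assuming the usual RWKV-LM naming scheme.
n_layer = 1 + max(int(k.split(".")[1]) for k in sd if k.startswith("blocks."))
n_embd = sd["emb.weight"].shape[1]
print(f"n_layer={n_layer}  n_embd={n_embd}")
```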

7B rocm-rwkv pth record: I called this model Tlanuwa since I add an extra training pass focused on Cherokee after each run.

9B rocm-rwkv pth record: 40 layers, embd=4096, ctx=16384. I called this model Quetzal since it is a green model that flies, and I add an extra training pass focused on Spanish and the Axolotl-Spanish-Nahuatl dataset after each run.

- rwkv-9Q-stp101-N8.pth: 9B rocm-rwkv model trained with SlimPajama chunks 1-10 for the first epoch, then additional training with chunks 1-2 and a mix of multi-language and code; after that I am using the N8 dataset. I am currently at 4.222 GTokens of the N8 dataset. This pth has a loss of 1.904 on the N8 dataset.
- rwkv-9Q-1k-stp307-1k-N8.pth: 9B rocm-rwkv model trained with SlimPajama chunks 1-10 for the first epoch, then additional training with chunks 1-2 and a mix of multi-language and code; after that I am using the N8 dataset. I am currently at 12.706 GTokens of the N8 dataset. This pth has a loss of 1.871 on the N8 dataset.
- rwkv-9Q-Soup91-step298.pth: starting from rwkv-9Q-1k-stp307-1k-N8.pth, I added 298 epoch steps of my soup of data (code + math + instruct + chain of thought), 12.283 GTokens, with a loss of 2.242.
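
For quick inference with any of these checkpoints, one option is the `rwkv` pip package (recent versions support RWKV-v5; on a ROCm build of PyTorch the GPU is still addressed through the `cuda` device string). This is only a sketch under assumptions: the checkpoint path, the strategy string, and the World-vocabulary tokenizer should all be adapted to your setup:

```python
from rwkv.model import RWKV
from rwkv.utils import PIPELINE

# Path is given without the .pth extension (the package appends it).
# The strategy string is an assumption: pick precision/offload to fit
# your GPU memory, e.g. "cuda fp16" or "cuda fp16 *20 -> cpu fp32".
model = RWKV(model="rwkv-9Q-Soup91-step298", strategy="cuda fp16")

# "rwkv_vocab_v20230424" is the World vocabulary bundled with the package;
# swap it out if these checkpoints were trained with a different tokenizer.
pipeline = PIPELINE(model, "rwkv_vocab_v20230424")

print(pipeline.generate("The quick brown fox", token_count=50))
```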