Update README.md
README.md
CHANGED
@@ -15,9 +15,6 @@ We also included [ultrachat](https://huggingface.co/datasets/stingning/ultrachat
We trained for 6 epochs, resulting in a model trained on 180B tokens with a sequence length of 2k, a global batch size of 1.3M tokens, and a learning rate of 3e-4 with a cosine schedule for 140k steps.
We used the tokenizer from [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1/).
-The training loss:
-(training loss plot)
-
# How to use
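As a quick sanity check on the schedule quoted above, the 140k-step figure follows from the 180B-token budget and the 1.3M-token global batch size. A minimal sketch of that arithmetic, with all values taken from the paragraph above:

```python
# Back-of-the-envelope check of the training schedule described in the card.
total_tokens = 180e9         # 180B tokens over 6 epochs
tokens_per_step = 1.3e6      # global batch size of 1.3M tokens
sequence_length = 2048       # 2k sequence length

steps = total_tokens / tokens_per_step                    # ~138k optimizer steps (~140k)
sequences_per_step = tokens_per_step / sequence_length    # ~635 sequences per global batch

print(f"{steps:,.0f} steps, {sequences_per_step:,.0f} sequences per step")
```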
@@ -82,3 +79,6 @@ This is a small 1.8B model trained on synthetic data, so it might hallucinate, g
- **GPUs:** 160 H100
- **Training time:** 15 hours
+
+The training loss:
+(training loss plot)
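Since the card points to the Mistral-7B-v0.1 tokenizer and describes a standard causal LM, the "How to use" section presumably follows the usual `transformers` pattern. A minimal sketch, assuming the standard API; the repo id below is a placeholder, not the actual model name, and the prompt and generation settings are illustrative only:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id; substitute the actual model repository.
model_id = "your-org/your-1.8b-model"

# The card says the tokenizer comes from Mistral-7B-v0.1, so the model repo
# is expected to ship compatible tokenizer files.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Photosynthesis is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```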