Update README.md
README.md
CHANGED
@@ -15,9 +15,6 @@ We also included [ultrachat](https://huggingface.co/datasets/stingning/ultrachat
We trained for 6 epochs, resulting in a model trained on 180B tokens with a sequence length of 2k, a global batch size of 1.3M tokens, and a learning rate of 3e-4 with a cosine schedule for 140k steps.
We used the tokenizer from [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1/).
-The training loss:
-(training loss plot)
-
# How to use
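As a quick sanity check on the schedule quoted above, the 140k-step figure follows from the 180B-token budget and the 1.3M-token global batch size. A minimal sketch of that arithmetic, with all values taken from the paragraph above:

```python
# Back-of-the-envelope check of the training schedule described in the card.
total_tokens = 180e9         # 180B tokens over 6 epochs
tokens_per_step = 1.3e6      # global batch size of 1.3M tokens
sequence_length = 2048       # 2k sequence length

steps = total_tokens / tokens_per_step                    # ~138k optimizer steps (~140k)
sequences_per_step = tokens_per_step / sequence_length    # ~635 sequences per global batch

print(f"{steps:,.0f} steps, {sequences_per_step:,.0f} sequences per step")
```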
@@ -82,3 +79,6 @@ This is a small 1.8B model trained on synthetic data, so it might hallucinate, g
- **GPUs:** 160 H100
- **Training time:** 15 hours
+
+The training loss:
+(training loss plot)
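Since the card points to the Mistral-7B-v0.1 tokenizer and describes a standard causal LM, the "How to use" section presumably follows the usual `transformers` pattern. A minimal sketch, assuming the standard API; the repo id below is a placeholder, not the actual model name, and the prompt and generation settings are illustrative only:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id; substitute the actual model repository.
model_id = "your-org/your-1.8b-model"

# The card says the tokenizer comes from Mistral-7B-v0.1, so the model repo
# is expected to ship compatible tokenizer files.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Photosynthesis is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```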