# LiteLLama: Reduced-Scale, Experimental Versions of Llama

In this series of repos, we present an open-source reproduction of Meta AI's [LLaMa 2](https://ai.meta.com/llama/) at significantly reduced model sizes: [LiteLlama-460M-1T](https://huggingface.co/ahxt/LiteLlama-460M-1T) has 460M parameters and was trained with 1T tokens.

## Dataset and Tokenization

We train our models on part of the [RedPajama](https://www.together.xyz/blog/redpajama) dataset.

The model was trained with ~1T tokens (0.98T): number of tokens = steps × sequence length × batch size = 499679 × 1024 × 192 = 98240888832 ≈ 0.98T.
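
As a quick sanity check, the multiplication can be reproduced in a couple of lines of Python (the three factors are exactly those quoted above):

```python
# Reproduce the token count quoted above: steps * sequence length * batch size.
steps = 499_679
seq_length = 1024
batch_size = 192

total_tokens = steps * seq_length * batch_size
print(f"{total_tokens:,}")  # 98,240,888,832
```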
The training curve is at this [WandB project](https://wandb.ai/ahxt/llama2_xs_460M_training_loss/reports/reduced_train_loss-23-09-05-20-25-43---Vmlldzo1MzIwNDUx?accessToken=x2ch3n30jo77p1x8y7q9js4h4d8zpjtz1tzot4xxullyefixp4jwt7au2q37k2q6).
### Using with HuggingFace Transformers
The experimental checkpoints can be loaded directly with the [Transformers](https://huggingface.co/transformers/) library. The following code snippet shows how to load our experimental model and generate text with it.
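
A minimal sketch of such a snippet, assuming the standard `AutoModelForCausalLM` / `AutoTokenizer` interface (the prompt and generation length here are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "ahxt/LiteLlama-460M-1T"

# Load the experimental checkpoint and its tokenizer from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)
model.eval()

# Illustrative prompt; any short text works.
prompt = "Q: What is the largest bird?\nA:"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
tokens = model.generate(input_ids, max_length=32)

print(tokenizer.decode(tokens[0].tolist(), skip_special_tokens=True))
```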
## Evaluation

### MMLU

We evaluate our models on the MMLU task.

| Models | #parameters | zero-shot | 5-shot |
| --- | --- | --- | --- |
| LiteLlama-460M-1T | 0.46B | 21.13 | 26.39 |
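
The card does not state which harness produced these scores. As one illustrative way to obtain comparable zero-shot and 5-shot MMLU numbers for a Hugging Face checkpoint, here is a sketch using EleutherAI's `lm-evaluation-harness` (this tool is an assumption, not the authors' stated setup; installable via `pip install lm-eval`):

```python
# Illustrative only: score the checkpoint on MMLU with lm-evaluation-harness (>= 0.4).
# The original card does not say which harness or settings were actually used.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=ahxt/LiteLlama-460M-1T",
    tasks=["mmlu"],
    num_fewshot=5,   # set to 0 for the zero-shot column
    batch_size=8,
)
print(results["results"])  # per-subject and aggregate accuracies
```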
### [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_ahxt__llama2_xs_460M_experimental).

| Metric | Value |
| --- | --- |
## Contact

This model was developed by [Xiaotian Han](https://ahxt.github.io/) from Texas A&M University and is released under the MIT License.