nicholasKluge committed
Commit 45c0ff2 • 1 parent: 797b184
Update README.md

README.md CHANGED
@@ -171,13 +171,14 @@ for i, completion in enumerate(completions):
 | Models                                                                              | Average | [ARC](https://arxiv.org/abs/1803.05457) | [Hellaswag](https://arxiv.org/abs/1905.07830) | [MMLU](https://arxiv.org/abs/2009.03300) | [TruthfulQA](https://arxiv.org/abs/2109.07958) |
 |-------------------------------------------------------------------------------------|---------|-----------------------------------------|-----------------------------------------------|------------------------------------------|------------------------------------------------|
 | [TeenyTinyLlama-162m](https://huggingface.co/nicholasKluge/TeenyTinyLlama-162m)     | 31.16   | 26.15 | 29.29 | 28.11 | 41.12 |
-| [Pythia-160m](https://huggingface.co/EleutherAI/pythia-160m-deduped)
+| [Pythia-160m](https://huggingface.co/EleutherAI/pythia-160m-deduped)*               | 31.16   | 24.06 | 31.39 | 24.86 | 44.34 |
-| [OPT-125m](https://huggingface.co/facebook/opt-125m)
+| [OPT-125m](https://huggingface.co/facebook/opt-125m)*                               | 30.80   | 22.87 | 31.47 | 26.02 | 42.87 |
 | [Gpt2-portuguese-small](https://huggingface.co/pierreguillou/gpt2-small-portuguese) | 30.22   | 22.48 | 29.62 | 27.36 | 41.44 |
-| [Gpt2-small](https://huggingface.co/gpt2)
+| [Gpt2-small](https://huggingface.co/gpt2)*                                          | 29.97   | 21.48 | 31.60 | 25.79 | 40.65 |
-| [Xglm-564M](https://huggingface.co/facebook/xglm-564M)
+| [Xglm-564M](https://huggingface.co/facebook/xglm-564M)*                             | 31.20   | 24.57 | 34.64 | 25.18 | 40.43 |
+| [Bloom-560m](https://huggingface.co/bigscience/bloom-560m)*                         | 32.13   | 24.74 | 37.15 | 24.22 | 42.44 |

-* Evaluations on benchmarks were performed using the [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) (by [EleutherAI](https://www.eleuther.ai/)). Thanks to [Laiviet](https://github.com/laiviet/lm-evaluation-harness) for translating some of the tasks in the LM-Evaluation-Harness.
+* Evaluations on benchmarks were performed using the [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) (by [EleutherAI](https://www.eleuther.ai/)). Thanks to [Laiviet](https://github.com/laiviet/lm-evaluation-harness) for translating some of the tasks in the LM-Evaluation-Harness. The results of models marked with an "*" were retrieved from the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).

 ## Fine-Tuning Comparisons
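The updated footnote credits the LM Evaluation Harness for the non-leaderboard numbers. For reference, below is a minimal sketch of how such an evaluation could be run with the harness's Python API. It is not part of this commit: it assumes a recent (v0.4+) harness release exposing `lm_eval.simple_evaluate`, and the task names, batch size, and few-shot settings (the Open LLM Leaderboard used 25-shot ARC, 10-shot HellaSwag, 5-shot MMLU, and 0-shot TruthfulQA) may differ from what was actually used for the README.

```python
# Illustrative sketch only: evaluating TeenyTinyLlama-162m with the
# LM Evaluation Harness. Assumes lm-evaluation-harness v0.4+; task names
# and few-shot settings may differ from those behind the README's table.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face causal-LM backend
    model_args="pretrained=nicholasKluge/TeenyTinyLlama-162m",
    tasks=["arc_challenge", "hellaswag", "mmlu", "truthfulqa_mc2"],
    batch_size=8,
)

# Per-task metrics (accuracy, normalized accuracy, etc.) live under "results".
for task, metrics in results["results"].items():
    print(task, metrics)
```

Averaging the per-task scores in the harness output is what yields the "Average" column in the table above.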