nicholasKluge committed
Commit 45c0ff2 • 1 parent: 797b184
Update README.md

README.md CHANGED
@@ -171,13 +171,14 @@ for i, completion in enumerate(completions):
 | Models                                                                              | Average | [ARC](https://arxiv.org/abs/1803.05457) | [Hellaswag](https://arxiv.org/abs/1905.07830) | [MMLU](https://arxiv.org/abs/2009.03300) | [TruthfulQA](https://arxiv.org/abs/2109.07958) |
 |-------------------------------------------------------------------------------------|---------|-----------------------------------------|-----------------------------------------------|------------------------------------------|------------------------------------------------|
 | [TeenyTinyLlama-162m](https://huggingface.co/nicholasKluge/TeenyTinyLlama-162m)     | 31.16   | 26.15 | 29.29 | 28.11 | 41.12 |
-| [Pythia-160m](https://huggingface.co/EleutherAI/pythia-160m-deduped)
+| [Pythia-160m](https://huggingface.co/EleutherAI/pythia-160m-deduped)*               | 31.16   | 24.06 | 31.39 | 24.86 | 44.34 |
-| [OPT-125m](https://huggingface.co/facebook/opt-125m)
+| [OPT-125m](https://huggingface.co/facebook/opt-125m)*                               | 30.80   | 22.87 | 31.47 | 26.02 | 42.87 |
 | [Gpt2-portuguese-small](https://huggingface.co/pierreguillou/gpt2-small-portuguese) | 30.22   | 22.48 | 29.62 | 27.36 | 41.44 |
-| [Gpt2-small](https://huggingface.co/gpt2)
+| [Gpt2-small](https://huggingface.co/gpt2)*                                          | 29.97   | 21.48 | 31.60 | 25.79 | 40.65 |
-| [Xglm-564M](https://huggingface.co/facebook/xglm-564M)
+| [Xglm-564M](https://huggingface.co/facebook/xglm-564M)*                             | 31.20   | 24.57 | 34.64 | 25.18 | 40.43 |
+| [Bloom-560m](https://huggingface.co/bigscience/bloom-560m)*                         | 32.13   | 24.74 | 37.15 | 24.22 | 42.44 |

-* Evaluations on benchmarks were performed using the [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) (by [EleutherAI](https://www.eleuther.ai/)). Thanks to [Laiviet](https://github.com/laiviet/lm-evaluation-harness) for translating some of the tasks in the LM-Evaluation-Harness.
+* Evaluations on benchmarks were performed using the [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) (by [EleutherAI](https://www.eleuther.ai/)). Thanks to [Laiviet](https://github.com/laiviet/lm-evaluation-harness) for translating some of the tasks in the LM-Evaluation-Harness. The results of models marked with an "*" were retrieved from the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).

 ## Fine-Tuning Comparisons
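The updated footnote credits the LM Evaluation Harness for the non-leaderboard numbers. For reference, below is a minimal sketch of how such an evaluation could be run with the harness's Python API. It is not part of this commit: it assumes a recent (v0.4+) harness release exposing `lm_eval.simple_evaluate`, and the task names, batch size, and few-shot settings (the Open LLM Leaderboard used 25-shot ARC, 10-shot HellaSwag, 5-shot MMLU, and 0-shot TruthfulQA) may differ from what was actually used for the README.

```python
# Illustrative sketch only: evaluating TeenyTinyLlama-162m with the
# LM Evaluation Harness. Assumes lm-evaluation-harness v0.4+; task names
# and few-shot settings may differ from those behind the README's table.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face causal-LM backend
    model_args="pretrained=nicholasKluge/TeenyTinyLlama-162m",
    tasks=["arc_challenge", "hellaswag", "mmlu", "truthfulqa_mc2"],
    batch_size=8,
)

# Per-task metrics (accuracy, normalized accuracy, etc.) live under "results".
for task, metrics in results["results"].items():
    print(task, metrics)
```

Averaging the per-task scores in the harness output is what yields the "Average" column in the table above.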