nicholasKluge
committed
Commit 075aa74 • 1 Parent(s): d0f5027
Update README.md
README.md CHANGED
@@ -170,16 +170,16 @@ for i, completion in enumerate(completions):
 | Models | Average | [ARC](https://arxiv.org/abs/1803.05457) | [Hellaswag](https://arxiv.org/abs/1905.07830) | [MMLU](https://arxiv.org/abs/2009.03300) | [TruthfulQA](https://arxiv.org/abs/2109.07958) |
 |-------------------------------------------------------------------------------------|---------|-----------------------------------------|-----------------------------------------------|------------------------------------------|------------------------------------------------|
 | [TeenyTinyLlama-460m](https://huggingface.co/nicholasKluge/TeenyTinyLlama-460m) | 33.01 | 29.40 | 33.00 | 28.55 | 41.10 |
+| [Bloom-560m](https://huggingface.co/bigscience/bloom-560m)* | 32.13 | 24.74 | 37.15 | 24.22 | 42.44 |
+| [Xglm-564M](https://huggingface.co/facebook/xglm-564M) | 31.97 | 25.56 | 34.64* | 25.18* | 42.53 |
 | [TeenyTinyLlama-160m](https://huggingface.co/nicholasKluge/TeenyTinyLlama-160m) | 31.16 | 26.15 | 29.29 | 28.11 | 41.12 |
 | [Pythia-160m](https://huggingface.co/EleutherAI/pythia-160m-deduped)* | 31.16 | 24.06 | 31.39 | 24.86 | 44.34 |
 | [OPT-125m](https://huggingface.co/facebook/opt-125m)* | 30.80 | 22.87 | 31.47 | 26.02 | 42.87 |
 | [Gpt2-portuguese-small](https://huggingface.co/pierreguillou/gpt2-small-portuguese) | 30.22 | 22.48 | 29.62 | 27.36 | 41.44 |
 | [Gpt2-small](https://huggingface.co/gpt2)* | 29.97 | 21.48 | 31.60 | 25.79 | 40.65 |
-| [Xglm-564M](https://huggingface.co/facebook/xglm-564M)* | 31.20 | 24.57 | 34.64 | 25.18 | 40.43 |
-| [Bloom-560m](https://huggingface.co/bigscience/bloom-560m)* | 32.13 | 24.74 | 37.15 | 24.22 | 42.44 |
 | [Multilingual GPT](https://huggingface.co/ai-forever/mGPT)* | 28.73 | 23.81 | 26.37 | 25.17 | 39.62 |

-Evaluations on benchmarks were performed using the [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) (by [EleutherAI](https://www.eleuther.ai/)). Thanks to [Laiviet](https://github.com/laiviet/lm-evaluation-harness) for translating some of the tasks in the LM-Evaluation-Harness. The results of models marked with an "*" were
+Evaluations on benchmarks were performed using the [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) (by [EleutherAI](https://www.eleuther.ai/)). Thanks to [Laiviet](https://github.com/laiviet/lm-evaluation-harness) for translating some of the tasks in the LM-Evaluation-Harness. The results of models marked with an "*" were extracted from the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).

 ## Fine-Tuning Comparisons

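The note in the diff credits EleutherAI's Language Model Evaluation Harness for these scores. As a minimal sketch of how such a table can be reproduced, the snippet below calls the harness's Python API (`lm_eval.simple_evaluate`, available in recent `lm-eval` releases). The task identifiers shown are the harness's standard English variants and are illustrative only: the commit does not name the translated task variants or few-shot settings actually used.

```python
# Sketch: scoring a model with EleutherAI's lm-evaluation-harness (pip install lm-eval).
# The task names below are the harness's standard English benchmarks; the README's
# table relied on Laiviet's translated tasks, whose exact identifiers are not given here.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face transformers backend
    model_args="pretrained=nicholasKluge/TeenyTinyLlama-460m",
    tasks=["arc_challenge", "hellaswag", "mmlu", "truthfulqa_mc2"],
)

# Print the per-task metrics collected by the harness.
for task, metrics in results["results"].items():
    print(task, metrics)
```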