Ludwig Stumpp committed
Commit • f3a8621
1 Parent(s): 1d376a9
Remove links in table headers
README.md
CHANGED
@@ -1,6 +1,7 @@
 # llm-leaderboard
 
 A joint community effort to create one central leaderboard for LLMs. Contributions and corrections welcome!
+Sources for the numbers are
 
 ## Interactive Dashboard
 
@@ -20,28 +21,28 @@ We are always happy for contributions! You can contribute by the following:
 
 ## Leaderboard
 
-| Model Name |
-| -------------------------------------------------------------------------------------- |
-| [alpaca-13b](https://crfm.stanford.edu/2023/03/13/alpaca.html) | [1008](https://lmsys.org/blog/2023-05-03-arena/)
-| [cerebras-gpt-7b](https://huggingface.co/cerebras/Cerebras-GPT-6.7B) |
-| [cerebras-gpt-13b](https://huggingface.co/cerebras/Cerebras-GPT-13B) |
-| [chatglm-6b](https://chatglm.cn/blog) | [985](https://lmsys.org/blog/2023-05-03-arena/)
-| [dolly-v2-12b](https://huggingface.co/databricks/dolly-v2-12b) | [944](https://lmsys.org/blog/2023-05-03-arena/)
-| [eleuther-pythia-7b](https://huggingface.co/EleutherAI/pythia-6.9b) |
-| [eleuther-pythia-12b](https://huggingface.co/EleutherAI/pythia-12b) |
-| [fastchat-t5-3b](https://huggingface.co/lmsys/fastchat-t5-3b-v1.0) | [951](https://lmsys.org/blog/2023-05-03-arena/)
-| [gpt-neox-20b](https://huggingface.co/EleutherAI/gpt-neox-20b) |
-| [gptj-6b](https://huggingface.co/EleutherAI/gpt-j-6b) |
-| [koala-13b](https://bair.berkeley.edu/blog/2023/04/03/koala/) | [1082](https://lmsys.org/blog/2023-05-03-arena/)
-| [llama-7b](https://arxiv.org/abs/2302.13971) |
-| [llama-13b](https://arxiv.org/abs/2302.13971) | [932](https://lmsys.org/blog/2023-05-03-arena/)
-| [mpt-7b](https://huggingface.co/mosaicml/mpt-7b) |
-| [oasst-pythia-12b](https://huggingface.co/OpenAssistant/pythia-12b-pre-v8-12.5k-steps) | [1065](https://lmsys.org/blog/2023-05-03-arena/)
-| [opt-7b](https://huggingface.co/facebook/opt-6.7b) |
-| [opt-13b](https://huggingface.co/facebook/opt-13b) |
-| [stablelm-base-alpha-7b](https://huggingface.co/stabilityai/stablelm-base-alpha-7b) |
-| [stablelm-tuned-alpha-7b](https://huggingface.co/stabilityai/stablelm-tuned-alpha-7b) | [858](https://lmsys.org/blog/2023-05-03-arena/)
-| [vicuna-13b](https://huggingface.co/lmsys/vicuna-13b-delta-v0) | [1169](https://lmsys.org/blog/2023-05-03-arena/)
+| Model Name | Chatbot Arena Elo | LAMBADA (zero-shot) | TriviaQA (zero-shot) |
+| -------------------------------------------------------------------------------------- | ------------------------------------------------ | --------------------------------------------- | --------------------------------------------- |
+| [alpaca-13b](https://crfm.stanford.edu/2023/03/13/alpaca.html) | [1008](https://lmsys.org/blog/2023-05-03-arena/) | | |
+| [cerebras-gpt-7b](https://huggingface.co/cerebras/Cerebras-GPT-6.7B) | | [0.636](https://www.mosaicml.com/blog/mpt-7b) | [0.141](https://www.mosaicml.com/blog/mpt-7b) |
+| [cerebras-gpt-13b](https://huggingface.co/cerebras/Cerebras-GPT-13B) | | [0.635](https://www.mosaicml.com/blog/mpt-7b) | [0.146](https://www.mosaicml.com/blog/mpt-7b) |
+| [chatglm-6b](https://chatglm.cn/blog) | [985](https://lmsys.org/blog/2023-05-03-arena/) | | |
+| [dolly-v2-12b](https://huggingface.co/databricks/dolly-v2-12b) | [944](https://lmsys.org/blog/2023-05-03-arena/) | | |
+| [eleuther-pythia-7b](https://huggingface.co/EleutherAI/pythia-6.9b) | | [0.667](https://www.mosaicml.com/blog/mpt-7b) | [0.198](https://www.mosaicml.com/blog/mpt-7b) |
+| [eleuther-pythia-12b](https://huggingface.co/EleutherAI/pythia-12b) | | [0.704](https://www.mosaicml.com/blog/mpt-7b) | [0.233](https://www.mosaicml.com/blog/mpt-7b) |
+| [fastchat-t5-3b](https://huggingface.co/lmsys/fastchat-t5-3b-v1.0) | [951](https://lmsys.org/blog/2023-05-03-arena/) | | |
+| [gpt-neox-20b](https://huggingface.co/EleutherAI/gpt-neox-20b) | | [0.719](https://www.mosaicml.com/blog/mpt-7b) | [0.347](https://www.mosaicml.com/blog/mpt-7b) |
+| [gptj-6b](https://huggingface.co/EleutherAI/gpt-j-6b) | | [0.683](https://www.mosaicml.com/blog/mpt-7b) | [0.234](https://www.mosaicml.com/blog/mpt-7b) |
+| [koala-13b](https://bair.berkeley.edu/blog/2023/04/03/koala/) | [1082](https://lmsys.org/blog/2023-05-03-arena/) | | |
+| [llama-7b](https://arxiv.org/abs/2302.13971) | | [0.738](https://www.mosaicml.com/blog/mpt-7b) | [0.443](https://www.mosaicml.com/blog/mpt-7b) |
+| [llama-13b](https://arxiv.org/abs/2302.13971) | [932](https://lmsys.org/blog/2023-05-03-arena/) | | |
+| [mpt-7b](https://huggingface.co/mosaicml/mpt-7b) | | [0.702](https://www.mosaicml.com/blog/mpt-7b) | [0.343](https://www.mosaicml.com/blog/mpt-7b) |
+| [oasst-pythia-12b](https://huggingface.co/OpenAssistant/pythia-12b-pre-v8-12.5k-steps) | [1065](https://lmsys.org/blog/2023-05-03-arena/) | | |
+| [opt-7b](https://huggingface.co/facebook/opt-6.7b) | | [0.677](https://www.mosaicml.com/blog/mpt-7b) | [0.227](https://www.mosaicml.com/blog/mpt-7b) |
+| [opt-13b](https://huggingface.co/facebook/opt-13b) | | [0.692](https://www.mosaicml.com/blog/mpt-7b) | [0.282](https://www.mosaicml.com/blog/mpt-7b) |
+| [stablelm-base-alpha-7b](https://huggingface.co/stabilityai/stablelm-base-alpha-7b) | | [0.533](https://www.mosaicml.com/blog/mpt-7b) | [0.049](https://www.mosaicml.com/blog/mpt-7b) |
+| [stablelm-tuned-alpha-7b](https://huggingface.co/stabilityai/stablelm-tuned-alpha-7b) | [858](https://lmsys.org/blog/2023-05-03-arena/) | | |
+| [vicuna-13b](https://huggingface.co/lmsys/vicuna-13b-delta-v0) | [1169](https://lmsys.org/blog/2023-05-03-arena/) | | |
 
 ## Benchmarks
 