sarath-shekkizhar committed on
Commit 0ff1c54 • 1 Parent(s): 7307326
Update README.md
README.md
CHANGED
@@ -96,6 +96,17 @@ MT-Bench is a benchmark made up of 80 high-quality multi-turn questions. These q
![hexplot.png](assets/hexplot.png)

+### Comparison with additional Open LLM LeaderBoard models
+| Model | First Turn | Second Turn | Average |
+| --- | --- | --- | --- |
+| TenyxChat-7B-v1 | 8.45000 | 7.756250 | 8.103125 |
+| SamirGPT-v1 | 8.05000 | 7.612500 | 7.831250 |
+| FernandoGPT-v1 | 8.08125 | 7.256250 | 7.668750 |
+| Go-Bruins-v2 | 8.13750 | 7.150000 | 7.643750 |
+| mistral_tv-neural-marconroni | 7.76875 | 6.987500 | 7.378125 |
+| neuronovo-7B-v0.2 | 7.73750 | 6.662500 | 7.200000 |
+| neural-chat-7b-v3-3 | 7.39375 | 5.881250 | 6.637500 |
+
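The Average column in the added table appears to be the arithmetic mean of the first-turn and second-turn scores. The README does not state this, so it is an assumption. The sketch below checks it against the rows above; the names `SCORES`, `turn_average`, `check_averages`, and `rank_by_average` are illustrative and not part of the repository.

```python
# Scores copied from the added table: (first turn, second turn, reported average).
SCORES = {
    "TenyxChat-7B-v1": (8.45000, 7.756250, 8.103125),
    "SamirGPT-v1": (8.05000, 7.612500, 7.831250),
    "FernandoGPT-v1": (8.08125, 7.256250, 7.668750),
    "Go-Bruins-v2": (8.13750, 7.150000, 7.643750),
    "mistral_tv-neural-marconroni": (7.76875, 6.987500, 7.378125),
    "neuronovo-7B-v0.2": (7.73750, 6.662500, 7.200000),
    "neural-chat-7b-v3-3": (7.39375, 5.881250, 6.637500),
}


def turn_average(first, second):
    """Assumed definition of the Average column: mean of the two turns."""
    return (first + second) / 2


def check_averages(scores, tol=1e-6):
    """Map each model to True if its reported average matches the assumed mean."""
    return {
        model: abs(turn_average(first, second) - reported) <= tol
        for model, (first, second, reported) in scores.items()
    }


def rank_by_average(scores):
    """Return model names sorted by reported average, highest first."""
    return sorted(scores, key=lambda model: scores[model][2], reverse=True)


if __name__ == "__main__":
    for model, ok in check_averages(SCORES).items():
        print(f"{model}: {'consistent' if ok else 'mismatch'}")
```

Under that assumption every row is consistent, and the table is already sorted by descending average.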
## LM Evaluation - Open LLM Leaderboard

We assess models on 7 benchmarks using the [Eleuther AI Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness). This setup is based on that used for the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).