Adding Evaluation Results #1
opened by leaderboard-pr-bot

README.md (CHANGED)
```diff
@@ -1,18 +1,13 @@
 ---
-
-
-
-pretty_name: Llama-3.1 8B German ORPO
-model_type: llama3.1
-prompt_template: >-
-  Below is an instruction that describes a task. Write a response that
-  appropriately completes the request. ### Instruction: {Instruction} {summary} ### input: {category} ### Response: {prompt}
-library_name: peft
+language:
+- de
+- en
 license: llama3.1
-
+library_name: peft
 tags:
 - llama-factory
 - lora
+base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
 datasets:
 - mayflowergmbh/intel_orca_dpo_pairs_de
 - LeoLM/OpenSchnabeltier
```
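The metadata declares `library_name: peft` with a `lora` tag, and this hunk adds an explicit `base_model`. For readers who want to try the adapter, here is a minimal usage sketch (not code from the model card) that loads the LoRA weights onto that base model with `transformers` and `peft`; the adapter repo id `Nekochu/Llama-3.1-8B-German-ORPO` is taken from the `model-index` entry below.

```python
# Hypothetical loading sketch, assuming a transformers + peft environment.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_ID = "meta-llama/Meta-Llama-3.1-8B-Instruct"  # `base_model` added by this PR
ADAPTER_ID = "Nekochu/Llama-3.1-8B-German-ORPO"    # repo name from `model-index`

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
base_model = AutoModelForCausalLM.from_pretrained(BASE_ID, device_map="auto")
model = PeftModel.from_pretrained(base_model, ADAPTER_ID)  # attach LoRA weights
model.eval()
```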
```diff
@@ -28,15 +23,19 @@ datasets:
 - mayflowergmbh/alpaca-gpt4_de
 - mayflowergmbh/dolly-15k_de
 - mayflowergmbh/oasst_de
-
-
-
+model_creator: Nekochu
+quantized_by: Nekochu
+pretty_name: Llama-3.1 8B German ORPO
+model_type: llama3.1
+prompt_template: 'Below is an instruction that describes a task. Write a response
+  that appropriately completes the request. ### Instruction: {Instruction} {summary}
+  ### input: {category} ### Response: {prompt}'
 pipeline_tag: text-generation
 task_categories:
- question-answering
- text2text-generation
- conversational
-inference:
+inference: true
 model-index:
 - name: Llama-3.1-8B-German-ORPO
   results: []
```
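Note that the `prompt_template` is only re-flowed by this hunk (block scalar to quoted scalar), not changed. The card does not document what each placeholder ({Instruction}, {summary}, {category}, {prompt}) is meant to hold, so the following is only a hedged sketch of rendering the template with Python's `str.format`, with purely illustrative values.

```python
# Sketch of filling the card's prompt_template; placeholder names are copied
# verbatim from the template, example values are illustrative assumptions.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request. ### Instruction: {Instruction} {summary} "
    "### input: {category} ### Response: {prompt}"
)

text = PROMPT_TEMPLATE.format(
    Instruction="Beantworte die folgende Frage.",  # the task description
    summary="",                                    # optional extra context
    category="Wie hoch ist die Zugspitze?",        # fills the "### input:" slot
    prompt="",                                     # left empty; the model completes from here
)
print(text)
```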
```diff
@@ -190,4 +189,17 @@ Note: Output from inference [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory)
 | Llama-3.1-8B-Instruct-exl2-8bpw-h8 | 46.16 | 63.74 | 49.68 | 36.93 | 48.29 | 55.81 | 28.59 | 52.81 | 45.67 | 30.79 | 45.08 | 40.48 | 39.03 | 60.90 | 48.38 |
 
 Note: Benchmark scores for **English** are lower, which appears to be a trade-off of the German tuning. Infrequently, the output repeats sentences (because of the wrong chat template).
-</details>
+</details>
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Nekochu__Llama-3.1-8B-German-ORPO)
+
+| Metric             |Value|
+|-------------------|----:|
+|Avg.               |21.30|
+|IFEval (0-Shot)    |46.11|
+|BBH (3-Shot)       |29.42|
+|MATH Lvl 5 (4-Shot)| 0.00|
+|GPQA (0-shot)      | 8.84|
+|MuSR (0-shot)      |16.86|
+|MMLU-PRO (5-shot)  |26.59|
+
```
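The added `Avg.` row is consistent with an unweighted mean of the six leaderboard scores: (46.11 + 29.42 + 0.00 + 8.84 + 16.86 + 26.59) / 6 = 127.82 / 6 ≈ 21.30. A quick arithmetic check:

```python
# Verify the reported Avg. against the six scores from the added table.
scores = [46.11, 29.42, 0.00, 8.84, 16.86, 26.59]
print(f"{sum(scores) / len(scores):.2f}")  # -> 21.30
```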