leaderboard-pr-bot committed on
Commit 0a278c6
1 Parent(s): 463ea77

Adding Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1)
  1. README.md +27 -15
README.md CHANGED
@@ -1,18 +1,13 @@
  ---
- model_creator: Nekochu
- quantized_by: Nekochu
- model_name: Llama-3.1 8B German ORPO
- pretty_name: Llama-3.1 8B German ORPO
- model_type: llama3.1
- prompt_template: >-
- Below is an instruction that describes a task. Write a response that
- appropriately completes the request. ### Instruction: {Instruction} {summary} ### input: {category} ### Response: {prompt}
- library_name: peft
+ language:
+ - de
+ - en
  license: llama3.1
- base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
+ library_name: peft
  tags:
  - llama-factory
  - lora
+ base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
  datasets:
  - mayflowergmbh/intel_orca_dpo_pairs_de
  - LeoLM/OpenSchnabeltier
@@ -28,15 +23,19 @@ datasets:
  - mayflowergmbh/alpaca-gpt4_de
  - mayflowergmbh/dolly-15k_de
  - mayflowergmbh/oasst_de
- language:
- - de
- - en
+ model_creator: Nekochu
+ quantized_by: Nekochu
+ pretty_name: Llama-3.1 8B German ORPO
+ model_type: llama3.1
+ prompt_template: 'Below is an instruction that describes a task. Write a response
+ that appropriately completes the request. ### Instruction: {Instruction} {summary}
+ ### input: {category} ### Response: {prompt}'
  pipeline_tag: text-generation
  task_categories:
  - question-answering
  - text2text-generation
  - conversational
- inference: True
+ inference: true
  model-index:
  - name: Llama-3.1-8B-German-ORPO
  results: []
@@ -190,4 +189,17 @@ Note: Output from inference [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Fac
  | Llama-3.1-8B-Instruct-exl2-8bpw-h8 | 46.16 | 63.74 | 49.68 | 36.93 | 48.29 | 55.81 | 28.59 | 52.81 | 45.67 | 30.79 | 45.08 | 40.48 | 39.03 | 60.90 | 48.38 |
  
  Note: Lower on Benchmark for **English**, seems to be degraded as trade-off. Not frequently but the output repeats sentences (because of the wrong chat template).
- </details>
+ </details>
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Nekochu__Llama-3.1-8B-German-ORPO)
+
+ | Metric |Value|
+ |-------------------|----:|
+ |Avg. |21.30|
+ |IFEval (0-Shot) |46.11|
+ |BBH (3-Shot) |29.42|
+ |MATH Lvl 5 (4-Shot)| 0.00|
+ |GPQA (0-shot) | 8.84|
+ |MuSR (0-shot) |16.86|
+ |MMLU-PRO (5-shot) |26.59|
+
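
As a quick sanity check on the table added by this PR, the sketch below recomputes the `Avg.` row from the six benchmark rows. It assumes `Avg.` is the unweighted mean of those six values; the PR text does not say so explicitly, but the numbers are consistent with that reading. The variable names are illustrative only.

```python
# Scores copied from the evaluation table added in this PR.
scores = {
    "IFEval (0-Shot)": 46.11,
    "BBH (3-Shot)": 29.42,
    "MATH Lvl 5 (4-Shot)": 0.00,
    "GPQA (0-shot)": 8.84,
    "MuSR (0-shot)": 16.86,
    "MMLU-PRO (5-shot)": 26.59,
}

# Assumption: "Avg." is the plain arithmetic mean of the six benchmark scores.
average = sum(scores.values()) / len(scores)

print(f"Avg. = {average:.2f}")  # prints "Avg. = 21.30", matching the table
```

The recomputed value agrees with the reported `Avg.` of 21.30, so the table is at least internally consistent.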