Adding Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1)

README.md +27 -15

@@ -1,18 +1,13 @@
 ---
-model_creator: Nekochu
-quantized_by: Nekochu
-model_name: Llama-3.1 8B German ORPO
-pretty_name: Llama-3.1 8B German ORPO
-model_type: llama3.1
-prompt_template: >-
-  Below is an instruction that describes a task. Write a response that
-  appropriately completes the request. ### Instruction: {Instruction} {summary} ### input: {category} ### Response: {prompt}
-library_name: peft
 license: llama3.1
-base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
 tags:
 - llama-factory
 - lora
 datasets:
 - mayflowergmbh/intel_orca_dpo_pairs_de
 - LeoLM/OpenSchnabeltier
@@ -28,15 +23,19 @@ datasets:
 - mayflowergmbh/alpaca-gpt4_de
 - mayflowergmbh/dolly-15k_de
 - mayflowergmbh/oasst_de
-language:
-- de
-- en
 pipeline_tag: text-generation
 task_categories:
 - question-answering
 - text2text-generation
 - conversational
-inference: True
 model-index:
 - name: Llama-3.1-8B-German-ORPO
   results: []
@@ -190,4 +189,17 @@ Note: Output from inference [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Fac
 | Llama-3.1-8B-Instruct-exl2-8bpw-h8 | 46.16                | 63.74   | 49.68    | 36.93     | 48.29            | 55.81     | 28.59       | 52.81  | 45.67   | 30.79 | 45.08 | 40.48      | 39.03   | 60.90      | 48.38 |
 Note: Lower on Benchmark for **English**, seems to be degraded as trade-off. Not frequently but the output repeats sentences (because of the wrong chat template).
-</details>

 ---
+language:
+- de
+- en
 license: llama3.1
+library_name: peft
 tags:
 - llama-factory
 - lora
+base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
 datasets:
 - mayflowergmbh/intel_orca_dpo_pairs_de
 - LeoLM/OpenSchnabeltier
 - mayflowergmbh/alpaca-gpt4_de
 - mayflowergmbh/dolly-15k_de
 - mayflowergmbh/oasst_de
+model_creator: Nekochu
+quantized_by: Nekochu
+pretty_name: Llama-3.1 8B German ORPO
+model_type: llama3.1
+prompt_template: 'Below is an instruction that describes a task. Write a response
+  that appropriately completes the request. ### Instruction: {Instruction} {summary}
+  ### input: {category} ### Response: {prompt}'
 pipeline_tag: text-generation
 task_categories:
 - question-answering
 - text2text-generation
 - conversational
+inference: true
 model-index:
 - name: Llama-3.1-8B-German-ORPO
   results: []
 | Llama-3.1-8B-Instruct-exl2-8bpw-h8 | 46.16                | 63.74   | 49.68    | 36.93     | 48.29            | 55.81     | 28.59       | 52.81  | 45.67   | 30.79 | 45.08 | 40.48      | 39.03   | 60.90      | 48.38 |
 Note: Lower on Benchmark for **English**, seems to be degraded as trade-off. Not frequently but the output repeats sentences (because of the wrong chat template).
+</details>
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Nekochu__Llama-3.1-8B-German-ORPO)
+|      Metric       |Value|
+|-------------------|----:|
+|Avg.               |21.30|
+|IFEval (0-Shot)    |46.11|
+|BBH (3-Shot)       |29.42|
+|MATH Lvl 5 (4-Shot)| 0.00|
+|GPQA (0-shot)      | 8.84|
+|MuSR (0-shot)      |16.86|
+|MMLU-PRO (5-shot)  |26.59|
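
The `Avg.` row in the added table appears to be the unweighted mean of the six benchmark scores. The sketch below checks that arithmetic using the values copied from the table. The equal-weight averaging and the helper name `leaderboard_average` are assumptions for illustration, not something this PR states.

```python
# Scores copied from the "Open LLM Leaderboard Evaluation Results" table.
scores = {
    "IFEval (0-Shot)": 46.11,
    "BBH (3-Shot)": 29.42,
    "MATH Lvl 5 (4-Shot)": 0.00,
    "GPQA (0-shot)": 8.84,
    "MuSR (0-shot)": 16.86,
    "MMLU-PRO (5-shot)": 26.59,
}


def leaderboard_average(values):
    """Hypothetical helper: unweighted mean, rounded to two decimals.

    Assumes every benchmark carries equal weight.
    """
    values = list(values)
    return round(sum(values) / len(values), 2)


average = leaderboard_average(scores.values())
print(f"Avg. {average:.2f}")
```

Under that assumption the result agrees with the reported `Avg.` of 21.30.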