MaziyarPanahi committed
Commit 261c4eb
1 Parent(s): 452d7ee

Adding Evaluation Results


This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions
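For reference, the evaluation metadata this PR adds can be read back from the model card programmatically. A minimal sketch, assuming the `huggingface_hub` `ModelCard`/`EvalResult` helpers (which are not part of this PR):

```python
# Minimal sketch: load the model card and list the model-index entries this PR adds.
# Assumes huggingface_hub is installed; attribute names follow its EvalResult dataclass.
from huggingface_hub import ModelCard

card = ModelCard.load("MaziyarPanahi/Llama-3-8B-Instruct-DPO-v0.1")
for result in card.data.eval_results or []:
    # Each entry corresponds to one task/dataset/metrics block in the model-index YAML below.
    print(f"{result.dataset_name}: {result.metric_type} = {result.metric_value}")
```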

Files changed (1)
  1. README.md +124 -7
README.md CHANGED
@@ -1,5 +1,7 @@
  ---
- base_model: meta-llama/Meta-Llama-3-8B-Instruct
+ language:
+ - en
+ license: other
  library_name: transformers
  tags:
  - axolotl
@@ -9,18 +11,119 @@ tags:
  - pytorch
  - llama
  - llama-3
- language:
- - en
+ base_model: meta-llama/Meta-Llama-3-8B-Instruct
+ datasets:
+ - mlabonne/chatml-OpenHermes2.5-dpo-binarized-alpha
+ model_name: Llama-3-8B-Instruct-DPO-v0.1
  pipeline_tag: text-generation
- license: other
  license_name: llama3
  license_link: LICENSE
  inference: false
  model_creator: MaziyarPanahi
- model_name: Llama-3-8B-Instruct-DPO-v0.1
  quantized_by: MaziyarPanahi
- datasets:
- - mlabonne/chatml-OpenHermes2.5-dpo-binarized-alpha
+ model-index:
+ - name: Llama-3-8B-Instruct-DPO-v0.1
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: AI2 Reasoning Challenge (25-Shot)
+       type: ai2_arc
+       config: ARC-Challenge
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: acc_norm
+       value: 61.52
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=MaziyarPanahi/Llama-3-8B-Instruct-DPO-v0.1
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HellaSwag (10-Shot)
+       type: hellaswag
+       split: validation
+       args:
+         num_few_shot: 10
+     metrics:
+     - type: acc_norm
+       value: 79.06
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=MaziyarPanahi/Llama-3-8B-Instruct-DPO-v0.1
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU (5-Shot)
+       type: cais/mmlu
+       config: all
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 67.09
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=MaziyarPanahi/Llama-3-8B-Instruct-DPO-v0.1
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: TruthfulQA (0-shot)
+       type: truthful_qa
+       config: multiple_choice
+       split: validation
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: mc2
+       value: 51.85
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=MaziyarPanahi/Llama-3-8B-Instruct-DPO-v0.1
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Winogrande (5-shot)
+       type: winogrande
+       config: winogrande_xl
+       split: validation
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 74.66
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=MaziyarPanahi/Llama-3-8B-Instruct-DPO-v0.1
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GSM8k (5-shot)
+       type: gsm8k
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 69.29
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=MaziyarPanahi/Llama-3-8B-Instruct-DPO-v0.1
+       name: Open LLM Leaderboard
  ---
 
  <img src="./llama-3-merges.webp" alt="Goku 8x22B v0.1 Logo" width="500" style="margin-left:'auto' margin-right:'auto' display:'block'"/>
@@ -109,3 +212,17 @@ outputs = pipeline(
  print(outputs[0]["generated_text"][len(prompt):])
  ```
 
+
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_MaziyarPanahi__Llama-3-8B-Instruct-DPO-v0.1)
+
+ | Metric |Value|
+ |---------------------------------|----:|
+ |Avg. |67.25|
+ |AI2 Reasoning Challenge (25-Shot)|61.52|
+ |HellaSwag (10-Shot) |79.06|
+ |MMLU (5-Shot) |67.09|
+ |TruthfulQA (0-shot) |51.85|
+ |Winogrande (5-shot) |74.66|
+ |GSM8k (5-shot) |69.29|
+
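As a quick sanity check, the `Avg.` row in the appended table is simply the mean of the six benchmark scores. A small sketch in plain Python, with the values copied from the table above:

```python
# Recompute the leaderboard average from the six per-task scores reported in this PR.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 61.52,
    "HellaSwag (10-Shot)": 79.06,
    "MMLU (5-Shot)": 67.09,
    "TruthfulQA (0-shot)": 51.85,
    "Winogrande (5-shot)": 74.66,
    "GSM8k (5-shot)": 69.29,
}
average = sum(scores.values()) / len(scores)
print(f"{average:.3f}")  # 67.245, shown as 67.25 in the Avg. row above
```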