MaziyarPanahi committed
Commit aab4757
1 Parent(s): 0ef6aba

Adding Evaluation Results


This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions
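
For reference, once this PR is merged the `model-index` metadata shown in the diff below can be read back programmatically. The following is a minimal sketch, assuming the `huggingface_hub` Python package is installed and the repository is public; the exact attribute names on the parsed results may differ between library versions:

```python
# Sketch: read the evaluation results this PR adds to the card metadata.
from huggingface_hub import ModelCard

card = ModelCard.load("MaziyarPanahi/Llama-3-70B-Instruct-DPO-v0.2")
# card.data.eval_results is parsed from the `model-index` section, if present.
for result in card.data.eval_results or []:
    print(f"{result.dataset_name}: {result.metric_type} = {result.metric_value}")
```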

Files changed (1)
  1. README.md +124 -8
README.md CHANGED
@@ -1,5 +1,7 @@
 ---
-base_model: meta-llama/Meta-Llama-3-70B-Instruct
+language:
+- en
+license: llama3
 library_name: transformers
 tags:
 - axolotl
@@ -11,18 +13,119 @@ tags:
 - llama
 - llama-3
 - chatml
-language:
-- en
+base_model: meta-llama/Meta-Llama-3-70B-Instruct
+datasets:
+- Intel/orca_dpo_pairs
+model_name: Llama-3-70B-Instruct-DPO-v0.2
 pipeline_tag: text-generation
-license: llama3
 license_name: llama3
 license_link: LICENSE
 inference: false
 model_creator: MaziyarPanahi
-model_name: Llama-3-70B-Instruct-DPO-v0.2
 quantized_by: MaziyarPanahi
-datasets:
-- Intel/orca_dpo_pairs
+model-index:
+- name: Llama-3-70B-Instruct-DPO-v0.2
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: AI2 Reasoning Challenge (25-Shot)
+      type: ai2_arc
+      config: ARC-Challenge
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: acc_norm
+      value: 72.53
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=MaziyarPanahi/Llama-3-70B-Instruct-DPO-v0.2
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HellaSwag (10-Shot)
+      type: hellaswag
+      split: validation
+      args:
+        num_few_shot: 10
+    metrics:
+    - type: acc_norm
+      value: 86.22
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=MaziyarPanahi/Llama-3-70B-Instruct-DPO-v0.2
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU (5-Shot)
+      type: cais/mmlu
+      config: all
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 80.41
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=MaziyarPanahi/Llama-3-70B-Instruct-DPO-v0.2
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: TruthfulQA (0-shot)
+      type: truthful_qa
+      config: multiple_choice
+      split: validation
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: mc2
+      value: 63.57
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=MaziyarPanahi/Llama-3-70B-Instruct-DPO-v0.2
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Winogrande (5-shot)
+      type: winogrande
+      config: winogrande_xl
+      split: validation
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 82.79
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=MaziyarPanahi/Llama-3-70B-Instruct-DPO-v0.2
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GSM8k (5-shot)
+      type: gsm8k
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 88.25
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=MaziyarPanahi/Llama-3-70B-Instruct-DPO-v0.2
+      name: Open LLM Leaderboard
 ---
 
 <img src="./llama-3-merges.webp" alt="Llama-3 DPO Logo" width="500" style="margin-left:'auto' margin-right:'auto' display:'block'"/>
@@ -114,4 +217,17 @@ outputs = pipeline(
     top_p=0.95,
 )
 print(outputs[0]["generated_text"][len(prompt):])
-```
+```
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_MaziyarPanahi__Llama-3-70B-Instruct-DPO-v0.2)
+
+| Metric                          |Value|
+|---------------------------------|----:|
+|Avg.                             |78.96|
+|AI2 Reasoning Challenge (25-Shot)|72.53|
+|HellaSwag (10-Shot)              |86.22|
+|MMLU (5-Shot)                    |80.41|
+|TruthfulQA (0-shot)              |63.57|
+|Winogrande (5-shot)              |82.79|
+|GSM8k (5-shot)                   |88.25|
+
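
The "Avg." row added to the table above is simply the arithmetic mean of the six benchmark scores; a minimal check in Python:

```python
# Sketch: the leaderboard "Avg." value is the mean of the six benchmark scores.
# Order: ARC, HellaSwag, MMLU, TruthfulQA, Winogrande, GSM8k.
scores = [72.53, 86.22, 80.41, 63.57, 82.79, 88.25]
print(round(sum(scores) / len(scores), 2))  # 78.96
```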