leaderboard-pr-bot committed on
Commit
d8419df
1 Parent(s): 2fc627e

Adding Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1)
  1. README.md +123 -6
README.md CHANGED
@@ -1,9 +1,13 @@
 ---
+license: other
+tags:
+- orca
+- orca2
+- microsoft
+model_name: Orca 2 13B
 base_model: microsoft/Orca-2-13b
 inference: false
-license: other
 model_creator: Microsoft
-model_name: Orca 2 13B
 model_type: llama
 pipeline_tag: text-generation
 prompt_template: '<|im_start|>system
@@ -18,10 +22,109 @@ prompt_template: '<|im_start|>system
 
 '
 quantized_by: TheBloke
-tags:
-- orca
-- orca2
-- microsoft
+model-index:
+- name: Orca-2-13B-GPTQ
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: AI2 Reasoning Challenge (25-Shot)
+      type: ai2_arc
+      config: ARC-Challenge
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: acc_norm
+      value: 59.81
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=TheBloke/Orca-2-13B-GPTQ
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HellaSwag (10-Shot)
+      type: hellaswag
+      split: validation
+      args:
+        num_few_shot: 10
+    metrics:
+    - type: acc_norm
+      value: 79.12
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=TheBloke/Orca-2-13B-GPTQ
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU (5-Shot)
+      type: cais/mmlu
+      config: all
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 59.35
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=TheBloke/Orca-2-13B-GPTQ
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: TruthfulQA (0-shot)
+      type: truthful_qa
+      config: multiple_choice
+      split: validation
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: mc2
+      value: 55.14
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=TheBloke/Orca-2-13B-GPTQ
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Winogrande (5-shot)
+      type: winogrande
+      config: winogrande_xl
+      split: validation
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 76.64
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=TheBloke/Orca-2-13B-GPTQ
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GSM8k (5-shot)
+      type: gsm8k
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 15.54
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=TheBloke/Orca-2-13B-GPTQ
+      name: Open LLM Leaderboard
 ---
 <!-- markdownlint-disable MD041 -->
 
@@ -613,3 +716,17 @@ print(final_output)
 primaryClass={cs.AI}
 }
 ```
+
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_TheBloke__Orca-2-13B-GPTQ)
+
+|             Metric              |Value|
+|---------------------------------|----:|
+|Avg.                             |57.60|
+|AI2 Reasoning Challenge (25-Shot)|59.81|
+|HellaSwag (10-Shot)              |79.12|
+|MMLU (5-Shot)                    |59.35|
+|TruthfulQA (0-shot)              |55.14|
+|Winogrande (5-shot)              |76.64|
+|GSM8k (5-shot)                   |15.54|
+
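
As a quick sanity check on the table added by this PR, the `Avg.` row appears to be the unweighted arithmetic mean of the six benchmark scores. The PR itself does not state how the average is computed, so that is an assumption, and `leaderboard_average` is a hypothetical helper name used only for this sketch. The scores are copied from the table.

```python
# Benchmark scores (percent) copied from the evaluation table in this PR.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 59.81,
    "HellaSwag (10-Shot)": 79.12,
    "MMLU (5-Shot)": 59.35,
    "TruthfulQA (0-shot)": 55.14,
    "Winogrande (5-shot)": 76.64,
    "GSM8k (5-shot)": 15.54,
}


def leaderboard_average(results: dict[str, float]) -> float:
    """Unweighted mean of the scores, rounded to two decimals.

    Assumption: the leaderboard weights every benchmark equally.
    """
    return round(sum(results.values()) / len(results), 2)


average = leaderboard_average(scores)
print(f"Avg. {average:.2f}")
```

Under the equal-weighting assumption, the computed value agrees with the 57.60 reported in the `Avg.` row.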