leaderboard-pr-bot committed on
Commit
305a4f9
1 Parent(s): 1293d96

Adding Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1)
  1. README.md +117 -1
README.md CHANGED
@@ -43,6 +43,109 @@ datasets:
  - WhiteRabbitNeo/WRN-Chapter-1
  - WhiteRabbitNeo/WRN-Chapter-2
  - winogrande
+ model-index:
+ - name: bagel-dpo-7b-v0.5
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: AI2 Reasoning Challenge (25-Shot)
+       type: ai2_arc
+       config: ARC-Challenge
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: acc_norm
+       value: 66.3
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=jondurbin/bagel-dpo-7b-v0.5
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HellaSwag (10-Shot)
+       type: hellaswag
+       split: validation
+       args:
+         num_few_shot: 10
+     metrics:
+     - type: acc_norm
+       value: 84.22
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=jondurbin/bagel-dpo-7b-v0.5
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU (5-Shot)
+       type: cais/mmlu
+       config: all
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 65.27
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=jondurbin/bagel-dpo-7b-v0.5
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: TruthfulQA (0-shot)
+       type: truthful_qa
+       config: multiple_choice
+       split: validation
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: mc2
+       value: 62.41
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=jondurbin/bagel-dpo-7b-v0.5
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Winogrande (5-shot)
+       type: winogrande
+       config: winogrande_xl
+       split: validation
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 81.45
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=jondurbin/bagel-dpo-7b-v0.5
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GSM8k (5-shot)
+       type: gsm8k
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 53.37
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=jondurbin/bagel-dpo-7b-v0.5
+       name: Open LLM Leaderboard
  ---
 
  # A bagel, with everything
@@ -803,4 +906,17 @@ For assistance with the VM join the [Massed Compute Discord Server](https://disc
 
  - https://bmc.link/jondurbin
  - ETH 0xce914eAFC2fe52FdceE59565Dd92c06f776fcb11
- - BTC bc1qdwuth4vlg8x37ggntlxu5cjfwgmdy5zaa7pswf
+ - BTC bc1qdwuth4vlg8x37ggntlxu5cjfwgmdy5zaa7pswf
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_jondurbin__bagel-dpo-7b-v0.5)
+
+ |             Metric              |Value|
+ |---------------------------------|----:|
+ |Avg.                             |68.84|
+ |AI2 Reasoning Challenge (25-Shot)|66.30|
+ |HellaSwag (10-Shot)              |84.22|
+ |MMLU (5-Shot)                    |65.27|
+ |TruthfulQA (0-shot)              |62.41|
+ |Winogrande (5-shot)              |81.45|
+ |GSM8k (5-shot)                   |53.37|
+
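
The `Avg.` row added by this PR can be sanity-checked against the six benchmark rows. The sketch below assumes the leaderboard average is the plain unweighted mean of the six scores, rounded to two decimals; the PR itself does not state the aggregation rule, and the names `scores` and `leaderboard_average` are illustrative only.

```python
from statistics import mean

# Scores copied from the table added to README.md in this PR (percentages).
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 66.30,
    "HellaSwag (10-Shot)": 84.22,
    "MMLU (5-Shot)": 65.27,
    "TruthfulQA (0-shot)": 62.41,
    "Winogrande (5-shot)": 81.45,
    "GSM8k (5-shot)": 53.37,
}


def leaderboard_average(values):
    """Unweighted mean rounded to two decimals (assumed aggregation rule)."""
    return round(mean(values), 2)


average = leaderboard_average(list(scores.values()))
print(f"Avg. = {average:.2f}")
```

Under that assumption the computed value agrees with the 68.84 reported in the table, so the added rows and the average are at least mutually consistent.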