Epiculous leaderboard-pr-bot committed on
Commit c8d101f
Parent: e2b40c6

Adding Evaluation Results (#1)


- Adding Evaluation Results (edcbcd0861f104976f82a876451204262f3acebb)


Co-authored-by: Open LLM Leaderboard PR Bot <leaderboard-pr-bot@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +120 -12
README.md CHANGED
@@ -1,14 +1,4 @@
  ---
- license: apache-2.0
- datasets:
- - Epiculous/SynthRP-Gens-v1.1-Filtered-n-Cleaned
- - anthracite-org/stheno-filtered-v1.1
- - PJMixers/hieunguyenminh_roleplay-deduped-ShareGPT
- - Gryphe/Sonnet3.5-Charcard-Roleplay
- - Epiculous/Synthstruct-Gens-v1.1-Filtered-n-Cleaned
- - anthracite-org/kalo-opus-instruct-22k-no-refusal
- - anthracite-org/nopm_claude_writing_fixed
- - anthracite-org/kalo_opus_misc_240827
  language:
  - en
  - fr
@@ -19,9 +9,114 @@ language:
  - ru
  - zh
  - ja
- pipeline_tag: text-generation
  tags:
  - merge
  ---

  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64adfd277b5ff762771e4571/P962FQhRG4I8nbU_DJolY.png)
@@ -71,4 +166,17 @@ parameters:
  - value: 0.5 # fallback for rest of tensors
  dtype: bfloat16

- ```
  ---
  language:
  - en
  - fr
 
  - ru
  - zh
  - ja
+ license: apache-2.0
  tags:
  - merge
+ datasets:
+ - Epiculous/SynthRP-Gens-v1.1-Filtered-n-Cleaned
+ - anthracite-org/stheno-filtered-v1.1
+ - PJMixers/hieunguyenminh_roleplay-deduped-ShareGPT
+ - Gryphe/Sonnet3.5-Charcard-Roleplay
+ - Epiculous/Synthstruct-Gens-v1.1-Filtered-n-Cleaned
+ - anthracite-org/kalo-opus-instruct-22k-no-refusal
+ - anthracite-org/nopm_claude_writing_fixed
+ - anthracite-org/kalo_opus_misc_240827
+ pipeline_tag: text-generation
+ model-index:
+ - name: Violet_Twilight-v0.2
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: IFEval (0-Shot)
+       type: HuggingFaceH4/ifeval
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: inst_level_strict_acc and prompt_level_strict_acc
+       value: 45.32
+       name: strict accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Epiculous/Violet_Twilight-v0.2
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BBH (3-Shot)
+       type: BBH
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc_norm
+       value: 23.94
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Epiculous/Violet_Twilight-v0.2
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MATH Lvl 5 (4-Shot)
+       type: hendrycks/competition_math
+       args:
+         num_few_shot: 4
+     metrics:
+     - type: exact_match
+       value: 2.72
+       name: exact match
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Epiculous/Violet_Twilight-v0.2
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GPQA (0-shot)
+       type: Idavidrein/gpqa
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: acc_norm
+       value: 2.13
+       name: acc_norm
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Epiculous/Violet_Twilight-v0.2
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MuSR (0-shot)
+       type: TAUR-Lab/MuSR
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: acc_norm
+       value: 13.61
+       name: acc_norm
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Epiculous/Violet_Twilight-v0.2
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU-PRO (5-shot)
+       type: TIGER-Lab/MMLU-Pro
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 23.45
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Epiculous/Violet_Twilight-v0.2
+       name: Open LLM Leaderboard
  ---

  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64adfd277b5ff762771e4571/P962FQhRG4I8nbU_DJolY.png)
 
  - value: 0.5 # fallback for rest of tensors
  dtype: bfloat16

+ ```
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Epiculous__Violet_Twilight-v0.2)
+
+ |      Metric       |Value|
+ |-------------------|----:|
+ |Avg.               |18.53|
+ |IFEval (0-Shot)    |45.32|
+ |BBH (3-Shot)       |23.94|
+ |MATH Lvl 5 (4-Shot)| 2.72|
+ |GPQA (0-shot)      | 2.13|
+ |MuSR (0-shot)      |13.61|
+ |MMLU-PRO (5-shot)  |23.45|
+
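The Avg. row in the evaluation table is consistent with a plain, unweighted mean of the six benchmark scores. The snippet below is a minimal sketch for checking that arithmetic; the unweighted-mean assumption is ours, since this commit does not state how the leaderboard aggregates scores.

```python
# Scores copied from the evaluation table added to README.md in this commit.
scores = {
    "IFEval (0-Shot)": 45.32,
    "BBH (3-Shot)": 23.94,
    "MATH Lvl 5 (4-Shot)": 2.72,
    "GPQA (0-shot)": 2.13,
    "MuSR (0-shot)": 13.61,
    "MMLU-PRO (5-shot)": 23.45,
}

# Assumption: "Avg." is the unweighted arithmetic mean of the six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"Avg. {average:.2f}")  # prints "Avg. 18.53"
```

The six scores sum to 111.17, and 111.17 / 6 ≈ 18.528, which rounds to the reported 18.53.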