T145 committed on
Commit
fb58257
1 Parent(s): 8e43bc1

Adding Evaluation Results

This is an automated PR created with [this space](https://huggingface.co/spaces/T145/open-llm-leaderboard-results-to-modelcard)!

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

Please report any issues here: https://huggingface.co/spaces/T145/open-llm-leaderboard-results-to-modelcard/discussions
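
The repo-specific links this PR adds to the model card appear to follow a simple pattern derived from the Hub repo id. The sketch below reproduces that pattern; it is inferred from the URLs in this PR rather than from any documented behavior of the space, and `leaderboard_links` is a hypothetical helper name.

```python
from urllib.parse import quote


def leaderboard_links(repo_id: str) -> dict[str, str]:
    """Build the two repo-specific links seen in this PR (hypothetical helper)."""
    # The leaderboard search link percent-encodes the whole id, slash included.
    search = (
        "https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard"
        f"#/?search={quote(repo_id, safe='')}"
    )
    # The details dataset name swaps the slash for a double underscore.
    details = (
        "https://huggingface.co/datasets/open-llm-leaderboard/"
        f"{repo_id.replace('/', '__')}-details"
    )
    return {"search": search, "details": details}


links = leaderboard_links("meetkai/functionary-small-v3.1")
print(links["search"])
print(links["details"])
```

Passing `safe=''` to `quote` matters here: by default the slash is left unescaped, which would not match the `%2F` form used in the links.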

Files changed (1)
  1. README.md +114 -0
README.md CHANGED
@@ -1,5 +1,104 @@
 ---
 license: mit
+model-index:
+- name: functionary-small-v3.1
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: IFEval (0-Shot)
+      type: wis-k/instruction-following-eval
+      split: train
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: inst_level_strict_acc and prompt_level_strict_acc
+      value: 62.75
+      name: averaged accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=meetkai%2Ffunctionary-small-v3.1
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BBH (3-Shot)
+      type: SaylorTwift/bbh
+      split: test
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc_norm
+      value: 28.62
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=meetkai%2Ffunctionary-small-v3.1
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MATH Lvl 5 (4-Shot)
+      type: lighteval/MATH-Hard
+      split: test
+      args:
+        num_few_shot: 4
+    metrics:
+    - type: exact_match
+      value: 1.06
+      name: exact match
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=meetkai%2Ffunctionary-small-v3.1
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GPQA (0-shot)
+      type: Idavidrein/gpqa
+      split: train
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 5.15
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=meetkai%2Ffunctionary-small-v3.1
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MuSR (0-shot)
+      type: TAUR-Lab/MuSR
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 6.19
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=meetkai%2Ffunctionary-small-v3.1
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU-PRO (5-shot)
+      type: TIGER-Lab/MMLU-Pro
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 26.1
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=meetkai%2Ffunctionary-small-v3.1
+      name: Open LLM Leaderboard
 ---
 # Model Card for functionary-small-v3.1
 
@@ -143,3 +242,18 @@ We encourage users to run our models using our OpenAI-compatible vLLM server [he
 
 # The MeetKai Team
 ![MeetKai Logo](https://huggingface.co/meetkai/functionary-medium-v2.2/resolve/main/meetkai_logo.png "MeetKai Logo")
+
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/meetkai__functionary-small-v3.1-details)!
+Summarized results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/contents/viewer/default/train?q=meetkai%2Ffunctionary-small-v3.1&sort[column]=Average%20%E2%AC%86%EF%B8%8F&sort[direction]=desc)!
+
+|      Metric       |Value (%)|
+|-------------------|--------:|
+|**Average**        |    21.64|
+|IFEval (0-Shot)    |    62.75|
+|BBH (3-Shot)       |    28.62|
+|MATH Lvl 5 (4-Shot)|     1.06|
+|GPQA (0-shot)      |     5.15|
+|MuSR (0-shot)      |     6.19|
+|MMLU-PRO (5-shot)  |    26.10|
+
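
As a quick sanity check on the table above, the **Average** row looks like the plain arithmetic mean of the six benchmark scores. The sketch below recomputes it from the rounded values shown in the table. That the leaderboard averages its unrounded scores is an assumption, which would explain a small difference in the last digit.

```python
# Scores copied from the results table added by this PR (values in %).
scores = {
    "IFEval (0-Shot)": 62.75,
    "BBH (3-Shot)": 28.62,
    "MATH Lvl 5 (4-Shot)": 1.06,
    "GPQA (0-shot)": 5.15,
    "MuSR (0-shot)": 6.19,
    "MMLU-PRO (5-shot)": 26.10,
}

# Unweighted mean over the six benchmarks.
average = sum(scores.values()) / len(scores)

# Compare against the reported 21.64 with a tolerance that absorbs rounding.
reported = 21.64
print(f"recomputed={average:.3f} reported={reported:.2f}")
print("consistent" if abs(average - reported) < 0.01 else "mismatch")
```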