leaderboard-pr-bot committed
Commit e3b222f · verified · 1 Parent(s): d27c49e

Adding Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1)
  1. README.md +123 -7
README.md CHANGED
@@ -1,20 +1,122 @@
  ---
- license: mit
- license_link: https://huggingface.co/microsoft/Phi-3-medium-4k-instruct/resolve/main/LICENSE
-
  language:
  - multilingual
- pipeline_tag: text-generation
+ license: mit
  tags:
  - nlp
  - code
+ license_link: https://huggingface.co/microsoft/Phi-3-medium-4k-instruct/resolve/main/LICENSE
+ pipeline_tag: text-generation
  inference:
    parameters:
      temperature: 0.7
  widget:
-   - messages:
-       - role: user
-         content: Can you provide ways to eat combinations of bananas and dragonfruits?
+ - messages:
+   - role: user
+     content: Can you provide ways to eat combinations of bananas and dragonfruits?
+ model-index:
+ - name: Phi-3-medium-4k-instruct
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: AI2 Reasoning Challenge (25-Shot)
+       type: ai2_arc
+       config: ARC-Challenge
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: acc_norm
+       value: 67.32
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=microsoft/Phi-3-medium-4k-instruct
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HellaSwag (10-Shot)
+       type: hellaswag
+       split: validation
+       args:
+         num_few_shot: 10
+     metrics:
+     - type: acc_norm
+       value: 85.76
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=microsoft/Phi-3-medium-4k-instruct
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU (5-Shot)
+       type: cais/mmlu
+       config: all
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 77.83
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=microsoft/Phi-3-medium-4k-instruct
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: TruthfulQA (0-shot)
+       type: truthful_qa
+       config: multiple_choice
+       split: validation
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: mc2
+       value: 57.71
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=microsoft/Phi-3-medium-4k-instruct
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Winogrande (5-shot)
+       type: winogrande
+       config: winogrande_xl
+       split: validation
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 72.69
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=microsoft/Phi-3-medium-4k-instruct
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GSM8k (5-shot)
+       type: gsm8k
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 79.38
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=microsoft/Phi-3-medium-4k-instruct
+       name: Open LLM Leaderboard
  ---
  ## Model Summary

@@ -268,3 +370,17 @@ The model is licensed under the [MIT license](https://huggingface.co/microsoft/P
  ## Trademarks

  This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow [Microsoft’s Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks). Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party’s policies.
+
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_microsoft__Phi-3-medium-4k-instruct)
+
+ |             Metric              |Value|
+ |---------------------------------|----:|
+ |Avg.                             |73.45|
+ |AI2 Reasoning Challenge (25-Shot)|67.32|
+ |HellaSwag (10-Shot)              |85.76|
+ |MMLU (5-Shot)                    |77.83|
+ |TruthfulQA (0-shot)              |57.71|
+ |Winogrande (5-shot)              |72.69|
+ |GSM8k (5-shot)                   |79.38|
+
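
The "Avg." row can be cross-checked against the six benchmark scores added in this PR. The following is a minimal Python sketch, assuming the average is the unweighted arithmetic mean rounded to two decimals; the PR does not state the formula, and the names `scores` and `average` are illustrative only.

```python
# Benchmark scores copied from the results table added by this PR.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 67.32,
    "HellaSwag (10-Shot)": 85.76,
    "MMLU (5-Shot)": 77.83,
    "TruthfulQA (0-shot)": 57.71,
    "Winogrande (5-shot)": 72.69,
    "GSM8k (5-shot)": 79.38,
}

# Assumption: "Avg." is the plain (unweighted) mean of the six scores,
# rounded to two decimal places.
average = round(sum(scores.values()) / len(scores), 2)

print(f"Avg. = {average:.2f}")
```

Under that assumption the result agrees with the 73.45 reported in the table.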