leaderboard-pr-bot committed on
Commit a1deec4
1 Parent(s): 3e60ddb

Adding Evaluation Results


This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1)
  1. README.md +137 -22
README.md CHANGED
@@ -1,37 +1,152 @@
1
  ---
2
- pipeline_tag: text-generation
3
- base_model: Locutusque/TinyMistral-248M
4
  license: apache-2.0
5
  datasets:
6
  - Locutusque/InstructMixCleaned
7
  - berkeley-nest/Nectar
8
- language:
9
- - en
10
  widget:
11
- - text: >-
12
- <|USER|> Design a Neo4j database and Cypher function snippet to Display
13
- Extreme Dental hygiene: Using Mouthwash for Analysis for Beginners.
14
- Implement if/else or switch/case statements to handle different conditions
15
- related to the Consent. Provide detailed comments explaining your control
16
- flow and the reasoning behind each decision. <|ASSISTANT|>
17
- - text: >-
18
- <|USER|> Write me a story about a magical place. <|ASSISTANT|>
19
- - text: >-
20
- <|USER|> Write me an essay about the life of George Washington <|ASSISTANT|>
21
- - text: >-
22
- <|USER|> Solve the following equation 2x + 10 = 20 <|ASSISTANT|>
23
- - text: >-
24
- <|USER|> Craft me a list of some nice places to visit around the world. <|ASSISTANT|>
25
- - text: >-
26
- <|USER|> How to manage a lazy employee: Address the employee verbally. Don't allow an employee's laziness or lack of enthusiasm to become a recurring issue. Tell the employee you're hoping to speak with them about workplace expectations and performance, and schedule a time to sit down together. Question: To manage a lazy employee, it is suggested to talk to the employee. True, False, or Neither? <|ASSISTANT|>
27
  inference:
28
  parameters:
29
  temperature: 0.5
30
- do_sample: True
31
  top_p: 0.5
32
  top_k: 30
33
  max_new_tokens: 250
34
  repetition_penalty: 1.15
35
  ---
36
  Base model Locutusque/TinyMistral-248M fully fine-tuned on Locutusque/InstructMix. During validation, this model achieved an average perplexity of 3.23 on Locutusque/InstructMix dataset.
37
- It has so far been trained on approximately 608,000 examples. More epochs are planned for this model.
1
  ---
2
+ language:
3
+ - en
4
  license: apache-2.0
5
  datasets:
6
  - Locutusque/InstructMixCleaned
7
  - berkeley-nest/Nectar
8
+ pipeline_tag: text-generation
9
+ base_model: Locutusque/TinyMistral-248M
10
  widget:
11
+ - text: '<|USER|> Design a Neo4j database and Cypher function snippet to Display Extreme
12
+ Dental hygiene: Using Mouthwash for Analysis for Beginners. Implement if/else
13
+ or switch/case statements to handle different conditions related to the Consent.
14
+ Provide detailed comments explaining your control flow and the reasoning behind
15
+ each decision. <|ASSISTANT|> '
16
+ - text: '<|USER|> Write me a story about a magical place. <|ASSISTANT|> '
17
+ - text: '<|USER|> Write me an essay about the life of George Washington <|ASSISTANT|> '
18
+ - text: '<|USER|> Solve the following equation 2x + 10 = 20 <|ASSISTANT|> '
19
+ - text: '<|USER|> Craft me a list of some nice places to visit around the world. <|ASSISTANT|> '
20
+ - text: '<|USER|> How to manage a lazy employee: Address the employee verbally. Don''t
21
+ allow an employee''s laziness or lack of enthusiasm to become a recurring issue.
22
+ Tell the employee you''re hoping to speak with them about workplace expectations
23
+ and performance, and schedule a time to sit down together. Question: To manage
24
+ a lazy employee, it is suggested to talk to the employee. True, False, or Neither?
25
+ <|ASSISTANT|> '
26
  inference:
27
  parameters:
28
  temperature: 0.5
29
+ do_sample: true
30
  top_p: 0.5
31
  top_k: 30
32
  max_new_tokens: 250
33
  repetition_penalty: 1.15
34
+ model-index:
35
+ - name: TinyMistral-248M-Instruct
36
+ results:
37
+ - task:
38
+ type: text-generation
39
+ name: Text Generation
40
+ dataset:
41
+ name: AI2 Reasoning Challenge (25-Shot)
42
+ type: ai2_arc
43
+ config: ARC-Challenge
44
+ split: test
45
+ args:
46
+ num_few_shot: 25
47
+ metrics:
48
+ - type: acc_norm
49
+ value: 24.32
50
+ name: normalized accuracy
51
+ source:
52
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Locutusque/TinyMistral-248M-Instruct
53
+ name: Open LLM Leaderboard
54
+ - task:
55
+ type: text-generation
56
+ name: Text Generation
57
+ dataset:
58
+ name: HellaSwag (10-Shot)
59
+ type: hellaswag
60
+ split: validation
61
+ args:
62
+ num_few_shot: 10
63
+ metrics:
64
+ - type: acc_norm
65
+ value: 27.52
66
+ name: normalized accuracy
67
+ source:
68
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Locutusque/TinyMistral-248M-Instruct
69
+ name: Open LLM Leaderboard
70
+ - task:
71
+ type: text-generation
72
+ name: Text Generation
73
+ dataset:
74
+ name: MMLU (5-Shot)
75
+ type: cais/mmlu
76
+ config: all
77
+ split: test
78
+ args:
79
+ num_few_shot: 5
80
+ metrics:
81
+ - type: acc
82
+ value: 25.18
83
+ name: accuracy
84
+ source:
85
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Locutusque/TinyMistral-248M-Instruct
86
+ name: Open LLM Leaderboard
87
+ - task:
88
+ type: text-generation
89
+ name: Text Generation
90
+ dataset:
91
+ name: TruthfulQA (0-shot)
92
+ type: truthful_qa
93
+ config: multiple_choice
94
+ split: validation
95
+ args:
96
+ num_few_shot: 0
97
+ metrics:
98
+ - type: mc2
99
+ value: 41.94
100
+ source:
101
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Locutusque/TinyMistral-248M-Instruct
102
+ name: Open LLM Leaderboard
103
+ - task:
104
+ type: text-generation
105
+ name: Text Generation
106
+ dataset:
107
+ name: Winogrande (5-shot)
108
+ type: winogrande
109
+ config: winogrande_xl
110
+ split: validation
111
+ args:
112
+ num_few_shot: 5
113
+ metrics:
114
+ - type: acc
115
+ value: 50.2
116
+ name: accuracy
117
+ source:
118
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Locutusque/TinyMistral-248M-Instruct
119
+ name: Open LLM Leaderboard
120
+ - task:
121
+ type: text-generation
122
+ name: Text Generation
123
+ dataset:
124
+ name: GSM8k (5-shot)
125
+ type: gsm8k
126
+ config: main
127
+ split: test
128
+ args:
129
+ num_few_shot: 5
130
+ metrics:
131
+ - type: acc
132
+ value: 0.0
133
+ name: accuracy
134
+ source:
135
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Locutusque/TinyMistral-248M-Instruct
136
+ name: Open LLM Leaderboard
137
  ---
138
  Base model Locutusque/TinyMistral-248M fully fine-tuned on Locutusque/InstructMix. During validation, this model achieved an average perplexity of 3.23 on Locutusque/InstructMix dataset.
139
+ It has so far been trained on approximately 608,000 examples. More epochs are planned for this model.
140
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
141
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Locutusque__TinyMistral-248M-Instruct)
142
+
143
+ | Metric |Value|
144
+ |---------------------------------|----:|
145
+ |Avg. |28.19|
146
+ |AI2 Reasoning Challenge (25-Shot)|24.32|
147
+ |HellaSwag (10-Shot) |27.52|
148
+ |MMLU (5-Shot) |25.18|
149
+ |TruthfulQA (0-shot) |41.94|
150
+ |Winogrande (5-shot) |50.20|
151
+ |GSM8k (5-shot) | 0.00|
152
+
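
The `Avg.` row in the table added by this PR can be sanity-checked against the six per-benchmark scores. The sketch below assumes the average is an unweighted arithmetic mean rounded to two decimals. The diff does not state how the leaderboard computes it, and the helper name `leaderboard_average` is hypothetical.

```python
# Scores copied from the results table in the diff above.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 24.32,
    "HellaSwag (10-Shot)": 27.52,
    "MMLU (5-Shot)": 25.18,
    "TruthfulQA (0-shot)": 41.94,
    "Winogrande (5-shot)": 50.20,
    "GSM8k (5-shot)": 0.00,
}


def leaderboard_average(values):
    """Unweighted mean rounded to two decimals (assumed convention)."""
    values = list(values)
    return round(sum(values) / len(values), 2)


average = leaderboard_average(scores.values())
print(f"Avg. = {average:.2f}")
```

Under that assumption the computed mean agrees with the `28.19` reported in the table, so the summary row is consistent with the `model-index` values added to the front matter.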