Commit 9704faf
1 Parent(s): 888a0d6

Adding Evaluation Results (#1)

- Adding Evaluation Results (053d64b0bdc25fdbcc74c8c11070a27baccd97fb)


Co-authored-by: Open LLM Leaderboard PR Bot <leaderboard-pr-bot@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +129 -19
README.md CHANGED
@@ -1,7 +1,8 @@
  ---
- library_name: transformers
  language:
  - en
  inference:
    parameters:
      max_new_tokens: 64
@@ -16,40 +17,136 @@ widget:
    example_title: El Microondas
  - text: Kennesaw State University is a public
    example_title: Kennesaw State University
- - text: >-
-     Bungie Studios is an American video game developer. They are most famous for
-     developing the award winning Halo series of video games. They also made
-     Destiny. The studio was founded
    example_title: Bungie
  - text: The Mona Lisa is a world-renowned painting created by
    example_title: Mona Lisa
- - text: >-
-     The Harry Potter series, written by J.K. Rowling, begins with the book
-     titled
    example_title: Harry Potter Series
- - text: >-
-     Question: I have cities, but no houses. I have mountains, but no trees. I
      have water, but no fish. What am I?

-     Answer:
    example_title: Riddle
  - text: The process of photosynthesis involves the conversion of
    example_title: Photosynthesis
- - text: >-
-     Jane went to the store to buy some groceries. She picked up apples, oranges,
      and a loaf of bread. When she got home, she realized she forgot
    example_title: Story Continuation
- - text: >-
-     Problem 2: If a train leaves Station A at 9:00 AM and travels at 60 mph, and
-     another train leaves Station B at 10:00 AM and travels at 80 mph, when will
      they meet if the distance between the stations is 300 miles?

-     To determine
    example_title: Math Problem
  - text: In the context of computer programming, an algorithm is
    example_title: Algorithm Definition
  pipeline_tag: text-generation
- license: mit
  ---
@@ -460,4 +557,17 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]

  ## Model Card Contact

- [More Information Needed]
  ---
  language:
  - en
+ license: mit
+ library_name: transformers
  inference:
    parameters:
      max_new_tokens: 64
 
    example_title: El Microondas
  - text: Kennesaw State University is a public
    example_title: Kennesaw State University
+ - text: Bungie Studios is an American video game developer. They are most famous for
+     developing the award winning Halo series of video games. They also made Destiny.
+     The studio was founded
    example_title: Bungie
  - text: The Mona Lisa is a world-renowned painting created by
    example_title: Mona Lisa
+ - text: The Harry Potter series, written by J.K. Rowling, begins with the book titled
    example_title: Harry Potter Series
+ - text: 'Question: I have cities, but no houses. I have mountains, but no trees. I
      have water, but no fish. What am I?

+     Answer:'
    example_title: Riddle
  - text: The process of photosynthesis involves the conversion of
    example_title: Photosynthesis
+ - text: Jane went to the store to buy some groceries. She picked up apples, oranges,
      and a loaf of bread. When she got home, she realized she forgot
    example_title: Story Continuation
+ - text: 'Problem 2: If a train leaves Station A at 9:00 AM and travels at 60 mph,
+     and another train leaves Station B at 10:00 AM and travels at 80 mph, when will
      they meet if the distance between the stations is 300 miles?

+     To determine'
    example_title: Math Problem
  - text: In the context of computer programming, an algorithm is
    example_title: Algorithm Definition
  pipeline_tag: text-generation
+ model-index:
+ - name: nano-phi-115M-v0.1
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: AI2 Reasoning Challenge (25-Shot)
+       type: ai2_arc
+       config: ARC-Challenge
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: acc_norm
+       value: 21.93
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=kenhktsui/nano-phi-115M-v0.1
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HellaSwag (10-Shot)
+       type: hellaswag
+       split: validation
+       args:
+         num_few_shot: 10
+     metrics:
+     - type: acc_norm
+       value: 27.86
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=kenhktsui/nano-phi-115M-v0.1
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU (5-Shot)
+       type: cais/mmlu
+       config: all
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 25.34
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=kenhktsui/nano-phi-115M-v0.1
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: TruthfulQA (0-shot)
+       type: truthful_qa
+       config: multiple_choice
+       split: validation
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: mc2
+       value: 46.0
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=kenhktsui/nano-phi-115M-v0.1
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Winogrande (5-shot)
+       type: winogrande
+       config: winogrande_xl
+       split: validation
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 50.83
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=kenhktsui/nano-phi-115M-v0.1
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GSM8k (5-shot)
+       type: gsm8k
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 0.0
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=kenhktsui/nano-phi-115M-v0.1
+       name: Open LLM Leaderboard
  ---
 

  ## Model Card Contact

+ [More Information Needed]
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_kenhktsui__nano-phi-115M-v0.1)
+
+ |             Metric              |Value|
+ |---------------------------------|----:|
+ |Avg.                             |28.66|
+ |AI2 Reasoning Challenge (25-Shot)|21.93|
+ |HellaSwag (10-Shot)              |27.86|
+ |MMLU (5-Shot)                    |25.34|
+ |TruthfulQA (0-shot)              |46.00|
+ |Winogrande (5-shot)              |50.83|
+ |GSM8k (5-shot)                   | 0.00|
+
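
The `Avg.` row in the results table is consistent with an unweighted mean of the six benchmark scores. The following is a minimal sketch that recomputes it from the values listed in the table. The variable names `scores` and `average` are illustrative and not part of the commit, and the assumption that the leaderboard uses a plain arithmetic mean is inferred from the numbers rather than stated in the diff.

```python
# Scores copied from the results table added to README.md in this commit.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 21.93,
    "HellaSwag (10-Shot)": 27.86,
    "MMLU (5-Shot)": 25.34,
    "TruthfulQA (0-shot)": 46.00,
    "Winogrande (5-shot)": 50.83,
    "GSM8k (5-shot)": 0.00,
}

# Assumption: "Avg." is the unweighted arithmetic mean of the six benchmarks.
average = sum(scores.values()) / len(scores)

print(f"Avg. {average:.2f}")  # prints "Avg. 28.66"
```

Because the mean is unweighted, the 0.00 on GSM8k pulls the average down as much as any other benchmark would.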