Commit
4ed32d7
1 Parent(s): 54a4179

Adding Evaluation Results (#2)

- Adding Evaluation Results (2aa518a5b64744899c1c0dc05f9a9d02ace957be)


Co-authored-by: Open LLM Leaderboard PR Bot <leaderboard-pr-bot@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +117 -1
README.md CHANGED
@@ -1,5 +1,108 @@
  ---
  license: apache-2.0
  ---
  ## Prerequisites
  In addition to pytorch and transformers, install required packages:
@@ -43,4 +146,17 @@ print(response)
  mediocredev/open-llama-3b-v2-instruct is based on LLaMA 3B v2. It can struggle with factual accuracy, particularly when presented with conflicting information or nuanced topics. Its outputs are not deterministic and require critical evaluation to avoid relying solely on its assertions. Additionally, its generative capabilities, while promising, can sometimes produce factually incorrect or offensive content, necessitating careful curation and human oversight. As an evolving model, LLaMA is still under development, and its limitations in areas like bias mitigation and interpretability are being actively addressed. By using this model responsibly and being aware of its shortcomings, we can unlock its potential while mitigating its risks.
 
  ## Contact
- Welcome any feedback, questions, and discussions. Feel free to reach out: mediocredev@outlook.com
  ---
  license: apache-2.0
+ model-index:
+ - name: open-llama-3b-v2-instruct
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: AI2 Reasoning Challenge (25-Shot)
+       type: ai2_arc
+       config: ARC-Challenge
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: acc_norm
+       value: 38.48
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mediocredev/open-llama-3b-v2-instruct
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HellaSwag (10-Shot)
+       type: hellaswag
+       split: validation
+       args:
+         num_few_shot: 10
+     metrics:
+     - type: acc_norm
+       value: 70.24
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mediocredev/open-llama-3b-v2-instruct
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU (5-Shot)
+       type: cais/mmlu
+       config: all
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 39.69
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mediocredev/open-llama-3b-v2-instruct
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: TruthfulQA (0-shot)
+       type: truthful_qa
+       config: multiple_choice
+       split: validation
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: mc2
+       value: 37.96
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mediocredev/open-llama-3b-v2-instruct
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Winogrande (5-shot)
+       type: winogrande
+       config: winogrande_xl
+       split: validation
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 65.75
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mediocredev/open-llama-3b-v2-instruct
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GSM8k (5-shot)
+       type: gsm8k
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 0.0
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mediocredev/open-llama-3b-v2-instruct
+       name: Open LLM Leaderboard
  ---
  ## Prerequisites
  In addition to pytorch and transformers, install required packages:
 
  mediocredev/open-llama-3b-v2-instruct is based on LLaMA 3B v2. It can struggle with factual accuracy, particularly when presented with conflicting information or nuanced topics. Its outputs are not deterministic and require critical evaluation to avoid relying solely on its assertions. Additionally, its generative capabilities, while promising, can sometimes produce factually incorrect or offensive content, necessitating careful curation and human oversight. As an evolving model, LLaMA is still under development, and its limitations in areas like bias mitigation and interpretability are being actively addressed. By using this model responsibly and being aware of its shortcomings, we can unlock its potential while mitigating its risks.
 
  ## Contact
+ Welcome any feedback, questions, and discussions. Feel free to reach out: mediocredev@outlook.com
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_mediocredev__open-llama-3b-v2-instruct)
+
+ |             Metric              |Value|
+ |---------------------------------|----:|
+ |Avg.                             |42.02|
+ |AI2 Reasoning Challenge (25-Shot)|38.48|
+ |HellaSwag (10-Shot)              |70.24|
+ |MMLU (5-Shot)                    |39.69|
+ |TruthfulQA (0-shot)              |37.96|
+ |Winogrande (5-shot)              |65.75|
+ |GSM8k (5-shot)                   | 0.00|
+
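
Review note: the `Avg.` row in the added table appears to be the unweighted mean of the six benchmark scores. The sketch below checks that arithmetic. The equal weighting and the `leaderboard_average` helper are assumptions made for illustration; they are not taken from the leaderboard's own code.

```python
# Scores copied from the table added in this commit.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 38.48,
    "HellaSwag (10-Shot)": 70.24,
    "MMLU (5-Shot)": 39.69,
    "TruthfulQA (0-shot)": 37.96,
    "Winogrande (5-shot)": 65.75,
    "GSM8k (5-shot)": 0.00,
}


def leaderboard_average(values):
    """Unweighted arithmetic mean rounded to two decimals (hypothetical helper)."""
    values = list(values)
    return round(sum(values) / len(values), 2)


average = leaderboard_average(scores.values())
print(f"Avg. {average:.2f}")
```

Under that assumption the result matches the 42.02 reported in the table, and the GSM8k score of 0.00 counts toward the mean like any other benchmark.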