leaderboard-pr-bot committed
Commit 6b15280
1 Parent(s): e2a1842

Adding Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1)
  1. README.md +120 -5
README.md CHANGED
@@ -1,15 +1,13 @@
 ---
 language:
 - en
-library_name: transformers
 license: apache-2.0
+library_name: transformers
 tags:
 - gpt
 - llm
 - large language model
 - h2o-llmstudio
-thumbnail: >-
-  https://h2o.ai/etc.clientlibs/h2o/clientlibs/clientlib-site/resources/images/favicon.ico
 datasets:
 - HuggingFaceH4/ultrafeedback_binarized
 - Intel/orca_dpo_pairs
@@ -18,8 +16,112 @@ datasets:
 - OpenAssistant/oasst2
 - HuggingFaceH4/ultrachat_200k
 - meta-math/MetaMathQA
+thumbnail: https://h2o.ai/etc.clientlibs/h2o/clientlibs/clientlib-site/resources/images/favicon.ico
 widget:
-- text: "<|prompt|>Why is drinking water so healthy?</s><|answer|>"
+- text: <|prompt|>Why is drinking water so healthy?</s><|answer|>
+model-index:
+- name: h2o-danube-1.8b-chat
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: AI2 Reasoning Challenge (25-Shot)
+      type: ai2_arc
+      config: ARC-Challenge
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: acc_norm
+      value: 41.13
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=h2oai/h2o-danube-1.8b-chat
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HellaSwag (10-Shot)
+      type: hellaswag
+      split: validation
+      args:
+        num_few_shot: 10
+    metrics:
+    - type: acc_norm
+      value: 68.06
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=h2oai/h2o-danube-1.8b-chat
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU (5-Shot)
+      type: cais/mmlu
+      config: all
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 33.41
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=h2oai/h2o-danube-1.8b-chat
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: TruthfulQA (0-shot)
+      type: truthful_qa
+      config: multiple_choice
+      split: validation
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: mc2
+      value: 41.64
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=h2oai/h2o-danube-1.8b-chat
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Winogrande (5-shot)
+      type: winogrande
+      config: winogrande_xl
+      split: validation
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 65.35
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=h2oai/h2o-danube-1.8b-chat
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GSM8k (5-shot)
+      type: gsm8k
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 17.36
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=h2oai/h2o-danube-1.8b-chat
+      name: Open LLM Leaderboard
 ---
 # Model Card
 ## Summary
@@ -152,4 +254,17 @@ Please read this disclaimer carefully before using the large language model prov
 - Reporting Issues: If you encounter any biased, offensive, or otherwise inappropriate content generated by the large language model, please report it to the repository maintainers through the provided channels. Your feedback will help improve the model and mitigate potential issues.
 - Changes to this Disclaimer: The developers of this repository reserve the right to modify or update this disclaimer at any time without prior notice. It is the user's responsibility to periodically review the disclaimer to stay informed about any changes.
 
-By using the large language model provided in this repository, you agree to accept and comply with the terms and conditions outlined in this disclaimer. If you do not agree with any part of this disclaimer, you should refrain from using the model and any content generated by it.
+By using the large language model provided in this repository, you agree to accept and comply with the terms and conditions outlined in this disclaimer. If you do not agree with any part of this disclaimer, you should refrain from using the model and any content generated by it.
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_h2oai__h2o-danube-1.8b-chat)
+
+|             Metric              |Value|
+|---------------------------------|----:|
+|Avg.                             |44.49|
+|AI2 Reasoning Challenge (25-Shot)|41.13|
+|HellaSwag (10-Shot)              |68.06|
+|MMLU (5-Shot)                    |33.41|
+|TruthfulQA (0-shot)              |41.64|
+|Winogrande (5-shot)              |65.35|
+|GSM8k (5-shot)                   |17.36|
+
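
Reviewer note: the `Avg.` row in the results table appears to be the unweighted mean of the six benchmark scores. The PR does not state how the average is computed, so the Python sketch below assumes a plain arithmetic mean rounded to two decimals. The function name `leaderboard_average` is hypothetical and is not part of any leaderboard tooling.

```python
# Scores copied from the results table added by this PR.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 41.13,
    "HellaSwag (10-Shot)": 68.06,
    "MMLU (5-Shot)": 33.41,
    "TruthfulQA (0-shot)": 41.64,
    "Winogrande (5-shot)": 65.35,
    "GSM8k (5-shot)": 17.36,
}


def leaderboard_average(benchmark_scores):
    """Unweighted mean of the scores, rounded to two decimals.

    Assumption: the reported Avg. is a plain arithmetic mean; the PR
    itself does not document the formula.
    """
    return round(sum(benchmark_scores.values()) / len(benchmark_scores), 2)


average = leaderboard_average(scores)
print(f"Avg. = {average:.2f}")
```

Under that assumption, the computed value matches the 44.49 reported in the table, so the six per-benchmark values and the average are at least mutually consistent.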