pvduy committed
Commit 3131f94
1 parent: 0cd93b5

Update README.md

  1. README.md +3 -3
README.md CHANGED
@@ -99,7 +99,7 @@ The dataset is comprised of a mixture of open datasets large-scale datasets avai
  | GPT-4 | -| RLHF |8.99| 95.28|
 
  ## Other benchmark:
- 1. HuggingFace OpenLLM Leaderboard
+ 1. **HuggingFace OpenLLM Leaderboard**
  | Metric | Value |
  |-----------------------|---------------------------|
  | ARC (25-shot) | 47.0 |
@@ -110,7 +110,7 @@ The dataset is comprised of a mixture of open datasets large-scale datasets avai
  | GSM8K (5-shot) | 42.3 |
 
 
- 2. BigBench:
+ 2. **BigBench**:
 
  - Average: 35.26
  - Details:
@@ -139,7 +139,7 @@ The dataset is comprised of a mixture of open datasets large-scale datasets avai
  | bigbench_tracking_shuffled_objects_seven_objects | 0 | multiple_choice_grade | 0.1856| 0.0110 |
  | bigbench_tracking_shuffled_objects_three_objects | 0 | multiple_choice_grade | 0.1269| 0.0080 |
 
- 3. AGI:
+ 3. **AGI Benchmark**:
  - Average: 33.23
  - Details:
  | Task |Version| Metric |Value | |Stderr|