stabilityai/stablelm-zephyr-3b

@@ -104,7 +104,7 @@ The dataset is comprised of a mixture of open datasets large-scale datasets avai
 | ARC (25-shot)         |  47.0       |
 | HellaSwag (10-shot)   | 74.2    |
 | MMLU (5-shot)        |   46.3     |
-| TruthfulQA (0-shot)   |   46.43 |
 | Winogrande (5-shot)   |   65.5 |
 | GSM8K (5-shot)        | 42.3        |
@@ -112,7 +112,7 @@ The dataset is comprised of a mixture of open datasets large-scale datasets avai
 2. BigBench:
 - Average: 35.26
-- Details:
 | Task                                                | Version | Metric                  | Value | Stderr |
 |-----------------------------------------------------|---------|-------------------------|-------|--------|
@@ -138,6 +138,46 @@ The dataset is comprised of a mixture of open datasets large-scale datasets avai
 | bigbench_tracking_shuffled_objects_seven_objects    | 0       | multiple_choice_grade   | 0.1856| 0.0110 |
 | bigbench_tracking_shuffled_objects_three_objects    | 0       | multiple_choice_grade   | 0.1269| 0.0080 |
 ### Training Infrastructure
 * **Hardware**: `Stable Zephyr 3B` was trained on the Stability AI cluster across 8 nodes with 8 A100 80GB GPUs per node.

 | ARC (25-shot)         |  47.0       |
 | HellaSwag (10-shot)   | 74.2    |
 | MMLU (5-shot)        |   46.3     |
+| TruthfulQA (0-shot)   |   46.5 |
 | Winogrande (5-shot)   |   65.5 |
 | GSM8K (5-shot)        | 42.3        |
 2. BigBench:
 - Average: 35.26
+- Details:
 | Task                                                | Version | Metric                  | Value | Stderr |
 |-----------------------------------------------------|---------|-------------------------|-------|--------|
 | bigbench_tracking_shuffled_objects_seven_objects    | 0       | multiple_choice_grade   | 0.1856| 0.0110 |
 | bigbench_tracking_shuffled_objects_three_objects    | 0       | multiple_choice_grade   | 0.1269| 0.0080 |
+3. AGIEval:
+- Average: 33.23
+- Details:
+|             Task             |Version| Metric |Value |   |Stderr|
+|------------------------------|------:|--------|-----:|---|-----:|
+|agieval_aqua_rat              |      0|acc     |0.2126|±  |0.0257|
+|                              |       |acc_norm|0.1890|±  |0.0246|
+|agieval_gaokao_biology        |      0|acc     |0.2571|±  |0.0302|
+|                              |       |acc_norm|0.3143|±  |0.0321|
+|agieval_gaokao_chemistry      |      0|acc     |0.2464|±  |0.0300|
+|                              |       |acc_norm|0.2899|±  |0.0316|
+|agieval_gaokao_chinese        |      0|acc     |0.2927|±  |0.0291|
+|                              |       |acc_norm|0.3049|±  |0.0294|
+|agieval_gaokao_english        |      0|acc     |0.6176|±  |0.0278|
+|                              |       |acc_norm|0.6438|±  |0.0274|
+|agieval_gaokao_geography      |      0|acc     |0.3015|±  |0.0326|
+|                              |       |acc_norm|0.3065|±  |0.0328|
+|agieval_gaokao_history        |      0|acc     |0.3106|±  |0.0303|
+|                              |       |acc_norm|0.3319|±  |0.0308|
+|agieval_gaokao_mathqa         |      0|acc     |0.2650|±  |0.0236|
+|                              |       |acc_norm|0.2707|±  |0.0237|
+|agieval_gaokao_physics        |      0|acc     |0.3450|±  |0.0337|
+|                              |       |acc_norm|0.3550|±  |0.0339|
+|agieval_logiqa_en             |      0|acc     |0.2980|±  |0.0179|
+|                              |       |acc_norm|0.3195|±  |0.0183|
+|agieval_logiqa_zh             |      0|acc     |0.2842|±  |0.0177|
+|                              |       |acc_norm|0.3318|±  |0.0185|
+|agieval_lsat_ar               |      0|acc     |0.2000|±  |0.0264|
+|                              |       |acc_norm|0.2043|±  |0.0266|
+|agieval_lsat_lr               |      0|acc     |0.3176|±  |0.0206|
+|                              |       |acc_norm|0.3275|±  |0.0208|
+|agieval_lsat_rc               |      0|acc     |0.4312|±  |0.0303|
+|                              |       |acc_norm|0.4201|±  |0.0301|
+|agieval_sat_en                |      0|acc     |0.6117|±  |0.0340|
+|                              |       |acc_norm|0.6117|±  |0.0340|
+|agieval_sat_en_without_passage|      0|acc     |0.3398|±  |0.0331|
+|                              |       |acc_norm|0.3495|±  |0.0333|
+|agieval_sat_math              |      0|acc     |0.3182|±  |0.0315|
+|                              |       |acc_norm|0.2909|±  |0.0307|
 ### Training Infrastructure
 * **Hardware**: `Stable Zephyr 3B` was trained on the Stability AI cluster across 8 nodes with 8 A100 80GB GPUs per node.
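
Review note on the added `agieval_*` table: the diff does not say which metric the reported average of 33.23 is computed over. The sketch below recomputes the unweighted mean from the 17 task rows, once over `acc` and once over `acc_norm`. The variable names are illustrative, and the reading that the authors averaged `acc` is an inference from the numbers, not something the diff states.

```python
# Values copied from the agieval_* rows in the diff, in table order.
# Names (acc, acc_norm, *_mean_pct) are illustrative, not from the model card.
acc = [
    0.2126, 0.2571, 0.2464, 0.2927, 0.6176, 0.3015, 0.3106, 0.2650, 0.3450,
    0.2980, 0.2842, 0.2000, 0.3176, 0.4312, 0.6117, 0.3398, 0.3182,
]
acc_norm = [
    0.1890, 0.3143, 0.2899, 0.3049, 0.6438, 0.3065, 0.3319, 0.2707, 0.3550,
    0.3195, 0.3318, 0.2043, 0.3275, 0.4201, 0.6117, 0.3495, 0.2909,
]

# Unweighted (macro) mean over tasks, expressed as a percentage.
acc_mean_pct = 100 * sum(acc) / len(acc)
acc_norm_mean_pct = 100 * sum(acc_norm) / len(acc_norm)

print(f"mean acc:      {acc_mean_pct:.2f}")       # 33.23
print(f"mean acc_norm: {acc_norm_mean_pct:.2f}")  # 34.48
```

The `acc` mean reproduces the reported 33.23, while the `acc_norm` mean comes out at 34.48, so the section's average appears to be the plain mean of the `acc` column.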