Added Nous Eval-Scores
README.md CHANGED
@@ -61,4 +61,72 @@ pipeline = transformers.pipeline(

```python
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```

# 🏆 Evaluation Scores

## Nous

| Model                                                                                         | AGIEval | TruthfulQA | Bigbench |
|-----------------------------------------------------------------------------------------------|--------:|-----------:|---------:|
| [yuvraj17/Llama3-8B-Instruct-Slerp](https://huggingface.co/yuvraj17/Llama3-8B-Instruct-Slerp) |   38.32 |      57.15 |    43.91 |
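
The per-task tables below look like raw output from the EleutherAI lm-evaluation-harness for the Nous benchmark suite. As a rough guide to reproducing numbers of this kind, here is a minimal sketch using the harness's Python API; the model id is taken from the table above, but the harness version, task identifiers, dtype, and batch size are assumptions, and the BigBench tasks further down may only exist under these exact names in the older harness fork commonly used for this suite.

```python
# Hypothetical reproduction sketch, not the exact command behind the scores above.
# Assumes EleutherAI lm-evaluation-harness >= 0.4 (`pip install lm-eval`); task names
# can differ between harness versions (e.g. truthfulqa_mc vs truthfulqa_mc1/mc2).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=yuvraj17/Llama3-8B-Instruct-Slerp,dtype=bfloat16",
    tasks=[
        "agieval_aqua_rat", "agieval_logiqa_en", "agieval_lsat_ar",
        "agieval_lsat_lr", "agieval_lsat_rc", "agieval_sat_en",
        "agieval_sat_en_without_passage", "agieval_sat_math",
        "truthfulqa_mc2",
    ],
    batch_size=8,
)

# Each entry maps a task name to its metric dictionary (acc, acc_norm, mc2, ...).
for task, metrics in results["results"].items():
    print(task, metrics)
```

The harness also ships a command-line entry point that runs the same tasks; the Python API is shown here only to keep the sketch self-contained.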

### AGIEval

| Task                           | Version | Metric   | Value |   | Stderr |
|--------------------------------|--------:|----------|------:|---|-------:|
| agieval_aqua_rat               |       0 | acc      | 23.62 | ± |   2.67 |
|                                |         | acc_norm | 22.05 | ± |   2.61 |
| agieval_logiqa_en              |       0 | acc      | 27.50 | ± |   1.75 |
|                                |         | acc_norm | 31.80 | ± |   1.83 |
| agieval_lsat_ar                |       0 | acc      | 21.30 | ± |   2.71 |
|                                |         | acc_norm | 20.87 | ± |   2.69 |
| agieval_lsat_lr                |       0 | acc      | 35.29 | ± |   2.12 |
|                                |         | acc_norm | 37.65 | ± |   2.15 |
| agieval_lsat_rc                |       0 | acc      | 42.01 | ± |   3.01 |
|                                |         | acc_norm | 39.78 | ± |   2.99 |
| agieval_sat_en                 |       0 | acc      | 55.83 | ± |   3.47 |
|                                |         | acc_norm | 50.49 | ± |   3.49 |
| agieval_sat_en_without_passage |       0 | acc      | 36.89 | ± |   3.37 |
|                                |         | acc_norm | 34.95 | ± |   3.33 |
| agieval_sat_math               |       0 | acc      | 29.55 | ± |   3.08 |
|                                |         | acc_norm | 28.64 | ± |   3.05 |

**Average score**: 33.28%
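
The AGIEval figure appears to be the unweighted mean of the acc_norm column above; a quick sanity check under that assumption:

```python
# acc_norm values copied from the AGIEval table; the average is assumed to be a plain mean.
acc_norm = [22.05, 31.80, 20.87, 37.65, 39.78, 50.49, 34.95, 28.64]
print(f"{sum(acc_norm) / len(acc_norm):.2f}")  # -> 33.28
```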

### TruthfulQA

| Task          | Version | Metric | Value |   | Stderr |
|---------------|--------:|--------|------:|---|-------:|
| truthfulqa_mc |       1 | mc1    | 33.54 | ± |   1.65 |
|               |         | mc2    | 49.78 | ± |   1.53 |

**Average score**: 49.78%

### BigBench

| Task                                              | Version | Metric                | Value |   | Stderr |
|---------------------------------------------------|--------:|-----------------------|------:|---|-------:|
| bigbench_causal_judgement                         |       0 | multiple_choice_grade | 47.89 | ± |   3.63 |
| bigbench_date_understanding                       |       0 | multiple_choice_grade | 39.02 | ± |   2.54 |
| bigbench_disambiguation_qa                        |       0 | multiple_choice_grade | 33.72 | ± |   2.95 |
| bigbench_geometric_shapes                         |       0 | multiple_choice_grade | 20.61 | ± |   2.14 |
| bigbench_logical_deduction_five_objects           |       0 | multiple_choice_grade | 31.40 | ± |   2.08 |
| bigbench_logical_deduction_seven_objects          |       0 | multiple_choice_grade | 23.71 | ± |   1.61 |
| bigbench_logical_deduction_three_objects          |       0 | multiple_choice_grade | 47.00 | ± |   2.89 |
| bigbench_movie_recommendation                     |       0 | multiple_choice_grade | 27.40 | ± |   1.99 |
| bigbench_navigate                                 |       0 | multiple_choice_grade | 50.10 | ± |   1.58 |
| bigbench_reasoning_about_colored_objects          |       0 | multiple_choice_grade | 38.40 | ± |   1.09 |
| bigbench_ruin_names                               |       0 | multiple_choice_grade | 27.23 | ± |   2.11 |
| bigbench_salient_translation_error_detection      |       0 | multiple_choice_grade | 25.45 | ± |   1.38 |
| bigbench_snarks                                   |       0 | multiple_choice_grade | 46.41 | ± |   3.72 |
| bigbench_sports_understanding                     |       0 | multiple_choice_grade | 50.30 | ± |   1.59 |
| bigbench_temporal_sequences                       |       0 | multiple_choice_grade | 37.30 | ± |   1.53 |
| bigbench_tracking_shuffled_objects_five_objects   |       0 | multiple_choice_grade | 21.36 | ± |   1.16 |
| bigbench_tracking_shuffled_objects_seven_objects  |       0 | multiple_choice_grade | 17.14 | ± |   0.90 |
| bigbench_tracking_shuffled_objects_three_objects  |       0 | multiple_choice_grade | 47.00 | ± |   2.89 |

**Average score**: 35.38%