leaderboard-pr-bot committed on
Commit
654b11e
1 Parent(s): d65832d

Adding Evaluation Results


This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1)
  1. README.md +14 -1
README.md CHANGED
@@ -21,4 +21,17 @@ This model was much easier than expected to create.
 
  We used the [ColossalAI](https://www.colossalai.org/) library to fine-tune the [OPT-350M](https://huggingface.co/facebook/opt-350m) model originally trained by Facebook on The Pile. Though our initial dataset was sets of dialogue gathered from various sources totaling about 50 MB in size, early training runs revealed that the model converged after only 7% of the dataset was passed through. To alleviate this, we massively reduced the size of the dataset to only 273 KB.
 
- ColossalAI's magic allowed for something incredible: this entire model was fine-tuned on a singular GPU with only 6 GB ***(!)*** of VRAM. Fine-tuning took less than an hour to complete.
+ ColossalAI's magic allowed for something incredible: this entire model was fine-tuned on a singular GPU with only 6 GB ***(!)*** of VRAM. Fine-tuning took less than an hour to complete.
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_PygmalionAI__pygmalion-350m)
+
+ | Metric | Value |
+ |-----------------------|---------------------------|
+ | Avg. | 26.23 |
+ | ARC (25-shot) | 25.0 |
+ | HellaSwag (10-shot) | 37.8 |
+ | MMLU (5-shot) | 25.68 |
+ | TruthfulQA (0-shot) | 40.41 |
+ | Winogrande (5-shot) | 50.28 |
+ | GSM8K (5-shot) | 0.53 |
+ | DROP (3-shot) | 3.89 |
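
The `Avg.` row in the added table appears to be the plain arithmetic mean of the seven benchmark scores. That is an assumption, since the PR text does not say how the average is computed, but the numbers are consistent with it. A minimal sketch that reproduces the figure from the table values (the `scores` dictionary is only an illustration, not part of the PR):

```python
# Benchmark scores copied from the table added by this PR.
scores = {
    "ARC (25-shot)": 25.0,
    "HellaSwag (10-shot)": 37.8,
    "MMLU (5-shot)": 25.68,
    "TruthfulQA (0-shot)": 40.41,
    "Winogrande (5-shot)": 50.28,
    "GSM8K (5-shot)": 0.53,
    "DROP (3-shot)": 3.89,
}

# Assumption: "Avg." is the unweighted mean over all seven benchmarks.
average = sum(scores.values()) / len(scores)

print(f"Avg. = {average:.2f}")
```

Under that assumption the result rounds to 26.23, which matches the `Avg.` value reported in the table.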