abideen committed on
Commit 6c2bc3b
1 Parent(s): ded4679

Update README.md

Files changed (1): README.md (+91 −12)

README.md CHANGED
@@ -28,24 +28,106 @@ should probably proofread and complete it, then remove this comment. -->

 ![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/64e380b2e12618b261fa6ba0/62S_ExHO6NKCM3NhPDrds.jpeg)

- AlphaMonarch-laser is a DPO fine-tuned of [mlabonne/NeuralMonarch-7B](https://huggingface.co/mlabonne/NeuralMonarch-7B/) using the [argilla/OpenHermes2.5-dpo-binarized-alpha](https://huggingface.co/datasets/argilla/OpenHermes2.5-dpo-binarized-alpha) preference dataset but achieves better performance then [mlabonne/AlphaMonarch-7B](https://huggingface.co/mlabonne/AlphaMonarch-7B/) using

- This model is a fine-tuned version of [mlabonne/NeuralMonarch-7B](https://huggingface.co/mlabonne/NeuralMonarch-7B) on

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

- ## Training procedure

 ### Training hyperparameters

@@ -63,9 +145,6 @@ The following hyperparameters were used during training:


-
-
-
 ### 📝 Axolotl Configuration

 ```yaml
 

 ![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/64e380b2e12618b261fa6ba0/62S_ExHO6NKCM3NhPDrds.jpeg)

+ AlphaMonarch-laser is a DPO fine-tune of [mlabonne/NeuralMonarch-7B](https://huggingface.co/mlabonne/NeuralMonarch-7B/) on the [argilla/OpenHermes2.5-dpo-binarized-alpha](https://huggingface.co/datasets/argilla/OpenHermes2.5-dpo-binarized-alpha) preference dataset. Using LaserQLoRA, we fine-tuned only half of the model's projections, yet it achieves better performance than [mlabonne/AlphaMonarch-7B](https://huggingface.co/mlabonne/AlphaMonarch-7B/), the version released by Maxime Labonne. We trained this model for 1080 steps.
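As a minimal, self-contained sketch of the DPO objective behind this kind of preference tuning (plain Python; the β value and the example log-probabilities below are illustrative assumptions, not the actual training settings):

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Inputs are summed log-probabilities of the chosen/rejected completions
    under the policy (pi_*) and the frozen reference model (ref_*).
    """
    # Reward margin: how much more the policy prefers the chosen answer
    # than the reference model does.
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    # -log(sigmoid(beta * margin))
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# With zero margin the loss is ln(2); it shrinks as the policy
# favours the chosen completion more strongly.
print(round(dpo_loss(-10.0, -12.0, -10.0, -12.0), 4))  # 0.6931
print(round(dpo_loss(-8.0, -12.0, -10.0, -12.0), 4))   # 0.5981
```

Per pair, the loss rewards the policy for ranking the chosen completion above the rejected one relative to the frozen reference model.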

+ AlphaMonarch-laser ranks #1 on YALL - [Yet Another LLM Leaderboard](https://huggingface.co/spaces/mlabonne/Yet_Another_LLM_Leaderboard).
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64e380b2e12618b261fa6ba0/Jgxw1FZRx7nNAdSh7nYt1.png)
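To make the "half of the projections" idea above concrete, here is a hypothetical sketch of selecting half of a Mistral-style model's attention projections for adaptation. The even-layer rule and module names are illustrative assumptions, not the actual AlphaMonarch-laser recipe:

```python
# Hypothetical sketch: restrict LoRA-style adaptation to half of the
# attention projection modules. Module names follow the common
# Llama/Mistral layout; the even-layer selection rule is an assumption
# made purely for illustration.
PROJECTIONS = ["q_proj", "k_proj", "v_proj", "o_proj"]

def half_projection_targets(num_layers):
    """Name the attention projection modules of even-numbered layers only."""
    return [
        f"model.layers.{layer}.self_attn.{proj}"
        for layer in range(num_layers)
        if layer % 2 == 0
        for proj in PROJECTIONS
    ]

targets = half_projection_targets(32)  # Mistral-7B has 32 decoder layers
print(len(targets))  # 64: half of the 128 attention projections
```

A target list like this could then be passed to a PEFT-style `target_modules` setting so only the selected projections receive adapters.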

+ ## 🏆 Evaluation results

+ ### Nous Benchmark

+ #### AGIEVAL

+ | Task | Version | Metric | Value | StdErr |
+ |---------------------------------|---------|--------------|--------|--------|
+ | agieval_aqua_rat | 0 | acc | 28.35% | 2.83% |
+ | agieval_aqua_rat | 0 | acc_norm | 26.38% | 2.77% |
+ | agieval_logiqa_en | 0 | acc | 38.25% | 1.91% |
+ | agieval_logiqa_en | 0 | acc_norm | 38.10% | 1.90% |
+ | agieval_lsat_ar | 0 | acc | 23.91% | 2.82% |
+ | agieval_lsat_ar | 0 | acc_norm | 23.48% | 2.80% |
+ | agieval_lsat_lr | 0 | acc | 52.75% | 2.21% |
+ | agieval_lsat_lr | 0 | acc_norm | 53.92% | 2.21% |
+ | agieval_lsat_rc | 0 | acc | 66.91% | 2.87% |
+ | agieval_lsat_rc | 0 | acc_norm | 67.29% | 2.87% |
+ | agieval_sat_en | 0 | acc | 78.64% | 2.86% |
+ | agieval_sat_en | 0 | acc_norm | 78.64% | 2.86% |
+ | agieval_sat_en_without_passage | 0 | acc | 45.15% | 3.48% |
+ | agieval_sat_en_without_passage | 0 | acc_norm | 44.17% | 3.47% |
+ | agieval_sat_math | 0 | acc | 33.18% | 3.18% |
+ | agieval_sat_math | 0 | acc_norm | 31.36% | 3.14% |
+
+ Average: 28.41%

+ #### GPT4ALL

+ | Task | Version | Metric | Value | StdErr |
+ |--------------|---------|----------|-------|--------|
+ | arc_challenge| 0 | acc | 66.30%| ± 1.38%|
+ | | | acc_norm | 68.26%| ± 1.36%|
+ | arc_easy | 0 | acc | 86.57%| ± 0.70%|
+ | | | acc_norm | 80.81%| ± 0.81%|
+ | boolq | 1 | acc | 87.16%| ± 0.59%|
+ | hellaswag | 0 | acc | 69.60%| ± 0.46%|
+ | | | acc_norm | 87.45%| ± 0.33%|
+ | openbookqa | 0 | acc | 39.20%| ± 2.19%|
+ | | | acc_norm | 49.60%| ± 2.24%|
+ | piqa | 0 | acc | 83.03%| ± 0.88%|
+ | | | acc_norm | 84.87%| ± 0.84%|
+ | winogrande | 0 | acc | 81.06%| ± 1.10%|
+
+ Average: 76.98%

+ #### TRUTHFUL-QA

+ | Task | Version | Metric | Value | StdErr |
+ |---------------|---------|--------|-------|--------|
+ | truthfulqa_mc | 1 | mc1 | 63.04%| ± 1.69%|
+ | truthfulqa_mc | 1 | mc2 | 78.39%| ± 1.37%|
+
+ Average: 70.71%
+
+ #### BIGBENCH
+
+ | Task | Version | Metric | Value | StdErr |
+ |--------------------------------------------------|---------|-----------------------|-------|---------|
+ | bigbench_causal_judgement | 0 | multiple_choice_grade | 60.00%| ± 3.56% |
+ | bigbench_date_understanding | 0 | multiple_choice_grade | 62.06%| ± 2.53% |
+ | bigbench_disambiguation_qa | 0 | multiple_choice_grade | 54.26%| ± 3.11% |
+ | bigbench_geometric_shapes | 0 | multiple_choice_grade | 23.96%| ± 2.26% |
+ | | | exact_str_match | 0.00% | ± 0.00% |
+ | bigbench_logical_deduction_five_objects | 0 | multiple_choice_grade | 32.80%| ± 2.10% |
+ | bigbench_logical_deduction_seven_objects | 0 | multiple_choice_grade | 23.86%| ± 1.61% |
+ | bigbench_logical_deduction_three_objects | 0 | multiple_choice_grade | 59.33%| ± 2.84% |
+ | bigbench_movie_recommendation | 0 | multiple_choice_grade | 58.00%| ± 2.21% |
+ | bigbench_navigate | 0 | multiple_choice_grade | 56.00%| ± 1.57% |
+ | bigbench_reasoning_about_colored_objects | 0 | multiple_choice_grade | 69.20%| ± 1.03% |
+ | bigbench_ruin_names | 0 | multiple_choice_grade | 55.36%| ± 2.35% |
+ | bigbench_salient_translation_error_detection | 0 | multiple_choice_grade | 41.48%| ± 1.56% |
+ | bigbench_snarks | 0 | multiple_choice_grade | 73.48%| ± 3.29% |
+ | bigbench_sports_understanding | 0 | multiple_choice_grade | 76.06%| ± 1.36% |
+ | bigbench_temporal_sequences | 0 | multiple_choice_grade | 55.50%| ± 1.57% |
+ | bigbench_tracking_shuffled_objects_five_objects | 0 | multiple_choice_grade | 23.28%| ± 1.20% |
+ | bigbench_tracking_shuffled_objects_seven_objects | 0 | multiple_choice_grade | 19.37%| ± 0.94% |
+ | bigbench_tracking_shuffled_objects_three_objects | 0 | multiple_choice_grade | 59.33%| ± 2.84% |
+
+ Average: 55.37%
+
+ ### OpenLLM Benchmark
+
+ | Task |Version| Metric |Value| |Stderr|
+ |-------------|------:|--------|----:|---|-----:|
+ |arc_challenge| 0|acc |70.12|± | 1.30|
+ | | |acc_norm|73.27|± | 1.29|
+ |hellaswag | 0|acc |71.80|± | 0.44|
+ | | |acc_norm|89.20|± | 0.30|
+ |gsm8k | 0|acc |66.77|± | 1.2 |
+ |winogrande | 0|acc |84.6 |± | 1.0 |
+
+ Average: 73.5%
+
+ #### TruthfulQA
+
+ | Task |Version|Metric|Value| |Stderr|
+ |-------------|------:|------|----:|---|-----:|
+ |truthfulqa_mc| 1|mc1 |62.79|± | 1.69|
+ | | |mc2 |77.90|± | 1.37|
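Scores like those in the tables above are typically produced with EleutherAI's lm-evaluation-harness. A sketch of the invocation follows; the repo id `abideen/AlphaMonarch-laser`, the harness version, and the flag values are assumptions, not the exact setup used for this card:

```shell
# Assumes: pip install lm-eval  (EleutherAI lm-evaluation-harness >= 0.4)
# Note: this downloads a 7B model, so it needs a GPU and ample disk space.
lm_eval --model hf \
  --model_args pretrained=abideen/AlphaMonarch-laser,dtype=bfloat16 \
  --tasks arc_challenge,hellaswag,gsm8k,winogrande \
  --batch_size 8
```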

 ### Training hyperparameters


 ### 📝 Axolotl Configuration

 ```yaml