Commit 3f9de5b, committed by macadeliccc
1 Parent(s): 16169a4

Update README.md

  1. README.md +82 -1
README.md CHANGED
@@ -86,4 +86,85 @@ Parseable: 167.0
  Batch completed
  Time taken: 178.3 mins
  ---------------
- </pre>
+ </pre>
+
+ |                             Model                             |AGIEval|GPT4All|TruthfulQA|Bigbench|Average|
+ |---------------------------------------------------------------|------:|------:|---------:|-------:|------:|
+ |[OmniCorso-7B](https://huggingface.co/macadeliccc/OmniCorso-7B)|  45.89|  77.66|     74.12|   49.24|  61.73|
+
+ ### AGIEval
+ |             Task             |Version| Metric |Value|   |Stderr|
+ |------------------------------|------:|--------|----:|---|-----:|
+ |agieval_aqua_rat              |      0|acc     |29.13|±  |  2.86|
+ |                              |       |acc_norm|27.17|±  |  2.80|
+ |agieval_logiqa_en             |      0|acc     |39.32|±  |  1.92|
+ |                              |       |acc_norm|39.63|±  |  1.92|
+ |agieval_lsat_ar               |      0|acc     |23.91|±  |  2.82|
+ |                              |       |acc_norm|23.91|±  |  2.82|
+ |agieval_lsat_lr               |      0|acc     |53.14|±  |  2.21|
+ |                              |       |acc_norm|53.92|±  |  2.21|
+ |agieval_lsat_rc               |      0|acc     |66.54|±  |  2.88|
+ |                              |       |acc_norm|67.29|±  |  2.87|
+ |agieval_sat_en                |      0|acc     |80.58|±  |  2.76|
+ |                              |       |acc_norm|80.58|±  |  2.76|
+ |agieval_sat_en_without_passage|      0|acc     |45.63|±  |  3.48|
+ |                              |       |acc_norm|43.69|±  |  3.46|
+ |agieval_sat_math              |      0|acc     |33.18|±  |  3.18|
+ |                              |       |acc_norm|30.91|±  |  3.12|
+
+ Average: 45.89%
+
+ ### GPT4All
+ |    Task     |Version| Metric |Value|   |Stderr|
+ |-------------|------:|--------|----:|---|-----:|
+ |arc_challenge|      0|acc     |67.32|±  |  1.37|
+ |             |       |acc_norm|68.43|±  |  1.36|
+ |arc_easy     |      0|acc     |87.46|±  |  0.68|
+ |             |       |acc_norm|83.50|±  |  0.76|
+ |boolq        |      1|acc     |88.13|±  |  0.57|
+ |hellaswag    |      0|acc     |68.47|±  |  0.46|
+ |             |       |acc_norm|86.96|±  |  0.34|
+ |openbookqa   |      0|acc     |38.80|±  |  2.18|
+ |             |       |acc_norm|50.00|±  |  2.24|
+ |piqa         |      0|acc     |83.03|±  |  0.88|
+ |             |       |acc_norm|85.31|±  |  0.83|
+ |winogrande   |      0|acc     |81.29|±  |  1.10|
+
+ Average: 77.66%
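
The reported GPT4All average is consistent with taking one score per task: `acc_norm` where the table lists it, and `acc` otherwise. That selection rule is inferred from the numbers and is not stated in the README, so treat the following as a sanity check under that assumption rather than the evaluation tool's actual code. The names `gpt4all` and `suite_average` are hypothetical.

```python
# Per-task scores copied from the GPT4All table above.
# Assumption: the suite average uses acc_norm when a task reports it
# and falls back to acc otherwise (boolq and winogrande list only acc).
gpt4all = {
    "arc_challenge": {"acc": 67.32, "acc_norm": 68.43},
    "arc_easy": {"acc": 87.46, "acc_norm": 83.50},
    "boolq": {"acc": 88.13},
    "hellaswag": {"acc": 68.47, "acc_norm": 86.96},
    "openbookqa": {"acc": 38.80, "acc_norm": 50.00},
    "piqa": {"acc": 83.03, "acc_norm": 85.31},
    "winogrande": {"acc": 81.29},
}


def suite_average(results):
    """Mean of one score per task, preferring acc_norm over acc."""
    picked = [m["acc_norm"] if "acc_norm" in m else m["acc"] for m in results.values()]
    return sum(picked) / len(picked)


gpt4all_avg = suite_average(gpt4all)
print(f"GPT4All average: {gpt4all_avg:.2f}%")
```

Under this assumption the result rounds to the 77.66% shown in the table. Applying the same rule to the `acc_norm` rows of the AGIEval table likewise rounds to 45.89%.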
134
+
135
+ ### TruthfulQA
136
+ | Task |Version|Metric|Value| |Stderr|
137
+ |-------------|------:|------|----:|---|-----:|
138
+ |truthfulqa_mc| 1|mc1 |58.26|± | 1.73|
139
+ | | |mc2 |74.12|± | 1.43|
140
+
141
+ Average: 74.12%
142
+
+ ### Bigbench
+ |                      Task                      |Version|       Metric        |Value|   |Stderr|
+ |------------------------------------------------|------:|---------------------|----:|---|-----:|
+ |bigbench_causal_judgement                       |      0|multiple_choice_grade|56.84|±  |  3.60|
+ |bigbench_date_understanding                     |      0|multiple_choice_grade|63.41|±  |  2.51|
+ |bigbench_disambiguation_qa                      |      0|multiple_choice_grade|49.22|±  |  3.12|
+ |bigbench_geometric_shapes                       |      0|multiple_choice_grade|23.96|±  |  2.26|
+ |                                                |       |exact_str_match      | 1.39|±  |  0.62|
+ |bigbench_logical_deduction_five_objects         |      0|multiple_choice_grade|34.20|±  |  2.12|
+ |bigbench_logical_deduction_seven_objects        |      0|multiple_choice_grade|23.71|±  |  1.61|
+ |bigbench_logical_deduction_three_objects        |      0|multiple_choice_grade|60.33|±  |  2.83|
+ |bigbench_movie_recommendation                   |      0|multiple_choice_grade|49.00|±  |  2.24|
+ |bigbench_navigate                               |      0|multiple_choice_grade|55.20|±  |  1.57|
+ |bigbench_reasoning_about_colored_objects        |      0|multiple_choice_grade|70.75|±  |  1.02|
+ |bigbench_ruin_names                             |      0|multiple_choice_grade|55.80|±  |  2.35|
+ |bigbench_salient_translation_error_detection    |      0|multiple_choice_grade|36.97|±  |  1.53|
+ |bigbench_snarks                                 |      0|multiple_choice_grade|72.38|±  |  3.33|
+ |bigbench_sports_understanding                   |      0|multiple_choice_grade|76.27|±  |  1.36|
+ |bigbench_temporal_sequences                     |      0|multiple_choice_grade|54.50|±  |  1.58|
+ |bigbench_tracking_shuffled_objects_five_objects |      0|multiple_choice_grade|23.12|±  |  1.19|
+ |bigbench_tracking_shuffled_objects_seven_objects|      0|multiple_choice_grade|20.34|±  |  0.96|
+ |bigbench_tracking_shuffled_objects_three_objects|      0|multiple_choice_grade|60.33|±  |  2.83|
+
+ Average: 49.24%
+
+ Average score: 61.73%
+
+ Elapsed time: 02:20:06
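
The overall "Average score" appears to be the unweighted mean of the four suite averages. A minimal check, assuming equal weighting (the README does not state the weighting, and `suite_averages` is a hypothetical name):

```python
import math

# Suite averages as reported in the summary table of this commit.
suite_averages = {
    "AGIEval": 45.89,
    "GPT4All": 77.66,
    "TruthfulQA": 74.12,
    "Bigbench": 49.24,
}

# Assumption: every suite counts equally, regardless of how many tasks it has.
overall = sum(suite_averages.values()) / len(suite_averages)
print(f"Overall average: {overall:.4f}%")

# The inputs are already rounded to two decimals, so compare with a tolerance
# instead of expecting an exact match with the reported figure.
agrees_with_report = math.isclose(overall, 61.73, abs_tol=0.005)
print("Matches reported 61.73% within rounding:", agrees_with_report)
```

Equal weighting means TruthfulQA, a single task, contributes as much to the headline number as the 18 Bigbench tasks combined, which is worth keeping in mind when comparing models on the final average alone.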