runningSnail committed 3796393 (parent: 969d794): add MMLU results

README.md
This model was trained on commercially viable data. For use of our model, refer to the [license information](https://www.nexa4ai.com/licenses).

## Performance

### Model Selection

We leverage the latest Large Language Models for a variety of domains. Below is a summary of the chosen models for each category. In cases where no specialized model exists for a subject, we utilize generic models like Llama3-8b.

| **Model** | **Category** | **Subjects** |
|-----------------------------------------|--------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `jondurbin/bagel-8b-v1.0` | Biology | `college_biology`, `high_school_biology` |
| `Weyaxi/Einstein-v6.1-Llama3-8B` | Physics | `astronomy`, `college_physics`, `conceptual_physics`, `high_school_physics` |
| `meta-llama/Meta-Llama-3-8B-Instruct` | Business | `business_ethics`, `management`, `marketing` |
| `meta-llama/Meta-Llama-3-8B-Instruct` | Chemistry | `college_chemistry`, `high_school_chemistry` |
| `abacusai/Llama-3-Smaug-8B` | Computer Science | `college_computer_science`, `computer_security`, `high_school_computer_science`, `machine_learning` |
| `Open-Orca/Mistral-7B-OpenOrca` | Math | `abstract_algebra`, `college_mathematics`, `elementary_mathematics`, `high_school_mathematics`, `high_school_statistics` |
| `meta-llama/Meta-Llama-3-8B-Instruct` | Economics | `econometrics`, `high_school_macroeconomics`, `high_school_microeconomics` |
| `AdaptLLM/medicine-chat` | Health | `anatomy`, `clinical_knowledge`, `college_medicine`, `human_aging`, `medical_genetics`, `nutrition`, `professional_medicine`, `virology` |
| `STEM-AI-mtl/phi-2-electrical-engineering` | Engineering | `electrical_engineering` |
| `meta-llama/Meta-Llama-3-8B-Instruct` | Philosophy | `formal_logic`, `logical_fallacies`, `moral_disputes`, `moral_scenarios`, `philosophy`, `world_religions` |
| `microsoft/Phi-3-mini-128k-instruct` | Other | `global_facts`, `miscellaneous`, `professional_accounting` |
| `meta-llama/Meta-Llama-3-8B-Instruct` | History | `high_school_european_history`, `high_school_us_history`, `high_school_world_history`, `prehistory` |
| `meta-llama/Meta-Llama-3-8B-Instruct` | Culture | `human_sexuality`, `sociology` |
| `AdaptLLM/law-chat` | Law | `international_law`, `jurisprudence`, `professional_law` |
| `meta-llama/Meta-Llama-3-8B-Instruct` | Psychology | `high_school_psychology`, `professional_psychology` |

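The per-category routing above amounts to a lookup table with a generic fallback. A minimal sketch, assuming a plain dict keyed by MMLU subject (the mapping entries come from the table above; the `select_model` function name and the truncated dict are illustrative, not part of the released code):

```python
# Map each MMLU subject to its specialized model (entries taken from the
# table above; only a few rows are shown here for brevity).
SUBJECT_TO_MODEL = {
    "college_biology": "jondurbin/bagel-8b-v1.0",
    "high_school_biology": "jondurbin/bagel-8b-v1.0",
    "electrical_engineering": "STEM-AI-mtl/phi-2-electrical-engineering",
    "international_law": "AdaptLLM/law-chat",
}

# Generic fallback used when no specialized model exists for a subject.
GENERIC_MODEL = "meta-llama/Meta-Llama-3-8B-Instruct"


def select_model(subject: str) -> str:
    """Return the checkpoint to route a question to for the given subject."""
    return SUBJECT_TO_MODEL.get(subject, GENERIC_MODEL)
```

For example, `select_model("college_biology")` routes to the biology specialist, while an unlisted subject such as `business_ethics` falls back to the generic Llama3 model, matching the table.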
### MMLU Benchmark Results (5-shot learning)

Here are the comparative MMLU scores for various models tested under a 5-shot learning setup:

| **Model** | **MMLU Score** |
|-----------------------------------|----------------|
| Octopus-V4 | **74.6%** |
| GPT-3.5 | 70.0% |
| Phi-3-mini-128k-instruct | 68.1% |
| OpenELM-3B | 26.7% |
| Llama3-8b-instruct | 68.4% |
| Gemma-2b | 42.3% |
| Gemma-7b | 64.3% |

## References

We thank the Microsoft team for their amazing model!

```