---
license: apache-2.0
datasets:
- BAAI/Infinity-Instruct
tags:
- axolotl
- NousResearch/Hermes-2-Pro-Mistral-7B
- finetune
- gguf
---
# Hermes 2 Pro Mistral-7B Infinity-Instruct GGUF
This model is a fine-tuned version of NousResearch/Hermes-2-Pro-Mistral-7B on the BAAI/Infinity-Instruct dataset. You can find the main model page here.
## Model Details

- Base Model: NousResearch/Hermes-2-Pro-Mistral-7B
- Dataset: BAAI/Infinity-Instruct
- Sequence Length: 8192 tokens
- Training:
  - Epochs: 1
  - Hardware: 4 nodes × 4 NVIDIA A100 40GB GPUs
  - Duration: 26:56:43
  - Cluster: KIT SCC Cluster
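
Since this repository ships GGUF quantizations, one common way to fetch a file is via `huggingface_hub`. The snippet below is only a sketch: the repo ID and filename are placeholders, so substitute the actual values from this repository's file listing.

```python
from huggingface_hub import hf_hub_download

# Both arguments are placeholders -- check this repo's file listing for the
# real repo ID and the quantization level you want (e.g. Q4_K_M, Q8_0, ...).
gguf_path = hf_hub_download(
    repo_id="<this-repo-id>",
    filename="<model-file>.Q4_K_M.gguf",
)
print(gguf_path)  # local cache path of the downloaded GGUF file
```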
## Benchmarks (n_shots=0)

| Benchmark | Score |
|---|---|
| ARC (Challenge) | 52.47% |
| ARC (Easy) | 81.65% |
| BoolQ | 87.22% |
| HellaSwag | 60.52% |
| OpenBookQA | 33.60% |
| PIQA | 81.12% |
| Winogrande | 72.22% |
| AGIEval | 38.46% |
| TruthfulQA | 44.22% |
| MMLU | 59.72% |
| IFEval | 47.96% |
For detailed benchmark results, including sub-categories and various metrics, please refer to the full benchmark table at the end of this README.
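
The layout of the full table below matches the output of EleutherAI's lm-evaluation-harness. As a hypothetical reproduction sketch only (the exact harness version, task list, and batch size used for this card are not documented here), a comparable zero-shot run could be launched like this:

```python
# Hypothetical sketch: harness version, task list, and batch size are assumptions.
from lm_eval import simple_evaluate

results = simple_evaluate(
    model="hf",  # Hugging Face transformers backend
    model_args="pretrained=<path-or-id-of-this-finetune>,dtype=bfloat16",
    tasks=["arc_challenge", "arc_easy", "boolq", "hellaswag", "openbookqa",
           "piqa", "winogrande", "agieval", "truthfulqa", "mmlu", "ifeval"],
    num_fewshot=0,
    batch_size=8,
)

# Per-task metrics (acc, acc_norm, stderr, ...) live under results["results"].
for task, metrics in results["results"].items():
    print(task, metrics)
```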
## License

This model is released under the Apache 2.0 license.
## Prompt Format (ChatML)

This model uses the ChatML prompt format:

```
<|im_start|>system
{system_prompt}<|im_end|>
<|im_start|>user
Knock Knock, who is there?<|im_end|>
<|im_start|>assistant
Hi there!<|im_end|>
```
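
A minimal usage sketch, assuming llama-cpp-python and a local copy of one of the GGUF files from this repository (the filename below is a placeholder): the ChatML template above is filled in by hand and `<|im_end|>` is used as the stop sequence.

```python
from llama_cpp import Llama

# Placeholder filename -- point this at whichever quantized GGUF you downloaded.
llm = Llama(
    model_path="hermes-2-pro-mistral-7b-infinity-instruct.Q4_K_M.gguf",
    n_ctx=8192,  # matches the 8192-token training sequence length
)

system_prompt = "You are a helpful assistant."
user_message = "Knock Knock, who is there?"

# Fill in the ChatML template and leave the assistant turn open for generation.
prompt = (
    f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
    f"<|im_start|>user\n{user_message}<|im_end|>\n"
    "<|im_start|>assistant\n"
)

output = llm(prompt, max_tokens=256, stop=["<|im_end|>"])
print(output["choices"][0]["text"])
```

llama-cpp-python also has a built-in `chatml` chat format (`Llama(..., chat_format="chatml")` together with `create_chat_completion`), which applies the same structure automatically.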
## Acknowledgements

Special thanks to:

- NousResearch for their excellent base model
- BAAI for providing the Infinity-Instruct dataset
- KIT SCC for the compute (FLOPS)
## Citation

If you use this model in your research, please consider citing it, and definitely cite NousResearch and BAAI:

```bibtex
@misc{hermes2pro-mistral-7b-infinity,
  author = {juvi21},
  title  = {Hermes 2 Pro Mistral-7B Infinity-Instruct},
  year   = {2024},
}
```
## Full Benchmark Results
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| agieval | N/A | none | 0 | acc | ↑ | 0.3846 | ± | 0.0051 |
| | | none | 0 | acc_norm | ↑ | 0.4186 | ± | 0.0056 |
| - agieval_aqua_rat | 1 | none | 0 | acc | ↑ | 0.2520 | ± | 0.0273 |
| | | none | 0 | acc_norm | ↑ | 0.2323 | ± | 0.0265 |
| - agieval_gaokao_biology | 1 | none | 0 | acc | ↑ | 0.2952 | ± | 0.0316 |
| | | none | 0 | acc_norm | ↑ | 0.3381 | ± | 0.0327 |
| - agieval_gaokao_chemistry | 1 | none | 0 | acc | ↑ | 0.2560 | ± | 0.0304 |
| | | none | 0 | acc_norm | ↑ | 0.2850 | ± | 0.0315 |
| - agieval_gaokao_chinese | 1 | none | 0 | acc | ↑ | 0.2317 | ± | 0.0270 |
| | | none | 0 | acc_norm | ↑ | 0.2236 | ± | 0.0266 |
| - agieval_gaokao_english | 1 | none | 0 | acc | ↑ | 0.6667 | ± | 0.0270 |
| | | none | 0 | acc_norm | ↑ | 0.6863 | ± | 0.0266 |
| - agieval_gaokao_geography | 1 | none | 0 | acc | ↑ | 0.3869 | ± | 0.0346 |
| | | none | 0 | acc_norm | ↑ | 0.4020 | ± | 0.0348 |
| - agieval_gaokao_history | 1 | none | 0 | acc | ↑ | 0.4468 | ± | 0.0325 |
| | | none | 0 | acc_norm | ↑ | 0.3957 | ± | 0.0320 |
| - agieval_gaokao_mathcloze | 1 | none | 0 | acc | ↑ | 0.0254 | ± | 0.0146 |
| - agieval_gaokao_mathqa | 1 | none | 0 | acc | ↑ | 0.2507 | ± | 0.0232 |
| | | none | 0 | acc_norm | ↑ | 0.2621 | ± | 0.0235 |
| - agieval_gaokao_physics | 1 | none | 0 | acc | ↑ | 0.2900 | ± | 0.0322 |
| | | none | 0 | acc_norm | ↑ | 0.3100 | ± | 0.0328 |
| - agieval_jec_qa_ca | 1 | none | 0 | acc | ↑ | 0.4735 | ± | 0.0158 |
| | | none | 0 | acc_norm | ↑ | 0.4695 | ± | 0.0158 |
| - agieval_jec_qa_kd | 1 | none | 0 | acc | ↑ | 0.5290 | ± | 0.0158 |
| | | none | 0 | acc_norm | ↑ | 0.5140 | ± | 0.0158 |
| - agieval_logiqa_en | 1 | none | 0 | acc | ↑ | 0.3579 | ± | 0.0188 |
| | | none | 0 | acc_norm | ↑ | 0.3779 | ± | 0.0190 |
| - agieval_logiqa_zh | 1 | none | 0 | acc | ↑ | 0.3103 | ± | 0.0181 |
| | | none | 0 | acc_norm | ↑ | 0.3318 | ± | 0.0185 |
| - agieval_lsat_ar | 1 | none | 0 | acc | ↑ | 0.2217 | ± | 0.0275 |
| | | none | 0 | acc_norm | ↑ | 0.2217 | ± | 0.0275 |
| - agieval_lsat_lr | 1 | none | 0 | acc | ↑ | 0.5333 | ± | 0.0221 |
| | | none | 0 | acc_norm | ↑ | 0.5098 | ± | 0.0222 |
| - agieval_lsat_rc | 1 | none | 0 | acc | ↑ | 0.5948 | ± | 0.0300 |
| | | none | 0 | acc_norm | ↑ | 0.5353 | ± | 0.0305 |
| - agieval_math | 1 | none | 0 | acc | ↑ | 0.1520 | ± | 0.0114 |
| - agieval_sat_en | 1 | none | 0 | acc | ↑ | 0.7864 | ± | 0.0286 |
| | | none | 0 | acc_norm | ↑ | 0.7621 | ± | 0.0297 |
| - agieval_sat_en_without_passage | 1 | none | 0 | acc | ↑ | 0.4660 | ± | 0.0348 |
| | | none | 0 | acc_norm | ↑ | 0.4272 | ± | 0.0345 |
| - agieval_sat_math | 1 | none | 0 | acc | ↑ | 0.3591 | ± | 0.0324 |
| | | none | 0 | acc_norm | ↑ | 0.3045 | ± | 0.0311 |
| arc_challenge | 1 | none | 0 | acc | ↑ | 0.5247 | ± | 0.0146 |
| | | none | 0 | acc_norm | ↑ | 0.5538 | ± | 0.0145 |
| arc_easy | 1 | none | 0 | acc | ↑ | 0.8165 | ± | 0.0079 |
| | | none | 0 | acc_norm | ↑ | 0.7934 | ± | 0.0083 |
| boolq | 2 | none | 0 | acc | ↑ | 0.8722 | ± | 0.0058 |
| hellaswag | 1 | none | 0 | acc | ↑ | 0.6052 | ± | 0.0049 |
| | | none | 0 | acc_norm | ↑ | 0.7941 | ± | 0.0040 |
| ifeval | 2 | none | 0 | inst_level_loose_acc | ↑ | 0.5132 | ± | N/A |
| | | none | 0 | inst_level_strict_acc | ↑ | 0.4796 | ± | N/A |
| | | none | 0 | prompt_level_loose_acc | ↑ | 0.4122 | ± | 0.0212 |
| | | none | 0 | prompt_level_strict_acc | ↑ | 0.3734 | ± | 0.0208 |
| mmlu | N/A | none | 0 | acc | ↑ | 0.5972 | ± | 0.0039 |
| - abstract_algebra | 0 | none | 0 | acc | ↑ | 0.3100 | ± | 0.0465 |
| - anatomy | 0 | none | 0 | acc | ↑ | 0.5852 | ± | 0.0426 |
| - astronomy | 0 | none | 0 | acc | ↑ | 0.6447 | ± | 0.0389 |
| - business_ethics | 0 | none | 0 | acc | ↑ | 0.5800 | ± | 0.0496 |
| - clinical_knowledge | 0 | none | 0 | acc | ↑ | 0.6830 | ± | 0.0286 |
| - college_biology | 0 | none | 0 | acc | ↑ | 0.7153 | ± | 0.0377 |
| - college_chemistry | 0 | none | 0 | acc | ↑ | 0.4500 | ± | 0.0500 |
| - college_computer_science | 0 | none | 0 | acc | ↑ | 0.4900 | ± | 0.0502 |
| - college_mathematics | 0 | none | 0 | acc | ↑ | 0.3100 | ± | 0.0465 |
| - college_medicine | 0 | none | 0 | acc | ↑ | 0.6069 | ± | 0.0372 |
| - college_physics | 0 | none | 0 | acc | ↑ | 0.4020 | ± | 0.0488 |
| - computer_security | 0 | none | 0 | acc | ↑ | 0.7200 | ± | 0.0451 |
| - conceptual_physics | 0 | none | 0 | acc | ↑ | 0.5234 | ± | 0.0327 |
| - econometrics | 0 | none | 0 | acc | ↑ | 0.4123 | ± | 0.0463 |
| - electrical_engineering | 0 | none | 0 | acc | ↑ | 0.4759 | ± | 0.0416 |
| - elementary_mathematics | 0 | none | 0 | acc | ↑ | 0.4180 | ± | 0.0254 |
| - formal_logic | 0 | none | 0 | acc | ↑ | 0.4286 | ± | 0.0443 |
| - global_facts | 0 | none | 0 | acc | ↑ | 0.3400 | ± | 0.0476 |
| - high_school_biology | 0 | none | 0 | acc | ↑ | 0.7419 | ± | 0.0249 |
| - high_school_chemistry | 0 | none | 0 | acc | ↑ | 0.4631 | ± | 0.0351 |
| - high_school_computer_science | 0 | none | 0 | acc | ↑ | 0.6300 | ± | 0.0485 |
| - high_school_european_history | 0 | none | 0 | acc | ↑ | 0.7394 | ± | 0.0343 |
| - high_school_geography | 0 | none | 0 | acc | ↑ | 0.7323 | ± | 0.0315 |
| - high_school_government_and_politics | 0 | none | 0 | acc | ↑ | 0.8238 | ± | 0.0275 |
| - high_school_macroeconomics | 0 | none | 0 | acc | ↑ | 0.6308 | ± | 0.0245 |
| - high_school_mathematics | 0 | none | 0 | acc | ↑ | 0.3333 | ± | 0.0287 |
| - high_school_microeconomics | 0 | none | 0 | acc | ↑ | 0.6387 | ± | 0.0312 |
| - high_school_physics | 0 | none | 0 | acc | ↑ | 0.2914 | ± | 0.0371 |
| - high_school_psychology | 0 | none | 0 | acc | ↑ | 0.8128 | ± | 0.0167 |
| - high_school_statistics | 0 | none | 0 | acc | ↑ | 0.4907 | ± | 0.0341 |
| - high_school_us_history | 0 | none | 0 | acc | ↑ | 0.8186 | ± | 0.0270 |
| - high_school_world_history | 0 | none | 0 | acc | ↑ | 0.8186 | ± | 0.0251 |
| - human_aging | 0 | none | 0 | acc | ↑ | 0.6771 | ± | 0.0314 |
| - human_sexuality | 0 | none | 0 | acc | ↑ | 0.7176 | ± | 0.0395 |
| - humanities | N/A | none | 0 | acc | ↑ | 0.5411 | ± | 0.0066 |
| - international_law | 0 | none | 0 | acc | ↑ | 0.7603 | ± | 0.0390 |
| - jurisprudence | 0 | none | 0 | acc | ↑ | 0.7593 | ± | 0.0413 |
| - logical_fallacies | 0 | none | 0 | acc | ↑ | 0.7239 | ± | 0.0351 |
| - machine_learning | 0 | none | 0 | acc | ↑ | 0.5268 | ± | 0.0474 |
| - management | 0 | none | 0 | acc | ↑ | 0.7864 | ± | 0.0406 |
| - marketing | 0 | none | 0 | acc | ↑ | 0.8547 | ± | 0.0231 |
| - medical_genetics | 0 | none | 0 | acc | ↑ | 0.6500 | ± | 0.0479 |
| - miscellaneous | 0 | none | 0 | acc | ↑ | 0.7918 | ± | 0.0145 |
| - moral_disputes | 0 | none | 0 | acc | ↑ | 0.6705 | ± | 0.0253 |
| - moral_scenarios | 0 | none | 0 | acc | ↑ | 0.2268 | ± | 0.0140 |
| - nutrition | 0 | none | 0 | acc | ↑ | 0.6961 | ± | 0.0263 |
| - other | N/A | none | 0 | acc | ↑ | 0.6720 | ± | 0.0081 |
| - philosophy | 0 | none | 0 | acc | ↑ | 0.6945 | ± | 0.0262 |
| - prehistory | 0 | none | 0 | acc | ↑ | 0.6975 | ± | 0.0256 |
| - professional_accounting | 0 | none | 0 | acc | ↑ | 0.4539 | ± | 0.0297 |
| - professional_law | 0 | none | 0 | acc | ↑ | 0.4537 | ± | 0.0127 |
| - professional_medicine | 0 | none | 0 | acc | ↑ | 0.6176 | ± | 0.0295 |
| - professional_psychology | 0 | none | 0 | acc | ↑ | 0.6275 | ± | 0.0196 |
| - public_relations | 0 | none | 0 | acc | ↑ | 0.6364 | ± | 0.0461 |
| - security_studies | 0 | none | 0 | acc | ↑ | 0.7061 | ± | 0.0292 |
| - social_sciences | N/A | none | 0 | acc | ↑ | 0.7043 | ± | 0.0080 |
| - sociology | 0 | none | 0 | acc | ↑ | 0.8458 | ± | 0.0255 |
| - stem | N/A | none | 0 | acc | ↑ | 0.5027 | ± | 0.0086 |
| - us_foreign_policy | 0 | none | 0 | acc | ↑ | 0.8400 | ± | 0.0368 |
| - virology | 0 | none | 0 | acc | ↑ | 0.5060 | ± | 0.0389 |
| - world_religions | 0 | none | 0 | acc | ↑ | 0.8421 | ± | 0.0280 |
| openbookqa | 1 | none | 0 | acc | ↑ | 0.3360 | ± | 0.0211 |
| | | none | 0 | acc_norm | ↑ | 0.4380 | ± | 0.0222 |
| piqa | 1 | none | 0 | acc | ↑ | 0.8112 | ± | 0.0091 |
| | | none | 0 | acc_norm | ↑ | 0.8194 | ± | 0.0090 |
| truthfulqa | N/A | none | 0 | acc | ↑ | 0.4422 | ± | 0.0113 |
| | | none | 0 | bleu_acc | ↑ | 0.5398 | ± | 0.0174 |
| | | none | 0 | bleu_diff | ↑ | 6.0075 | ± | 0.9539 |
| | | none | 0 | bleu_max | ↑ | 30.9946 | ± | 0.8538 |
| | | none | 0 | rouge1_acc | ↑ | 0.5545 | ± | 0.0174 |
| | | none | 0 | rouge1_diff | ↑ | 8.7352 | ± | 1.2500 |
| | | none | 0 | rouge1_max | ↑ | 57.5941 | ± | 0.8750 |
| | | none | 0 | rouge2_acc | ↑ | 0.4810 | ± | 0.0175 |
| | | none | 0 | rouge2_diff | ↑ | 7.9063 | ± | 1.3837 |
| | | none | 0 | rouge2_max | ↑ | 43.4572 | ± | 1.0786 |
| | | none | 0 | rougeL_acc | ↑ | 0.5239 | ± | 0.0175 |
| | | none | 0 | rougeL_diff | ↑ | 8.3871 | ± | 1.2689 |
| | | none | 0 | rougeL_max | ↑ | 54.6542 | ± | 0.9060 |
| - truthfulqa_gen | 3 | none | 0 | bleu_acc | ↑ | 0.5398 | ± | 0.0174 |
| | | none | 0 | bleu_diff | ↑ | 6.0075 | ± | 0.9539 |
| | | none | 0 | bleu_max | ↑ | 30.9946 | ± | 0.8538 |
| | | none | 0 | rouge1_acc | ↑ | 0.5545 | ± | 0.0174 |
| | | none | 0 | rouge1_diff | ↑ | 8.7352 | ± | 1.2500 |
| | | none | 0 | rouge1_max | ↑ | 57.5941 | ± | 0.8750 |
| | | none | 0 | rouge2_acc | ↑ | 0.4810 | ± | 0.0175 |
| | | none | 0 | rouge2_diff | ↑ | 7.9063 | ± | 1.3837 |
| | | none | 0 | rouge2_max | ↑ | 43.4572 | ± | 1.0786 |
| | | none | 0 | rougeL_acc | ↑ | 0.5239 | ± | 0.0175 |
| | | none | 0 | rougeL_diff | ↑ | 8.3871 | ± | 1.2689 |
| | | none | 0 | rougeL_max | ↑ | 54.6542 | ± | 0.9060 |
| - truthfulqa_mc1 | 2 | none | 0 | acc | ↑ | 0.3574 | ± | 0.0168 |
| - truthfulqa_mc2 | 2 | none | 0 | acc | ↑ | 0.5269 | ± | 0.0152 |
| winogrande | 1 | none | 0 | acc | ↑ | 0.7222 | ± | 0.0126 |

| Groups | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| agieval | N/A | none | 0 | acc | ↑ | 0.3846 | ± | 0.0051 |
| | | none | 0 | acc_norm | ↑ | 0.4186 | ± | 0.0056 |
| mmlu | N/A | none | 0 | acc | ↑ | 0.5972 | ± | 0.0039 |
| - humanities | N/A | none | 0 | acc | ↑ | 0.5411 | ± | 0.0066 |
| - other | N/A | none | 0 | acc | ↑ | 0.6720 | ± | 0.0081 |
| - social_sciences | N/A | none | 0 | acc | ↑ | 0.7043 | ± | 0.0080 |
| - stem | N/A | none | 0 | acc | ↑ | 0.5027 | ± | 0.0086 |
| truthfulqa | N/A | none | 0 | acc | ↑ | 0.4422 | ± | 0.0113 |
| | | none | 0 | bleu_acc | ↑ | 0.5398 | ± | 0.0174 |
| | | none | 0 | bleu_diff | ↑ | 6.0075 | ± | 0.9539 |
| | | none | 0 | bleu_max | ↑ | 30.9946 | ± | 0.8538 |
| | | none | 0 | rouge1_acc | ↑ | 0.5545 | ± | 0.0174 |
| | | none | 0 | rouge1_diff | ↑ | 8.7352 | ± | 1.2500 |
| | | none | 0 | rouge1_max | ↑ | 57.5941 | ± | 0.8750 |
| | | none | 0 | rouge2_acc | ↑ | 0.4810 | ± | 0.0175 |
| | | none | 0 | rouge2_diff | ↑ | 7.9063 | ± | 1.3837 |
| | | none | 0 | rouge2_max | ↑ | 43.4572 | ± | 1.0786 |
| | | none | 0 | rougeL_acc | ↑ | 0.5239 | ± | 0.0175 |
| | | none | 0 | rougeL_diff | ↑ | 8.3871 | ± | 1.2689 |
| | | none | 0 | rougeL_max | ↑ | 54.6542 | ± | 0.9060 |