metadata

license: apache-2.0
datasets:
  - BAAI/Infinity-Instruct
tags:
  - axolotl
  - NousResearch/Hermes-2-Pro-Mistral-7B
  - finetune
  - gguf

Hermes 2 Pro Mistral-7B Infinity-Instruct GGUF

This model is a fine-tuned version of NousResearch/Hermes-2-Pro-Mistral-7B on the BAAI/Infinity-Instruct dataset. You can find the main model page here.

Model Details

Base Model: NousResearch/Hermes-2-Pro-Mistral-7B
Dataset: BAAI/Infinity-Instruct
Sequence Length: 8192 tokens
Training:
  Epochs: 1
  Hardware: 4 Nodes x 4 NVIDIA A100 40GB GPUs
  Duration: 26:56:43
  Cluster: KIT SCC Cluster
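
As a rough guide to the compute budget, the hardware and duration figures above can be turned into a GPU-hour estimate. This is only a sketch: it assumes the duration is wall-clock time in HH:MM:SS format and that all 16 GPUs were busy for the whole run, neither of which the card states explicitly.

```python
# Figures taken from the Model Details list above.
NODES = 4
GPUS_PER_NODE = 4
DURATION = "26:56:43"  # assumption: wall-clock time formatted as HH:MM:SS


def duration_to_hours(hms: str) -> float:
    """Convert an HH:MM:SS string into fractional hours."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours + minutes / 60 + seconds / 3600


total_gpus = NODES * GPUS_PER_NODE
gpu_hours = total_gpus * duration_to_hours(DURATION)

print(f"{total_gpus} GPUs, about {gpu_hours:.0f} A100 GPU-hours")
# prints: 16 GPUs, about 431 A100 GPU-hours
```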

Benchmarks (zero-shot, n_shots=0)

Benchmark	Score
ARC (Challenge)	52.47%
ARC (Easy)	81.65%
BoolQ	87.22%
HellaSwag	60.52%
OpenBookQA	33.60%
PIQA	81.12%
Winogrande	72.22%
AGIEval	38.46%
TruthfulQA	44.22%
MMLU	59.72%
IFEval	47.96%

For detailed benchmark results, including sub-categories and various metrics, please refer to the full benchmark table at the end of this README.
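
The summary scores appear to be the `acc` values from the full table expressed as percentages, with one exception: the IFEval figure matches `inst_level_strict_acc`. The sketch below reproduces the summary column from values copied out of the full table; the mapping of summary labels to task and metric names is inferred from the matching numbers, not stated by the author.

```python
# (summary label, task name, metric, value) copied from the full benchmark table.
# The label-to-metric mapping is an inference from matching numbers.
ROWS = [
    ("ARC (Challenge)", "arc_challenge", "acc", 0.5247),
    ("ARC (Easy)", "arc_easy", "acc", 0.8165),
    ("BoolQ", "boolq", "acc", 0.8722),
    ("HellaSwag", "hellaswag", "acc", 0.6052),
    ("OpenBookQA", "openbookqa", "acc", 0.3360),
    ("PIQA", "piqa", "acc", 0.8112),
    ("Winogrande", "winogrande", "acc", 0.7222),
    ("AGIEval", "agieval", "acc", 0.3846),
    ("TruthfulQA", "truthfulqa", "acc", 0.4422),
    ("MMLU", "mmlu", "acc", 0.5972),
    ("IFEval", "ifeval", "inst_level_strict_acc", 0.4796),
]


def as_percent(value: float) -> str:
    """Format a fraction in [0, 1] as a percentage with two decimals."""
    return f"{value * 100:.2f}%"


summary = {label: as_percent(value) for label, _task, _metric, value in ROWS}

for label, _task, metric, _value in ROWS:
    print(f"{label}\t{summary[label]}\t({metric})")
```

Note that several tasks also report `acc_norm`, which differs noticeably from `acc` in places (HellaSwag, for example), so the metric name matters when comparing against other model cards.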

License

This model is released under the Apache 2.0 license.

Prompt Format (ChatML)

<|im_start|>system
{system_prompt}<|im_end|>
<|im_start|>user
Knock Knock, who is there?<|im_end|>
<|im_start|>assistant
Hi there! <|im_end|>
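
For anyone assembling prompts by hand, the layout above can be produced with a small helper. This is a minimal sketch rather than the author's tooling: `build_chatml_prompt` is a hypothetical function name, and the system prompt text is a placeholder.

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts in the ChatML layout shown above.

    Hypothetical helper for illustration; not part of any library.
    """
    parts = []
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful assistant."},  # placeholder
    {"role": "user", "content": "Knock Knock, who is there?"},
]

prompt = build_chatml_prompt(messages)
print(prompt)
```

When generating, `<|im_end|>` is the natural stop marker, since each turn in the template ends with it.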

Acknowledgements

Special thanks to:

NousResearch for their excellent base model
BAAI for providing the Infinity-Instruct dataset
KIT SCC for FLOPS

Citation

If you use this model in your research, consider citing it. In any case, definitely cite NousResearch and BAAI:

@misc{hermes2pro-mistral-7b-infinity,
  author = {juvi21},
  title = {Hermes 2 Pro Mistral-7B Infinity-Instruct},
  year = {2024},
}

Full Benchmark Results

Tasks	Version	Filter	Metric		Value		Stderr
agieval	N/A	none	acc	↑	0.3846	±	0.0051
		none	acc_norm	↑	0.4186	±	0.0056
- agieval_aqua_rat	1	none	acc	↑	0.2520	±	0.0273
		none	acc_norm	↑	0.2323	±	0.0265
- agieval_gaokao_biology	1	none	acc	↑	0.2952	±	0.0316
		none	acc_norm	↑	0.3381	±	0.0327
- agieval_gaokao_chemistry	1	none	acc	↑	0.2560	±	0.0304
		none	acc_norm	↑	0.2850	±	0.0315
- agieval_gaokao_chinese	1	none	acc	↑	0.2317	±	0.0270
		none	acc_norm	↑	0.2236	±	0.0266
- agieval_gaokao_english	1	none	acc	↑	0.6667	±	0.0270
		none	acc_norm	↑	0.6863	±	0.0266
- agieval_gaokao_geography	1	none	acc	↑	0.3869	±	0.0346
		none	acc_norm	↑	0.4020	±	0.0348
- agieval_gaokao_history	1	none	acc	↑	0.4468	±	0.0325
		none	acc_norm	↑	0.3957	±	0.0320
- agieval_gaokao_mathcloze	1	none	acc	↑	0.0254	±	0.0146
- agieval_gaokao_mathqa	1	none	acc	↑	0.2507	±	0.0232
		none	acc_norm	↑	0.2621	±	0.0235
- agieval_gaokao_physics	1	none	acc	↑	0.2900	±	0.0322
		none	acc_norm	↑	0.3100	±	0.0328
- agieval_jec_qa_ca	1	none	acc	↑	0.4735	±	0.0158
		none	acc_norm	↑	0.4695	±	0.0158
- agieval_jec_qa_kd	1	none	acc	↑	0.5290	±	0.0158
		none	acc_norm	↑	0.5140	±	0.0158
- agieval_logiqa_en	1	none	acc	↑	0.3579	±	0.0188
		none	acc_norm	↑	0.3779	±	0.0190
- agieval_logiqa_zh	1	none	acc	↑	0.3103	±	0.0181
		none	acc_norm	↑	0.3318	±	0.0185
- agieval_lsat_ar	1	none	acc	↑	0.2217	±	0.0275
		none	acc_norm	↑	0.2217	±	0.0275
- agieval_lsat_lr	1	none	acc	↑	0.5333	±	0.0221
		none	acc_norm	↑	0.5098	±	0.0222
- agieval_lsat_rc	1	none	acc	↑	0.5948	±	0.0300
		none	acc_norm	↑	0.5353	±	0.0305
- agieval_math	1	none	acc	↑	0.1520	±	0.0114
- agieval_sat_en	1	none	acc	↑	0.7864	±	0.0286
		none	acc_norm	↑	0.7621	±	0.0297
- agieval_sat_en_without_passage	1	none	acc	↑	0.4660	±	0.0348
		none	acc_norm	↑	0.4272	±	0.0345
- agieval_sat_math	1	none	acc	↑	0.3591	±	0.0324
		none	acc_norm	↑	0.3045	±	0.0311
arc_challenge	1	none	acc	↑	0.5247	±	0.0146
		none	acc_norm	↑	0.5538	±	0.0145
arc_easy	1	none	acc	↑	0.8165	±	0.0079
		none	acc_norm	↑	0.7934	±	0.0083
boolq	2	none	acc	↑	0.8722	±	0.0058
hellaswag	1	none	acc	↑	0.6052	±	0.0049
		none	acc_norm	↑	0.7941	±	0.0040
ifeval	2	none	inst_level_loose_acc	↑	0.5132	±	N/A
		none	inst_level_strict_acc	↑	0.4796	±	N/A
		none	prompt_level_loose_acc	↑	0.4122	±	0.0212
		none	prompt_level_strict_acc	↑	0.3734	±	0.0208
mmlu	N/A	none	acc	↑	0.5972	±	0.0039
- abstract_algebra	0	none	acc	↑	0.3100	±	0.0465
- anatomy	0	none	acc	↑	0.5852	±	0.0426
- astronomy	0	none	acc	↑	0.6447	±	0.0389
- business_ethics	0	none	acc	↑	0.5800	±	0.0496
- clinical_knowledge	0	none	acc	↑	0.6830	±	0.0286
- college_biology	0	none	acc	↑	0.7153	±	0.0377
- college_chemistry	0	none	acc	↑	0.4500	±	0.0500
- college_computer_science	0	none	acc	↑	0.4900	±	0.0502
- college_mathematics	0	none	acc	↑	0.3100	±	0.0465
- college_medicine	0	none	acc	↑	0.6069	±	0.0372
- college_physics	0	none	acc	↑	0.4020	±	0.0488
- computer_security	0	none	acc	↑	0.7200	±	0.0451
- conceptual_physics	0	none	acc	↑	0.5234	±	0.0327
- econometrics	0	none	acc	↑	0.4123	±	0.0463
- electrical_engineering	0	none	acc	↑	0.4759	±	0.0416
- elementary_mathematics	0	none	acc	↑	0.4180	±	0.0254
- formal_logic	0	none	acc	↑	0.4286	±	0.0443
- global_facts	0	none	acc	↑	0.3400	±	0.0476
- high_school_biology	0	none	acc	↑	0.7419	±	0.0249
- high_school_chemistry	0	none	acc	↑	0.4631	±	0.0351
- high_school_computer_science	0	none	acc	↑	0.6300	±	0.0485
- high_school_european_history	0	none	acc	↑	0.7394	±	0.0343
- high_school_geography	0	none	acc	↑	0.7323	±	0.0315
- high_school_government_and_politics	0	none	acc	↑	0.8238	±	0.0275
- high_school_macroeconomics	0	none	acc	↑	0.6308	±	0.0245
- high_school_mathematics	0	none	acc	↑	0.3333	±	0.0287
- high_school_microeconomics	0	none	acc	↑	0.6387	±	0.0312
- high_school_physics	0	none	acc	↑	0.2914	±	0.0371
- high_school_psychology	0	none	acc	↑	0.8128	±	0.0167
- high_school_statistics	0	none	acc	↑	0.4907	±	0.0341
- high_school_us_history	0	none	acc	↑	0.8186	±	0.0270
- high_school_world_history	0	none	acc	↑	0.8186	±	0.0251
- human_aging	0	none	acc	↑	0.6771	±	0.0314
- human_sexuality	0	none	acc	↑	0.7176	±	0.0395
- humanities	N/A	none	acc	↑	0.5411	±	0.0066
- international_law	0	none	acc	↑	0.7603	±	0.0390
- jurisprudence	0	none	acc	↑	0.7593	±	0.0413
- logical_fallacies	0	none	acc	↑	0.7239	±	0.0351
- machine_learning	0	none	acc	↑	0.5268	±	0.0474
- management	0	none	acc	↑	0.7864	±	0.0406
- marketing	0	none	acc	↑	0.8547	±	0.0231
- medical_genetics	0	none	acc	↑	0.6500	±	0.0479
- miscellaneous	0	none	acc	↑	0.7918	±	0.0145
- moral_disputes	0	none	acc	↑	0.6705	±	0.0253
- moral_scenarios	0	none	acc	↑	0.2268	±	0.0140
- nutrition	0	none	acc	↑	0.6961	±	0.0263
- other	N/A	none	acc	↑	0.6720	±	0.0081
- philosophy	0	none	acc	↑	0.6945	±	0.0262
- prehistory	0	none	acc	↑	0.6975	±	0.0256
- professional_accounting	0	none	acc	↑	0.4539	±	0.0297
- professional_law	0	none	acc	↑	0.4537	±	0.0127
- professional_medicine	0	none	acc	↑	0.6176	±	0.0295
- professional_psychology	0	none	acc	↑	0.6275	±	0.0196
- public_relations	0	none	acc	↑	0.6364	±	0.0461
- security_studies	0	none	acc	↑	0.7061	±	0.0292
- social_sciences	N/A	none	acc	↑	0.7043	±	0.0080
- sociology	0	none	acc	↑	0.8458	±	0.0255
- stem	N/A	none	acc	↑	0.5027	±	0.0086
- us_foreign_policy	0	none	acc	↑	0.8400	±	0.0368
- virology	0	none	acc	↑	0.5060	±	0.0389
- world_religions	0	none	acc	↑	0.8421	±	0.0280
openbookqa	1	none	acc	↑	0.3360	±	0.0211
		none	acc_norm	↑	0.4380	±	0.0222
piqa	1	none	acc	↑	0.8112	±	0.0091
		none	acc_norm	↑	0.8194	±	0.0090
truthfulqa	N/A	none	acc	↑	0.4422	±	0.0113
		none	bleu_acc	↑	0.5398	±	0.0174
		none	bleu_diff	↑	6.0075	±	0.9539
		none	bleu_max	↑	30.9946	±	0.8538
		none	rouge1_acc	↑	0.5545	±	0.0174
		none	rouge1_diff	↑	8.7352	±	1.2500
		none	rouge1_max	↑	57.5941	±	0.8750
		none	rouge2_acc	↑	0.4810	±	0.0175
		none	rouge2_diff	↑	7.9063	±	1.3837
		none	rouge2_max	↑	43.4572	±	1.0786
		none	rougeL_acc	↑	0.5239	±	0.0175
		none	rougeL_diff	↑	8.3871	±	1.2689
		none	rougeL_max	↑	54.6542	±	0.9060
- truthfulqa_gen	3	none	bleu_acc	↑	0.5398	±	0.0174
		none	bleu_diff	↑	6.0075	±	0.9539
		none	bleu_max	↑	30.9946	±	0.8538
		none	rouge1_acc	↑	0.5545	±	0.0174
		none	rouge1_diff	↑	8.7352	±	1.2500
		none	rouge1_max	↑	57.5941	±	0.8750
		none	rouge2_acc	↑	0.4810	±	0.0175
		none	rouge2_diff	↑	7.9063	±	1.3837
		none	rouge2_max	↑	43.4572	±	1.0786
		none	rougeL_acc	↑	0.5239	±	0.0175
		none	rougeL_diff	↑	8.3871	±	1.2689
		none	rougeL_max	↑	54.6542	±	0.9060
- truthfulqa_mc1	2	none	acc	↑	0.3574	±	0.0168
- truthfulqa_mc2	2	none	acc	↑	0.5269	±	0.0152
winogrande	1	none	acc	↑	0.7222	±	0.0126

Groups	Version	Filter	Metric		Value		Stderr
agieval	N/A	none	acc	↑	0.3846	±	0.0051
		none	acc_norm	↑	0.4186	±	0.0056
mmlu	N/A	none	acc	↑	0.5972	±	0.0039
- humanities	N/A	none	acc	↑	0.5411	±	0.0066
- other	N/A	none	acc	↑	0.6720	±	0.0081
- social_sciences	N/A	none	acc	↑	0.7043	±	0.0080
- stem	N/A	none	acc	↑	0.5027	±	0.0086
truthfulqa	N/A	none	acc	↑	0.4422	±	0.0113
		none	bleu_acc	↑	0.5398	±	0.0174
		none	bleu_diff	↑	6.0075	±	0.9539
		none	bleu_max	↑	30.9946	±	0.8538
		none	rouge1_acc	↑	0.5545	±	0.0174
		none	rouge1_diff	↑	8.7352	±	1.2500
		none	rouge1_max	↑	57.5941	±	0.8750
		none	rouge2_acc	↑	0.4810	±	0.0175
		none	rouge2_diff	↑	7.9063	±	1.3837
		none	rouge2_max	↑	43.4572	±	1.0786
		none	rougeL_acc	↑	0.5239	±	0.0175
		none	rougeL_diff	↑	8.3871	±	1.2689
		none	rougeL_max	↑	54.6542	±	0.9060