metadata
license: apache-2.0
datasets:
  - BAAI/Infinity-Instruct
tags:
  - axolotl
  - NousResearch/Hermes-2-Pro-Mistral-7B
  - finetune
  - gguf

Hermes 2 Pro Mistral-7B Infinity-Instruct GGUF

This model is a fine-tuned version of NousResearch/Hermes-2-Pro-Mistral-7B on the BAAI/Infinity-Instruct dataset. You can find the main model page here.

Model Details

  • Base Model: NousResearch/Hermes-2-Pro-Mistral-7B
  • Dataset: BAAI/Infinity-Instruct
  • Sequence Length: 8192 tokens
  • Training:
      • Epochs: 1
      • Hardware: 4 nodes × 4 NVIDIA A100 40GB GPUs
      • Duration: 26 h 56 min 43 s
      • Cluster: KIT SCC
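As a back-of-the-envelope check on the training budget listed above, the following sketch converts the reported training duration into GPU-hours. It assumes the duration is wall-clock time in hours:minutes:seconds and that all GPUs on all four nodes were allocated for the whole run; it is an estimate, not a measured figure.

```python
# Back-of-the-envelope training budget from the Model Details list.
# Assumptions: the duration is wall-clock hh:mm:ss, and all GPUs on all
# nodes were allocated for the entire run.

NODES = 4
GPUS_PER_NODE = 4
DURATION = "26:56:43"  # hh:mm:ss, as reported


def hms_to_hours(hms: str) -> float:
    """Convert an 'hh:mm:ss' string to fractional hours."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours + minutes / 60 + seconds / 3600


total_gpus = NODES * GPUS_PER_NODE
wall_hours = hms_to_hours(DURATION)
gpu_hours = total_gpus * wall_hours

print(f"{total_gpus} GPUs x {wall_hours:.2f} h = {gpu_hours:.1f} GPU-hours")
```

Under these assumptions the run comes to roughly 431 A100 GPU-hours (16 GPUs for about 26.95 hours).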

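Benchmark Results (0-shot)

| Benchmark | Metric | Score |
|---|---|---|
| ARC (Challenge) | acc | 52.47% |
| ARC (Easy) | acc | 81.65% |
| BoolQ | acc | 87.22% |
| HellaSwag | acc | 60.52% |
| OpenBookQA | acc | 33.60% |
| PIQA | acc | 81.12% |
| Winogrande | acc | 72.22% |
| AGIEval | acc | 38.46% |
| TruthfulQA | acc | 44.22% |
| MMLU | acc | 59.72% |
| IFEval | inst_level_strict_acc | 47.96% |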

For detailed benchmark results, including sub-categories and various metrics, please refer to the full benchmark table at the end of this README.
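The headline scores use the plain `acc` metric from the full table rather than `acc_norm` (HellaSwag, for example, reports 60.52% `acc` versus 79.41% `acc_norm`), while IFEval uses `inst_level_strict_acc`. The sketch below re-derives the percentages from the raw fractions and checks that the TruthfulQA group score matches the mean of its two multiple-choice subtasks. The values are copied from this README; the dictionary layout and names are hypothetical and chosen only for illustration.

```python
# Sketch: re-derive the headline scores from the raw fractions in the full
# results table. Values are copied from this README; the dictionary layout
# and names are hypothetical.

# (task, metric) -> value, as reported in the full benchmark table
FULL_RESULTS = {
    ("arc_challenge", "acc"): 0.5247,
    ("arc_easy", "acc"): 0.8165,
    ("boolq", "acc"): 0.8722,
    ("hellaswag", "acc"): 0.6052,
    ("openbookqa", "acc"): 0.3360,
    ("piqa", "acc"): 0.8112,
    ("winogrande", "acc"): 0.7222,
    ("agieval", "acc"): 0.3846,
    ("truthfulqa", "acc"): 0.4422,
    ("mmlu", "acc"): 0.5972,
    ("ifeval", "inst_level_strict_acc"): 0.4796,
}


def as_percent(fraction: float) -> str:
    """Format a 0-1 fraction as a percentage with two decimals."""
    return f"{fraction * 100:.2f}%"


headline = {task: as_percent(value) for (task, _metric), value in FULL_RESULTS.items()}

# TruthfulQA's group score should match the mean of its two multiple-choice
# subtasks; mc1 and mc2 share the same 817 questions, so a size-weighted and
# an unweighted mean coincide.
mc1_acc, mc2_acc = 0.3574, 0.5269
truthfulqa_mean = (mc1_acc + mc2_acc) / 2

for task, score in headline.items():
    print(f"{task:>14}: {score}")
print(f"mean(mc1, mc2) = {truthfulqa_mean:.5f} (reported group acc: 0.4422)")
```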

License

This model is released under the Apache 2.0 license.

Prompt Format

The model uses the ChatML prompt format:

<|im_start|>system
{system_prompt}<|im_end|>
<|im_start|>user
Knock Knock, who is there?<|im_end|>
<|im_start|>assistant
Hi there!<|im_end|>
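If your inference runtime does not apply a chat template for you, the following minimal sketch assembles a ChatML string in the layout shown above and ends with an open assistant turn for the model to complete. The function name and the list-of-dicts message format are illustrative choices of ours, not part of any library API.

```python
from typing import Dict, List


def build_chatml_prompt(system_prompt: str, messages: List[Dict[str, str]]) -> str:
    """Render a system prompt and chat history in ChatML, ending with an
    open assistant turn for the model to complete."""
    turns = [f"<|im_start|>system\n{system_prompt}<|im_end|>"]
    for message in messages:
        # Each turn: <|im_start|>{role}, newline, content, <|im_end|>
        turns.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>")
    # No <|im_end|> here: the model generates the assistant reply after this.
    turns.append("<|im_start|>assistant\n")
    # Turns are separated by newlines, matching the template above.
    return "\n".join(turns)


prompt = build_chatml_prompt(
    "You are a helpful assistant.",
    [{"role": "user", "content": "Knock Knock, who is there?"}],
)
print(prompt)
```

When generating, you will typically want to treat `<|im_end|>` as a stop sequence so that the model ends after its own turn.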

Acknowledgements

Special thanks to:

  • NousResearch for their excellent base model
  • BAAI for providing the Infinity-Instruct dataset
  • KIT SCC for providing the compute (FLOPS)

Citation

If you use this model in your research, please consider citing it. Above all, be sure to cite NousResearch (base model) and BAAI (Infinity-Instruct dataset). The entry for this model is:

@misc{hermes2pro-mistral-7b-infinity,
  author = {juvi21},
  title = {Hermes 2 Pro Mistral-7B Infinity-Instruct},
  year = {2024},
}

Full Benchmark Results

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| agieval | N/A | none | 0 | acc | 0.3846 | ± 0.0051 |
| | | none | 0 | acc_norm | 0.4186 | ± 0.0056 |
| - agieval_aqua_rat | 1 | none | 0 | acc | 0.2520 | ± 0.0273 |
| | | none | 0 | acc_norm | 0.2323 | ± 0.0265 |
| - agieval_gaokao_biology | 1 | none | 0 | acc | 0.2952 | ± 0.0316 |
| | | none | 0 | acc_norm | 0.3381 | ± 0.0327 |
| - agieval_gaokao_chemistry | 1 | none | 0 | acc | 0.2560 | ± 0.0304 |
| | | none | 0 | acc_norm | 0.2850 | ± 0.0315 |
| - agieval_gaokao_chinese | 1 | none | 0 | acc | 0.2317 | ± 0.0270 |
| | | none | 0 | acc_norm | 0.2236 | ± 0.0266 |
| - agieval_gaokao_english | 1 | none | 0 | acc | 0.6667 | ± 0.0270 |
| | | none | 0 | acc_norm | 0.6863 | ± 0.0266 |
| - agieval_gaokao_geography | 1 | none | 0 | acc | 0.3869 | ± 0.0346 |
| | | none | 0 | acc_norm | 0.4020 | ± 0.0348 |
| - agieval_gaokao_history | 1 | none | 0 | acc | 0.4468 | ± 0.0325 |
| | | none | 0 | acc_norm | 0.3957 | ± 0.0320 |
| - agieval_gaokao_mathcloze | 1 | none | 0 | acc | 0.0254 | ± 0.0146 |
| - agieval_gaokao_mathqa | 1 | none | 0 | acc | 0.2507 | ± 0.0232 |
| | | none | 0 | acc_norm | 0.2621 | ± 0.0235 |
| - agieval_gaokao_physics | 1 | none | 0 | acc | 0.2900 | ± 0.0322 |
| | | none | 0 | acc_norm | 0.3100 | ± 0.0328 |
| - agieval_jec_qa_ca | 1 | none | 0 | acc | 0.4735 | ± 0.0158 |
| | | none | 0 | acc_norm | 0.4695 | ± 0.0158 |
| - agieval_jec_qa_kd | 1 | none | 0 | acc | 0.5290 | ± 0.0158 |
| | | none | 0 | acc_norm | 0.5140 | ± 0.0158 |
| - agieval_logiqa_en | 1 | none | 0 | acc | 0.3579 | ± 0.0188 |
| | | none | 0 | acc_norm | 0.3779 | ± 0.0190 |
| - agieval_logiqa_zh | 1 | none | 0 | acc | 0.3103 | ± 0.0181 |
| | | none | 0 | acc_norm | 0.3318 | ± 0.0185 |
| - agieval_lsat_ar | 1 | none | 0 | acc | 0.2217 | ± 0.0275 |
| | | none | 0 | acc_norm | 0.2217 | ± 0.0275 |
| - agieval_lsat_lr | 1 | none | 0 | acc | 0.5333 | ± 0.0221 |
| | | none | 0 | acc_norm | 0.5098 | ± 0.0222 |
| - agieval_lsat_rc | 1 | none | 0 | acc | 0.5948 | ± 0.0300 |
| | | none | 0 | acc_norm | 0.5353 | ± 0.0305 |
| - agieval_math | 1 | none | 0 | acc | 0.1520 | ± 0.0114 |
| - agieval_sat_en | 1 | none | 0 | acc | 0.7864 | ± 0.0286 |
| | | none | 0 | acc_norm | 0.7621 | ± 0.0297 |
| - agieval_sat_en_without_passage | 1 | none | 0 | acc | 0.4660 | ± 0.0348 |
| | | none | 0 | acc_norm | 0.4272 | ± 0.0345 |
| - agieval_sat_math | 1 | none | 0 | acc | 0.3591 | ± 0.0324 |
| | | none | 0 | acc_norm | 0.3045 | ± 0.0311 |
| arc_challenge | 1 | none | 0 | acc | 0.5247 | ± 0.0146 |
| | | none | 0 | acc_norm | 0.5538 | ± 0.0145 |
| arc_easy | 1 | none | 0 | acc | 0.8165 | ± 0.0079 |
| | | none | 0 | acc_norm | 0.7934 | ± 0.0083 |
| boolq | 2 | none | 0 | acc | 0.8722 | ± 0.0058 |
| hellaswag | 1 | none | 0 | acc | 0.6052 | ± 0.0049 |
| | | none | 0 | acc_norm | 0.7941 | ± 0.0040 |
| ifeval | 2 | none | 0 | inst_level_loose_acc | 0.5132 | N/A |
| | | none | 0 | inst_level_strict_acc | 0.4796 | N/A |
| | | none | 0 | prompt_level_loose_acc | 0.4122 | ± 0.0212 |
| | | none | 0 | prompt_level_strict_acc | 0.3734 | ± 0.0208 |
| mmlu | N/A | none | 0 | acc | 0.5972 | ± 0.0039 |
| - abstract_algebra | 0 | none | 0 | acc | 0.3100 | ± 0.0465 |
| - anatomy | 0 | none | 0 | acc | 0.5852 | ± 0.0426 |
| - astronomy | 0 | none | 0 | acc | 0.6447 | ± 0.0389 |
| - business_ethics | 0 | none | 0 | acc | 0.5800 | ± 0.0496 |
| - clinical_knowledge | 0 | none | 0 | acc | 0.6830 | ± 0.0286 |
| - college_biology | 0 | none | 0 | acc | 0.7153 | ± 0.0377 |
| - college_chemistry | 0 | none | 0 | acc | 0.4500 | ± 0.0500 |
| - college_computer_science | 0 | none | 0 | acc | 0.4900 | ± 0.0502 |
| - college_mathematics | 0 | none | 0 | acc | 0.3100 | ± 0.0465 |
| - college_medicine | 0 | none | 0 | acc | 0.6069 | ± 0.0372 |
| - college_physics | 0 | none | 0 | acc | 0.4020 | ± 0.0488 |
| - computer_security | 0 | none | 0 | acc | 0.7200 | ± 0.0451 |
| - conceptual_physics | 0 | none | 0 | acc | 0.5234 | ± 0.0327 |
| - econometrics | 0 | none | 0 | acc | 0.4123 | ± 0.0463 |
| - electrical_engineering | 0 | none | 0 | acc | 0.4759 | ± 0.0416 |
| - elementary_mathematics | 0 | none | 0 | acc | 0.4180 | ± 0.0254 |
| - formal_logic | 0 | none | 0 | acc | 0.4286 | ± 0.0443 |
| - global_facts | 0 | none | 0 | acc | 0.3400 | ± 0.0476 |
| - high_school_biology | 0 | none | 0 | acc | 0.7419 | ± 0.0249 |
| - high_school_chemistry | 0 | none | 0 | acc | 0.4631 | ± 0.0351 |
| - high_school_computer_science | 0 | none | 0 | acc | 0.6300 | ± 0.0485 |
| - high_school_european_history | 0 | none | 0 | acc | 0.7394 | ± 0.0343 |
| - high_school_geography | 0 | none | 0 | acc | 0.7323 | ± 0.0315 |
| - high_school_government_and_politics | 0 | none | 0 | acc | 0.8238 | ± 0.0275 |
| - high_school_macroeconomics | 0 | none | 0 | acc | 0.6308 | ± 0.0245 |
| - high_school_mathematics | 0 | none | 0 | acc | 0.3333 | ± 0.0287 |
| - high_school_microeconomics | 0 | none | 0 | acc | 0.6387 | ± 0.0312 |
| - high_school_physics | 0 | none | 0 | acc | 0.2914 | ± 0.0371 |
| - high_school_psychology | 0 | none | 0 | acc | 0.8128 | ± 0.0167 |
| - high_school_statistics | 0 | none | 0 | acc | 0.4907 | ± 0.0341 |
| - high_school_us_history | 0 | none | 0 | acc | 0.8186 | ± 0.0270 |
| - high_school_world_history | 0 | none | 0 | acc | 0.8186 | ± 0.0251 |
| - human_aging | 0 | none | 0 | acc | 0.6771 | ± 0.0314 |
| - human_sexuality | 0 | none | 0 | acc | 0.7176 | ± 0.0395 |
| - humanities | N/A | none | 0 | acc | 0.5411 | ± 0.0066 |
| - international_law | 0 | none | 0 | acc | 0.7603 | ± 0.0390 |
| - jurisprudence | 0 | none | 0 | acc | 0.7593 | ± 0.0413 |
| - logical_fallacies | 0 | none | 0 | acc | 0.7239 | ± 0.0351 |
| - machine_learning | 0 | none | 0 | acc | 0.5268 | ± 0.0474 |
| - management | 0 | none | 0 | acc | 0.7864 | ± 0.0406 |
| - marketing | 0 | none | 0 | acc | 0.8547 | ± 0.0231 |
| - medical_genetics | 0 | none | 0 | acc | 0.6500 | ± 0.0479 |
| - miscellaneous | 0 | none | 0 | acc | 0.7918 | ± 0.0145 |
| - moral_disputes | 0 | none | 0 | acc | 0.6705 | ± 0.0253 |
| - moral_scenarios | 0 | none | 0 | acc | 0.2268 | ± 0.0140 |
| - nutrition | 0 | none | 0 | acc | 0.6961 | ± 0.0263 |
| - other | N/A | none | 0 | acc | 0.6720 | ± 0.0081 |
| - philosophy | 0 | none | 0 | acc | 0.6945 | ± 0.0262 |
| - prehistory | 0 | none | 0 | acc | 0.6975 | ± 0.0256 |
| - professional_accounting | 0 | none | 0 | acc | 0.4539 | ± 0.0297 |
| - professional_law | 0 | none | 0 | acc | 0.4537 | ± 0.0127 |
| - professional_medicine | 0 | none | 0 | acc | 0.6176 | ± 0.0295 |
| - professional_psychology | 0 | none | 0 | acc | 0.6275 | ± 0.0196 |
| - public_relations | 0 | none | 0 | acc | 0.6364 | ± 0.0461 |
| - security_studies | 0 | none | 0 | acc | 0.7061 | ± 0.0292 |
| - social_sciences | N/A | none | 0 | acc | 0.7043 | ± 0.0080 |
| - sociology | 0 | none | 0 | acc | 0.8458 | ± 0.0255 |
| - stem | N/A | none | 0 | acc | 0.5027 | ± 0.0086 |
| - us_foreign_policy | 0 | none | 0 | acc | 0.8400 | ± 0.0368 |
| - virology | 0 | none | 0 | acc | 0.5060 | ± 0.0389 |
| - world_religions | 0 | none | 0 | acc | 0.8421 | ± 0.0280 |
| openbookqa | 1 | none | 0 | acc | 0.3360 | ± 0.0211 |
| | | none | 0 | acc_norm | 0.4380 | ± 0.0222 |
| piqa | 1 | none | 0 | acc | 0.8112 | ± 0.0091 |
| | | none | 0 | acc_norm | 0.8194 | ± 0.0090 |
| truthfulqa | N/A | none | 0 | acc | 0.4422 | ± 0.0113 |
| | | none | 0 | bleu_acc | 0.5398 | ± 0.0174 |
| | | none | 0 | bleu_diff | 6.0075 | ± 0.9539 |
| | | none | 0 | bleu_max | 30.9946 | ± 0.8538 |
| | | none | 0 | rouge1_acc | 0.5545 | ± 0.0174 |
| | | none | 0 | rouge1_diff | 8.7352 | ± 1.2500 |
| | | none | 0 | rouge1_max | 57.5941 | ± 0.8750 |
| | | none | 0 | rouge2_acc | 0.4810 | ± 0.0175 |
| | | none | 0 | rouge2_diff | 7.9063 | ± 1.3837 |
| | | none | 0 | rouge2_max | 43.4572 | ± 1.0786 |
| | | none | 0 | rougeL_acc | 0.5239 | ± 0.0175 |
| | | none | 0 | rougeL_diff | 8.3871 | ± 1.2689 |
| | | none | 0 | rougeL_max | 54.6542 | ± 0.9060 |
| - truthfulqa_gen | 3 | none | 0 | bleu_acc | 0.5398 | ± 0.0174 |
| | | none | 0 | bleu_diff | 6.0075 | ± 0.9539 |
| | | none | 0 | bleu_max | 30.9946 | ± 0.8538 |
| | | none | 0 | rouge1_acc | 0.5545 | ± 0.0174 |
| | | none | 0 | rouge1_diff | 8.7352 | ± 1.2500 |
| | | none | 0 | rouge1_max | 57.5941 | ± 0.8750 |
| | | none | 0 | rouge2_acc | 0.4810 | ± 0.0175 |
| | | none | 0 | rouge2_diff | 7.9063 | ± 1.3837 |
| | | none | 0 | rouge2_max | 43.4572 | ± 1.0786 |
| | | none | 0 | rougeL_acc | 0.5239 | ± 0.0175 |
| | | none | 0 | rougeL_diff | 8.3871 | ± 1.2689 |
| | | none | 0 | rougeL_max | 54.6542 | ± 0.9060 |
| - truthfulqa_mc1 | 2 | none | 0 | acc | 0.3574 | ± 0.0168 |
| - truthfulqa_mc2 | 2 | none | 0 | acc | 0.5269 | ± 0.0152 |
| winogrande | 1 | none | 0 | acc | 0.7222 | ± 0.0126 |

| Groups | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| agieval | N/A | none | 0 | acc | 0.3846 | ± 0.0051 |
| | | none | 0 | acc_norm | 0.4186 | ± 0.0056 |
| mmlu | N/A | none | 0 | acc | 0.5972 | ± 0.0039 |
| - humanities | N/A | none | 0 | acc | 0.5411 | ± 0.0066 |
| - other | N/A | none | 0 | acc | 0.6720 | ± 0.0081 |
| - social_sciences | N/A | none | 0 | acc | 0.7043 | ± 0.0080 |
| - stem | N/A | none | 0 | acc | 0.5027 | ± 0.0086 |
| truthfulqa | N/A | none | 0 | acc | 0.4422 | ± 0.0113 |
| | | none | 0 | bleu_acc | 0.5398 | ± 0.0174 |
| | | none | 0 | bleu_diff | 6.0075 | ± 0.9539 |
| | | none | 0 | bleu_max | 30.9946 | ± 0.8538 |
| | | none | 0 | rouge1_acc | 0.5545 | ± 0.0174 |
| | | none | 0 | rouge1_diff | 8.7352 | ± 1.2500 |
| | | none | 0 | rouge1_max | 57.5941 | ± 0.8750 |
| | | none | 0 | rouge2_acc | 0.4810 | ± 0.0175 |
| | | none | 0 | rouge2_diff | 7.9063 | ± 1.3837 |
| | | none | 0 | rouge2_max | 43.4572 | ± 1.0786 |
| | | none | 0 | rougeL_acc | 0.5239 | ± 0.0175 |
| | | none | 0 | rougeL_diff | 8.3871 | ± 1.2689 |
| | | none | 0 | rougeL_max | 54.6542 | ± 0.9060 |
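Each Stderr entry in the tables above is the standard error of the reported value. As a rough guide to how much weight a given score difference can carry, the following sketch takes the `arc_challenge` `acc` row and derives an approximate 95% confidence interval and the number of questions that standard error implies. The normal approximation and the binomial standard-error formula are our assumptions for illustration, not something stated by the evaluation output.

```python
# Sketch: interpret a Value/Stderr pair from the results table.
# Assumptions (ours, not stated by the evaluation output): a normal
# approximation for the 95% interval, and stderr ~ sqrt(p * (1 - p) / n)
# for an accuracy averaged over n questions scored 0 or 1.

# arc_challenge, acc row of the full results table
value = 0.5247
stderr = 0.0146

Z_95 = 1.96  # two-sided 95% quantile of the standard normal distribution

low = value - Z_95 * stderr
high = value + Z_95 * stderr

# Invert the binomial standard-error formula to estimate the sample count.
implied_n = value * (1 - value) / stderr**2

print(f"95% CI for arc_challenge acc: [{low:.4f}, {high:.4f}]")
print(f"implied number of questions: ~{implied_n:.0f}")
```

For this row the interval is roughly 49.6% to 55.3%, and the implied count of about 1,170 is close to the 1,172 questions in the ARC-Challenge test split, which supports reading the column this way. Gaps between models that are smaller than roughly two standard errors should be treated with caution.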