|
--- |
|
license: apache-2.0 |
|
datasets: |
|
- BAAI/Infinity-Instruct |
|
tags: |
|
- axolotl |
|
- NousResearch/Hermes-2-Pro-Mistral-7B |
|
- finetune |
|
- gguf |
|
--- |
|
|
|
|
|
# Hermes 2 Pro Mistral-7B Infinity-Instruct GGUF |
|
|
|
This model is a fine-tuned version of [NousResearch/Hermes-2-Pro-Mistral-7B](https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B) on the [BAAI/Infinity-Instruct](https://huggingface.co/datasets/BAAI/Infinity-Instruct) dataset. |
|
You can find the main model page [here](https://huggingface.co/juvi21/Hermes-2-Pro-Mistral-7B-infinity). |
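
A quick way to run the GGUF quantizations locally is via llama-cpp-python. The snippet below is a minimal sketch, not the exact invocation used by the author; the GGUF filename is a placeholder and depends on which quantization you download from this repository.

```python
# Minimal sketch using llama-cpp-python (pip install llama-cpp-python).
# The GGUF filename below is a placeholder; use whichever quantization
# you actually downloaded from this repository.
from llama_cpp import Llama

llm = Llama(
    model_path="./hermes-2-pro-mistral-7b-infinity.Q4_K_M.gguf",  # placeholder path
    n_ctx=8192,        # matches the training sequence length
    n_gpu_layers=-1,   # offload all layers to GPU if available
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Knock Knock, who is there?"},
    ],
    max_tokens=256,
)
print(response["choices"][0]["message"]["content"])
```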
|
|
|
## Model Details |
|
|
|
- **Base Model:** NousResearch/Hermes-2-Pro-Mistral-7B |
|
- **Dataset:** BAAI/Infinity-Instruct |
|
- **Sequence Length:** 8192 tokens |
|
- **Training:** |
|
- **Epochs:** 1 |
|
- **Hardware:** 4 nodes × 4 NVIDIA A100 40 GB GPUs (16 GPUs total)

- **Duration:** 26:56:43 (hh:mm:ss)
|
- **Cluster:** KIT SCC Cluster |
|
|
|
## Benchmarks (n_shots=0)
|
|
|
![Benchmark Results](https://cdn-uploads.huggingface.co/production/uploads/659c4ecb413a1376bee2f661/gzwCfT8HTBRpRAzj2mN67.png) |
|
|
|
|
|
| Benchmark | Score | |
|
|-----------|-------| |
|
| ARC (Challenge) | 52.47% | |
|
| ARC (Easy) | 81.65% | |
|
| BoolQ | 87.22% | |
|
| HellaSwag | 60.52% | |
|
| OpenBookQA | 33.60% | |
|
| PIQA | 81.12% | |
|
| Winogrande | 72.22% | |
|
| AGIEval | 38.46% | |
|
| TruthfulQA | 44.22% | |
|
| MMLU | 59.72% | |
|
| IFEval | 47.96% | |
|
|
|
For detailed benchmark results, including sub-categories and various metrics, please refer to the [full benchmark table](#full-benchmark-results) at the end of this README. |
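
The tables below follow the output format of EleutherAI's lm-evaluation-harness. If you want to run a comparable zero-shot evaluation yourself, a sketch along these lines should work; it assumes lm-eval >= 0.4, and the task list and model arguments are illustrative, not a record of the exact configuration used for this card.

```python
# Hypothetical sketch for reproducing zero-shot scores with
# EleutherAI's lm-evaluation-harness (pip install lm-eval).
# Task names and model_args are illustrative assumptions, not the
# exact configuration used for the table in this card.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=juvi21/Hermes-2-Pro-Mistral-7B-infinity,dtype=bfloat16",
    tasks=["arc_challenge", "arc_easy", "boolq", "hellaswag", "mmlu"],
    num_fewshot=0,
    batch_size=8,
)
print(results["results"])
```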
|
|
|
## License |
|
|
|
This model is released under the Apache 2.0 license. |
|
|
|
## Prompt Format (ChatML)
|
``` |
|
<|im_start|>system |
|
{system_prompt}<|im_end|> |
|
<|im_start|>user |
|
Knock Knock, who is there?<|im_end|> |
|
<|im_start|>assistant |
|
Hi there!<|im_end|>
|
``` |
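
If you are building the prompt string by hand (for example when feeding raw text to llama.cpp), a small helper like the hypothetical one below reproduces this ChatML layout:

```python
# Small helper that assembles a ChatML prompt matching the template above.
# Purely illustrative; the function name is made up for this example.
def build_chatml_prompt(system_prompt: str, user_message: str) -> str:
    return (
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>user\n{user_message}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

prompt = build_chatml_prompt(
    "You are a helpful assistant.",
    "Knock Knock, who is there?",
)
print(prompt)
```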
|
|
|
## Acknowledgements |
|
|
|
Special thanks to: |
|
- NousResearch for their excellent base model |
|
- BAAI for providing the Infinity-Instruct dataset |
|
- KIT SCC for providing the compute
|
|
|
## Citation |
|
|
|
If you use this model in your research, please consider citing it. In any case, be sure to cite NousResearch and BAAI:
|
|
|
```bibtex |
|
@misc{hermes2pro-mistral-7b-infinity,
  author = {juvi21},
  title = {Hermes 2 Pro Mistral-7B Infinity-Instruct},
  year = {2024},
  url = {https://huggingface.co/juvi21/Hermes-2-Pro-Mistral-7B-infinity}
}
|
``` |
|
## Full Benchmark Results
|
| Tasks |Version|Filter|n-shot| Metric | | Value | |Stderr| |
|
|---------------------------------------|-------|------|-----:|-----------------------|---|------:|---|------| |
|
|agieval |N/A |none | 0|acc |↑ | 0.3846|± |0.0051| |
|
| | |none | 0|acc_norm |↑ | 0.4186|± |0.0056| |
|
| - agieval_aqua_rat | 1|none | 0|acc |↑ | 0.2520|± |0.0273| |
|
| | |none | 0|acc_norm |↑ | 0.2323|± |0.0265| |
|
| - agieval_gaokao_biology | 1|none | 0|acc |↑ | 0.2952|± |0.0316| |
|
| | |none | 0|acc_norm |↑ | 0.3381|± |0.0327| |
|
| - agieval_gaokao_chemistry | 1|none | 0|acc |↑ | 0.2560|± |0.0304| |
|
| | |none | 0|acc_norm |↑ | 0.2850|± |0.0315| |
|
| - agieval_gaokao_chinese | 1|none | 0|acc |↑ | 0.2317|± |0.0270| |
|
| | |none | 0|acc_norm |↑ | 0.2236|± |0.0266| |
|
| - agieval_gaokao_english | 1|none | 0|acc |↑ | 0.6667|± |0.0270| |
|
| | |none | 0|acc_norm |↑ | 0.6863|± |0.0266| |
|
| - agieval_gaokao_geography | 1|none | 0|acc |↑ | 0.3869|± |0.0346| |
|
| | |none | 0|acc_norm |↑ | 0.4020|± |0.0348| |
|
| - agieval_gaokao_history | 1|none | 0|acc |↑ | 0.4468|± |0.0325| |
|
| | |none | 0|acc_norm |↑ | 0.3957|± |0.0320| |
|
| - agieval_gaokao_mathcloze | 1|none | 0|acc |↑ | 0.0254|± |0.0146| |
|
| - agieval_gaokao_mathqa | 1|none | 0|acc |↑ | 0.2507|± |0.0232| |
|
| | |none | 0|acc_norm |↑ | 0.2621|± |0.0235| |
|
| - agieval_gaokao_physics | 1|none | 0|acc |↑ | 0.2900|± |0.0322| |
|
| | |none | 0|acc_norm |↑ | 0.3100|± |0.0328| |
|
| - agieval_jec_qa_ca | 1|none | 0|acc |↑ | 0.4735|± |0.0158| |
|
| | |none | 0|acc_norm |↑ | 0.4695|± |0.0158| |
|
| - agieval_jec_qa_kd | 1|none | 0|acc |↑ | 0.5290|± |0.0158| |
|
| | |none | 0|acc_norm |↑ | 0.5140|± |0.0158| |
|
| - agieval_logiqa_en | 1|none | 0|acc |↑ | 0.3579|± |0.0188| |
|
| | |none | 0|acc_norm |↑ | 0.3779|± |0.0190| |
|
| - agieval_logiqa_zh | 1|none | 0|acc |↑ | 0.3103|± |0.0181| |
|
| | |none | 0|acc_norm |↑ | 0.3318|± |0.0185| |
|
| - agieval_lsat_ar | 1|none | 0|acc |↑ | 0.2217|± |0.0275| |
|
| | |none | 0|acc_norm |↑ | 0.2217|± |0.0275| |
|
| - agieval_lsat_lr | 1|none | 0|acc |↑ | 0.5333|± |0.0221| |
|
| | |none | 0|acc_norm |↑ | 0.5098|± |0.0222| |
|
| - agieval_lsat_rc | 1|none | 0|acc |↑ | 0.5948|± |0.0300| |
|
| | |none | 0|acc_norm |↑ | 0.5353|± |0.0305| |
|
| - agieval_math | 1|none | 0|acc |↑ | 0.1520|± |0.0114| |
|
| - agieval_sat_en | 1|none | 0|acc |↑ | 0.7864|± |0.0286| |
|
| | |none | 0|acc_norm |↑ | 0.7621|± |0.0297| |
|
| - agieval_sat_en_without_passage | 1|none | 0|acc |↑ | 0.4660|± |0.0348| |
|
| | |none | 0|acc_norm |↑ | 0.4272|± |0.0345| |
|
| - agieval_sat_math | 1|none | 0|acc |↑ | 0.3591|± |0.0324| |
|
| | |none | 0|acc_norm |↑ | 0.3045|± |0.0311| |
|
|arc_challenge | 1|none | 0|acc |↑ | 0.5247|± |0.0146| |
|
| | |none | 0|acc_norm |↑ | 0.5538|± |0.0145| |
|
|arc_easy | 1|none | 0|acc |↑ | 0.8165|± |0.0079| |
|
| | |none | 0|acc_norm |↑ | 0.7934|± |0.0083| |
|
|boolq | 2|none | 0|acc |↑ | 0.8722|± |0.0058| |
|
|hellaswag | 1|none | 0|acc |↑ | 0.6052|± |0.0049| |
|
| | |none | 0|acc_norm |↑ | 0.7941|± |0.0040| |
|
|ifeval | 2|none | 0|inst_level_loose_acc |↑ | 0.5132|± |N/A | |
|
| | |none | 0|inst_level_strict_acc |↑ | 0.4796|± |N/A | |
|
| | |none | 0|prompt_level_loose_acc |↑ | 0.4122|± |0.0212| |
|
| | |none | 0|prompt_level_strict_acc|↑ | 0.3734|± |0.0208| |
|
|mmlu |N/A |none | 0|acc |↑ | 0.5972|± |0.0039| |
|
| - abstract_algebra | 0|none | 0|acc |↑ | 0.3100|± |0.0465| |
|
| - anatomy | 0|none | 0|acc |↑ | 0.5852|± |0.0426| |
|
| - astronomy | 0|none | 0|acc |↑ | 0.6447|± |0.0389| |
|
| - business_ethics | 0|none | 0|acc |↑ | 0.5800|± |0.0496| |
|
| - clinical_knowledge | 0|none | 0|acc |↑ | 0.6830|± |0.0286| |
|
| - college_biology | 0|none | 0|acc |↑ | 0.7153|± |0.0377| |
|
| - college_chemistry | 0|none | 0|acc |↑ | 0.4500|± |0.0500| |
|
| - college_computer_science | 0|none | 0|acc |↑ | 0.4900|± |0.0502| |
|
| - college_mathematics | 0|none | 0|acc |↑ | 0.3100|± |0.0465| |
|
| - college_medicine | 0|none | 0|acc |↑ | 0.6069|± |0.0372| |
|
| - college_physics | 0|none | 0|acc |↑ | 0.4020|± |0.0488| |
|
| - computer_security | 0|none | 0|acc |↑ | 0.7200|± |0.0451| |
|
| - conceptual_physics | 0|none | 0|acc |↑ | 0.5234|± |0.0327| |
|
| - econometrics | 0|none | 0|acc |↑ | 0.4123|± |0.0463| |
|
| - electrical_engineering | 0|none | 0|acc |↑ | 0.4759|± |0.0416| |
|
| - elementary_mathematics | 0|none | 0|acc |↑ | 0.4180|± |0.0254| |
|
| - formal_logic | 0|none | 0|acc |↑ | 0.4286|± |0.0443| |
|
| - global_facts | 0|none | 0|acc |↑ | 0.3400|± |0.0476| |
|
| - high_school_biology | 0|none | 0|acc |↑ | 0.7419|± |0.0249| |
|
| - high_school_chemistry | 0|none | 0|acc |↑ | 0.4631|± |0.0351| |
|
| - high_school_computer_science | 0|none | 0|acc |↑ | 0.6300|± |0.0485| |
|
| - high_school_european_history | 0|none | 0|acc |↑ | 0.7394|± |0.0343| |
|
| - high_school_geography | 0|none | 0|acc |↑ | 0.7323|± |0.0315| |
|
| - high_school_government_and_politics| 0|none | 0|acc |↑ | 0.8238|± |0.0275| |
|
| - high_school_macroeconomics | 0|none | 0|acc |↑ | 0.6308|± |0.0245| |
|
| - high_school_mathematics | 0|none | 0|acc |↑ | 0.3333|± |0.0287| |
|
| - high_school_microeconomics | 0|none | 0|acc |↑ | 0.6387|± |0.0312| |
|
| - high_school_physics | 0|none | 0|acc |↑ | 0.2914|± |0.0371| |
|
| - high_school_psychology | 0|none | 0|acc |↑ | 0.8128|± |0.0167| |
|
| - high_school_statistics | 0|none | 0|acc |↑ | 0.4907|± |0.0341| |
|
| - high_school_us_history | 0|none | 0|acc |↑ | 0.8186|± |0.0270| |
|
| - high_school_world_history | 0|none | 0|acc |↑ | 0.8186|± |0.0251| |
|
| - human_aging | 0|none | 0|acc |↑ | 0.6771|± |0.0314| |
|
| - human_sexuality | 0|none | 0|acc |↑ | 0.7176|± |0.0395| |
|
| - humanities |N/A |none | 0|acc |↑ | 0.5411|± |0.0066| |
|
| - international_law | 0|none | 0|acc |↑ | 0.7603|± |0.0390| |
|
| - jurisprudence | 0|none | 0|acc |↑ | 0.7593|± |0.0413| |
|
| - logical_fallacies | 0|none | 0|acc |↑ | 0.7239|± |0.0351| |
|
| - machine_learning | 0|none | 0|acc |↑ | 0.5268|± |0.0474| |
|
| - management | 0|none | 0|acc |↑ | 0.7864|± |0.0406| |
|
| - marketing | 0|none | 0|acc |↑ | 0.8547|± |0.0231| |
|
| - medical_genetics | 0|none | 0|acc |↑ | 0.6500|± |0.0479| |
|
| - miscellaneous | 0|none | 0|acc |↑ | 0.7918|± |0.0145| |
|
| - moral_disputes | 0|none | 0|acc |↑ | 0.6705|± |0.0253| |
|
| - moral_scenarios | 0|none | 0|acc |↑ | 0.2268|± |0.0140| |
|
| - nutrition | 0|none | 0|acc |↑ | 0.6961|± |0.0263| |
|
| - other |N/A |none | 0|acc |↑ | 0.6720|± |0.0081| |
|
| - philosophy | 0|none | 0|acc |↑ | 0.6945|± |0.0262| |
|
| - prehistory | 0|none | 0|acc |↑ | 0.6975|± |0.0256| |
|
| - professional_accounting | 0|none | 0|acc |↑ | 0.4539|± |0.0297| |
|
| - professional_law | 0|none | 0|acc |↑ | 0.4537|± |0.0127| |
|
| - professional_medicine | 0|none | 0|acc |↑ | 0.6176|± |0.0295| |
|
| - professional_psychology | 0|none | 0|acc |↑ | 0.6275|± |0.0196| |
|
| - public_relations | 0|none | 0|acc |↑ | 0.6364|± |0.0461| |
|
| - security_studies | 0|none | 0|acc |↑ | 0.7061|± |0.0292| |
|
| - social_sciences |N/A |none | 0|acc |↑ | 0.7043|± |0.0080| |
|
| - sociology | 0|none | 0|acc |↑ | 0.8458|± |0.0255| |
|
| - stem |N/A |none | 0|acc |↑ | 0.5027|± |0.0086| |
|
| - us_foreign_policy | 0|none | 0|acc |↑ | 0.8400|± |0.0368| |
|
| - virology | 0|none | 0|acc |↑ | 0.5060|± |0.0389| |
|
| - world_religions | 0|none | 0|acc |↑ | 0.8421|± |0.0280| |
|
|openbookqa | 1|none | 0|acc |↑ | 0.3360|± |0.0211| |
|
| | |none | 0|acc_norm |↑ | 0.4380|± |0.0222| |
|
|piqa | 1|none | 0|acc |↑ | 0.8112|± |0.0091| |
|
| | |none | 0|acc_norm |↑ | 0.8194|± |0.0090| |
|
|truthfulqa |N/A |none | 0|acc |↑ | 0.4422|± |0.0113| |
|
| | |none | 0|bleu_acc |↑ | 0.5398|± |0.0174| |
|
| | |none | 0|bleu_diff |↑ | 6.0075|± |0.9539| |
|
| | |none | 0|bleu_max |↑ |30.9946|± |0.8538| |
|
| | |none | 0|rouge1_acc |↑ | 0.5545|± |0.0174| |
|
| | |none | 0|rouge1_diff |↑ | 8.7352|± |1.2500| |
|
| | |none | 0|rouge1_max |↑ |57.5941|± |0.8750| |
|
| | |none | 0|rouge2_acc |↑ | 0.4810|± |0.0175| |
|
| | |none | 0|rouge2_diff |↑ | 7.9063|± |1.3837| |
|
| | |none | 0|rouge2_max |↑ |43.4572|± |1.0786| |
|
| | |none | 0|rougeL_acc |↑ | 0.5239|± |0.0175| |
|
| | |none | 0|rougeL_diff |↑ | 8.3871|± |1.2689| |
|
| | |none | 0|rougeL_max |↑ |54.6542|± |0.9060| |
|
| - truthfulqa_gen | 3|none | 0|bleu_acc |↑ | 0.5398|± |0.0174| |
|
| | |none | 0|bleu_diff |↑ | 6.0075|± |0.9539| |
|
| | |none | 0|bleu_max |↑ |30.9946|± |0.8538| |
|
| | |none | 0|rouge1_acc |↑ | 0.5545|± |0.0174| |
|
| | |none | 0|rouge1_diff |↑ | 8.7352|± |1.2500| |
|
| | |none | 0|rouge1_max |↑ |57.5941|± |0.8750| |
|
| | |none | 0|rouge2_acc |↑ | 0.4810|± |0.0175| |
|
| | |none | 0|rouge2_diff |↑ | 7.9063|± |1.3837| |
|
| | |none | 0|rouge2_max |↑ |43.4572|± |1.0786| |
|
| | |none | 0|rougeL_acc |↑ | 0.5239|± |0.0175| |
|
| | |none | 0|rougeL_diff |↑ | 8.3871|± |1.2689| |
|
| | |none | 0|rougeL_max |↑ |54.6542|± |0.9060| |
|
| - truthfulqa_mc1 | 2|none | 0|acc |↑ | 0.3574|± |0.0168| |
|
| - truthfulqa_mc2 | 2|none | 0|acc |↑ | 0.5269|± |0.0152| |
|
|winogrande | 1|none | 0|acc |↑ | 0.7222|± |0.0126| |
|
|
|
| Groups |Version|Filter|n-shot| Metric | | Value | |Stderr| |
|
|------------------|-------|------|-----:|-----------|---|------:|---|-----:| |
|
|agieval |N/A |none | 0|acc |↑ | 0.3846|± |0.0051| |
|
| | |none | 0|acc_norm |↑ | 0.4186|± |0.0056| |
|
|mmlu |N/A |none | 0|acc |↑ | 0.5972|± |0.0039| |
|
| - humanities |N/A |none | 0|acc |↑ | 0.5411|± |0.0066| |
|
| - other |N/A |none | 0|acc |↑ | 0.6720|± |0.0081| |
|
| - social_sciences|N/A |none | 0|acc |↑ | 0.7043|± |0.0080| |
|
| - stem |N/A |none | 0|acc |↑ | 0.5027|± |0.0086| |
|
|truthfulqa |N/A |none | 0|acc |↑ | 0.4422|± |0.0113| |
|
| | |none | 0|bleu_acc |↑ | 0.5398|± |0.0174| |
|
| | |none | 0|bleu_diff |↑ | 6.0075|± |0.9539| |
|
| | |none | 0|bleu_max |↑ |30.9946|± |0.8538| |
|
| | |none | 0|rouge1_acc |↑ | 0.5545|± |0.0174| |
|
| | |none | 0|rouge1_diff|↑ | 8.7352|± |1.2500| |
|
| | |none | 0|rouge1_max |↑ |57.5941|± |0.8750| |
|
| | |none | 0|rouge2_acc |↑ | 0.4810|± |0.0175| |
|
| | |none | 0|rouge2_diff|↑ | 7.9063|± |1.3837| |
|
| | |none | 0|rouge2_max |↑ |43.4572|± |1.0786| |
|
| | |none | 0|rougeL_acc |↑ | 0.5239|± |0.0175| |
|
| | |none | 0|rougeL_diff|↑ | 8.3871|± |1.2689| |
|
| | |none | 0|rougeL_max |↑ |54.6542|± |0.9060| |