---
license: apache-2.0
datasets:
- teknium/OpenHermes-2.5
tags:
- axolotl
- 01-ai/Yi-1.5-9B-Chat
- finetune
- gguf
---

# Hermes-2.5-Yi-1.5-9B-Chat-GGUF

This model is a fine-tuned version of [01-ai/Yi-1.5-9B-Chat](https://huggingface.co/01-ai/Yi-1.5-9B-Chat) on the [teknium/OpenHermes-2.5](https://huggingface.co/datasets/teknium/OpenHermes-2.5) dataset.
I'm very happy with the results. The model now seems a lot smarter and more "aware" in certain situations, and it gained quite an edge on the AGIEval benchmark over other models in its class.
I plan to extend its context length to 32k with PoSE. This is the GGUF repo; you can find the main repo here: [Hermes-2.5-Yi-1.5-9B-Chat](https://huggingface.co/juvi21/Hermes-2.5-Yi-1.5-9B-Chat).

## Model Details

- **Base Model:** 01-ai/Yi-1.5-9B-Chat
- **Chat Template:** ChatML
- **Dataset:** teknium/OpenHermes-2.5
- **Sequence Length:** 8192 tokens
- **Training:**
  - **Epochs:** 1
  - **Hardware:** 4 nodes x 4 NVIDIA A100 40GB GPUs
  - **Duration:** 48:32:13
  - **Cluster:** KIT SCC Cluster

## Benchmarks (n_shots=0)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/659c4ecb413a1376bee2f661/0wv3AMaoete7ysT005n89.png)

| Benchmark       | Score  |
|-----------------|--------|
| ARC (Challenge) | 52.47% |
| ARC (Easy)      | 81.65% |
| BoolQ           | 87.22% |
| HellaSwag       | 60.52% |
| OpenBookQA      | 33.60% |
| PIQA            | 81.12% |
| Winogrande      | 72.22% |
| AGIEval         | 38.46% |
| TruthfulQA      | 44.22% |
| MMLU            | 59.72% |
| IFEval          | 47.96% |

For detailed benchmark results, including sub-categories and various metrics, please refer to the [full benchmark table](#full-benchmark-results) at the end of this README.

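The task and metric names above match EleutherAI's lm-evaluation-harness, so the numbers can presumably be reproduced with it; below is a minimal zero-shot sketch via its Python API (the exact evaluation settings used for this card are an assumption):

```python
# Rough sketch: re-run a subset of the zero-shot benchmarks with
# lm-evaluation-harness (pip install lm-eval). Settings are assumptions,
# not necessarily the configuration used for the tables in this card.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=juvi21/Hermes-2.5-Yi-1.5-9B-Chat,dtype=bfloat16",
    tasks=["arc_challenge", "arc_easy", "boolq", "hellaswag", "winogrande"],
    num_fewshot=0,
    batch_size=8,
)
print(results["results"])  # per-task accuracy and stderr
```
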
## GGUF and Quantizations

- Quantized with llama.cpp [b3166](https://github.com/ggerganov/llama.cpp/releases/tag/b3166)
- [juvi21/Hermes-2.5-Yi-1.5-9B-Chat-GGUF](https://huggingface.co/juvi21/Hermes-2.5-Yi-1.5-9B-Chat-GGUF) is available in:
  - **F16**, **Q8_0**, **Q6_K**, **Q5_K_M**, **Q4_K_M**, **Q3_K_M**, **Q2_K**

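Since this is the GGUF repo, here is a minimal sketch of running one of the quantized files locally with the llama-cpp-python bindings. The filename is an assumption; point `model_path` at whichever quantization you downloaded.

```python
# Sketch: load a GGUF quantization with llama-cpp-python
# (pip install llama-cpp-python).
from llama_cpp import Llama

llm = Llama(
    model_path="Hermes-2.5-Yi-1.5-9B-Chat.Q4_K_M.gguf",  # hypothetical local filename
    n_ctx=8192,  # matches the training sequence length
)

out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the question to 42?"},
    ],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```
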
## Usage

To use the main (non-GGUF) model, you can load it with the Hugging Face Transformers library:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "juvi21/Hermes-2.5-Yi-1.5-9B-Chat",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("juvi21/Hermes-2.5-Yi-1.5-9B-Chat")

# Build a ChatML prompt via the tokenizer's chat template and generate
messages = [{"role": "user", "content": "What is the question to 42?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```

## Prompt Format (ChatML)

```
<|im_start|>system
{system_prompt}<|im_end|>
<|im_start|>user
Knock Knock, who is there?<|im_end|>
<|im_start|>assistant
Hi there! <|im_end|>
```

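For back-ends that take a raw prompt string rather than a message list, a minimal helper (hypothetical, not part of this repo) that assembles the template above:

```python
# Sketch: hand-build the ChatML prompt shown above, e.g. for raw
# completion endpoints. The message contents are placeholders.
def to_chatml(system_prompt: str, user_prompt: str) -> str:
    return (
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>user\n{user_prompt}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

print(to_chatml("You are a helpful assistant.", "Knock Knock, who is there?"))
```
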
## License

This model is released under the Apache 2.0 license.

## Acknowledgements

Special thanks to:
- Teknium for the great OpenHermes-2.5 dataset
- 01-ai for their great base model

## Citation

If you use this model in your research, consider citing it. In any case, definitely cite NousResearch and 01-ai:

```bibtex
@misc{juvi21_hermes25_yi15_9b_chat_2024,
  author = {juvi21},
  title  = {Hermes-2.5-Yi-1.5-9B-Chat},
  year   = {2024},
}
```

## Full Benchmark Results

| Tasks |Version|Filter|n-shot| Metric | | Value | |Stderr|
|---------------------------------------|-------|------|-----:|-----------------------|---|------:|---|------|
|agieval |N/A |none | 0|acc |↑ | 0.5381|± |0.0049|
| | |none | 0|acc_norm |↑ | 0.5715|± |0.0056|
| - agieval_aqua_rat | 1|none | 0|acc |↑ | 0.3858|± |0.0306|
| | |none | 0|acc_norm |↑ | 0.3425|± |0.0298|
| - agieval_gaokao_biology | 1|none | 0|acc |↑ | 0.6048|± |0.0338|
| | |none | 0|acc_norm |↑ | 0.6000|± |0.0339|
| - agieval_gaokao_chemistry | 1|none | 0|acc |↑ | 0.4879|± |0.0348|
| | |none | 0|acc_norm |↑ | 0.4106|± |0.0343|
| - agieval_gaokao_chinese | 1|none | 0|acc |↑ | 0.5935|± |0.0314|
| | |none | 0|acc_norm |↑ | 0.5813|± |0.0315|
| - agieval_gaokao_english | 1|none | 0|acc |↑ | 0.8235|± |0.0218|
| | |none | 0|acc_norm |↑ | 0.8431|± |0.0208|
| - agieval_gaokao_geography | 1|none | 0|acc |↑ | 0.7085|± |0.0323|
| | |none | 0|acc_norm |↑ | 0.6985|± |0.0326|
| - agieval_gaokao_history | 1|none | 0|acc |↑ | 0.7830|± |0.0269|
| | |none | 0|acc_norm |↑ | 0.7660|± |0.0277|
| - agieval_gaokao_mathcloze | 1|none | 0|acc |↑ | 0.0508|± |0.0203|
| - agieval_gaokao_mathqa | 1|none | 0|acc |↑ | 0.3761|± |0.0259|
| | |none | 0|acc_norm |↑ | 0.3590|± |0.0256|
| - agieval_gaokao_physics | 1|none | 0|acc |↑ | 0.4950|± |0.0354|
| | |none | 0|acc_norm |↑ | 0.4700|± |0.0354|
| - agieval_jec_qa_ca | 1|none | 0|acc |↑ | 0.6557|± |0.0150|
| | |none | 0|acc_norm |↑ | 0.5926|± |0.0156|
| - agieval_jec_qa_kd | 1|none | 0|acc |↑ | 0.7310|± |0.0140|
| | |none | 0|acc_norm |↑ | 0.6610|± |0.0150|
| - agieval_logiqa_en | 1|none | 0|acc |↑ | 0.5177|± |0.0196|
| | |none | 0|acc_norm |↑ | 0.4839|± |0.0196|
| - agieval_logiqa_zh | 1|none | 0|acc |↑ | 0.4854|± |0.0196|
| | |none | 0|acc_norm |↑ | 0.4501|± |0.0195|
| - agieval_lsat_ar | 1|none | 0|acc |↑ | 0.2913|± |0.0300|
| | |none | 0|acc_norm |↑ | 0.2696|± |0.0293|
| - agieval_lsat_lr | 1|none | 0|acc |↑ | 0.7196|± |0.0199|
| | |none | 0|acc_norm |↑ | 0.6824|± |0.0206|
| - agieval_lsat_rc | 1|none | 0|acc |↑ | 0.7212|± |0.0274|
| | |none | 0|acc_norm |↑ | 0.6989|± |0.0280|
| - agieval_math | 1|none | 0|acc |↑ | 0.0910|± |0.0091|
| - agieval_sat_en | 1|none | 0|acc |↑ | 0.8204|± |0.0268|
| | |none | 0|acc_norm |↑ | 0.8301|± |0.0262|
| - agieval_sat_en_without_passage | 1|none | 0|acc |↑ | 0.5194|± |0.0349|
| | |none | 0|acc_norm |↑ | 0.4806|± |0.0349|
| - agieval_sat_math | 1|none | 0|acc |↑ | 0.5864|± |0.0333|
| | |none | 0|acc_norm |↑ | 0.5409|± |0.0337|
|arc_challenge | 1|none | 0|acc |↑ | 0.5648|± |0.0145|
| | |none | 0|acc_norm |↑ | 0.5879|± |0.0144|
|arc_easy | 1|none | 0|acc |↑ | 0.8241|± |0.0078|
| | |none | 0|acc_norm |↑ | 0.8165|± |0.0079|
|boolq | 2|none | 0|acc |↑ | 0.8624|± |0.0060|
|hellaswag | 1|none | 0|acc |↑ | 0.5901|± |0.0049|
| | |none | 0|acc_norm |↑ | 0.7767|± |0.0042|
|ifeval | 2|none | 0|inst_level_loose_acc |↑ | 0.5156|± |N/A |
| | |none | 0|inst_level_strict_acc |↑ | 0.4748|± |N/A |
| | |none | 0|prompt_level_loose_acc |↑ | 0.3863|± |0.0210|
| | |none | 0|prompt_level_strict_acc|↑ | 0.3309|± |0.0202|
|mmlu |N/A |none | 0|acc |↑ | 0.6942|± |0.0037|
| - abstract_algebra | 0|none | 0|acc |↑ | 0.4900|± |0.0502|
| - anatomy | 0|none | 0|acc |↑ | 0.6815|± |0.0402|
| - astronomy | 0|none | 0|acc |↑ | 0.7895|± |0.0332|
| - business_ethics | 0|none | 0|acc |↑ | 0.7600|± |0.0429|
| - clinical_knowledge | 0|none | 0|acc |↑ | 0.7132|± |0.0278|
| - college_biology | 0|none | 0|acc |↑ | 0.8056|± |0.0331|
| - college_chemistry | 0|none | 0|acc |↑ | 0.5300|± |0.0502|
| - college_computer_science | 0|none | 0|acc |↑ | 0.6500|± |0.0479|
| - college_mathematics | 0|none | 0|acc |↑ | 0.4100|± |0.0494|
| - college_medicine | 0|none | 0|acc |↑ | 0.6763|± |0.0357|
| - college_physics | 0|none | 0|acc |↑ | 0.5000|± |0.0498|
| - computer_security | 0|none | 0|acc |↑ | 0.8200|± |0.0386|
| - conceptual_physics | 0|none | 0|acc |↑ | 0.7489|± |0.0283|
| - econometrics | 0|none | 0|acc |↑ | 0.5877|± |0.0463|
| - electrical_engineering | 0|none | 0|acc |↑ | 0.6759|± |0.0390|
| - elementary_mathematics | 0|none | 0|acc |↑ | 0.6481|± |0.0246|
| - formal_logic | 0|none | 0|acc |↑ | 0.5873|± |0.0440|
| - global_facts | 0|none | 0|acc |↑ | 0.3900|± |0.0490|
| - high_school_biology | 0|none | 0|acc |↑ | 0.8613|± |0.0197|
| - high_school_chemistry | 0|none | 0|acc |↑ | 0.6453|± |0.0337|
| - high_school_computer_science | 0|none | 0|acc |↑ | 0.8300|± |0.0378|
| - high_school_european_history | 0|none | 0|acc |↑ | 0.8182|± |0.0301|
| - high_school_geography | 0|none | 0|acc |↑ | 0.8485|± |0.0255|
| - high_school_government_and_politics| 0|none | 0|acc |↑ | 0.8964|± |0.0220|
| - high_school_macroeconomics | 0|none | 0|acc |↑ | 0.7923|± |0.0206|
| - high_school_mathematics | 0|none | 0|acc |↑ | 0.4407|± |0.0303|
| - high_school_microeconomics | 0|none | 0|acc |↑ | 0.8655|± |0.0222|
| - high_school_physics | 0|none | 0|acc |↑ | 0.5298|± |0.0408|
| - high_school_psychology | 0|none | 0|acc |↑ | 0.8679|± |0.0145|
| - high_school_statistics | 0|none | 0|acc |↑ | 0.6898|± |0.0315|
| - high_school_us_history | 0|none | 0|acc |↑ | 0.8873|± |0.0222|
| - high_school_world_history | 0|none | 0|acc |↑ | 0.8312|± |0.0244|
| - human_aging | 0|none | 0|acc |↑ | 0.7085|± |0.0305|
| - human_sexuality | 0|none | 0|acc |↑ | 0.7557|± |0.0377|
| - humanities |N/A |none | 0|acc |↑ | 0.6323|± |0.0067|
| - international_law | 0|none | 0|acc |↑ | 0.8099|± |0.0358|
| - jurisprudence | 0|none | 0|acc |↑ | 0.7685|± |0.0408|
| - logical_fallacies | 0|none | 0|acc |↑ | 0.7975|± |0.0316|
| - machine_learning | 0|none | 0|acc |↑ | 0.5179|± |0.0474|
| - management | 0|none | 0|acc |↑ | 0.8835|± |0.0318|
| - marketing | 0|none | 0|acc |↑ | 0.9017|± |0.0195|
| - medical_genetics | 0|none | 0|acc |↑ | 0.8000|± |0.0402|
| - miscellaneous | 0|none | 0|acc |↑ | 0.8225|± |0.0137|
| - moral_disputes | 0|none | 0|acc |↑ | 0.7283|± |0.0239|
| - moral_scenarios | 0|none | 0|acc |↑ | 0.4860|± |0.0167|
| - nutrition | 0|none | 0|acc |↑ | 0.7353|± |0.0253|
| - other |N/A |none | 0|acc |↑ | 0.7287|± |0.0077|
| - philosophy | 0|none | 0|acc |↑ | 0.7170|± |0.0256|
| - prehistory | 0|none | 0|acc |↑ | 0.7346|± |0.0246|
| - professional_accounting | 0|none | 0|acc |↑ | 0.5638|± |0.0296|
| - professional_law | 0|none | 0|acc |↑ | 0.5163|± |0.0128|
| - professional_medicine | 0|none | 0|acc |↑ | 0.6875|± |0.0282|
| - professional_psychology | 0|none | 0|acc |↑ | 0.7092|± |0.0184|
| - public_relations | 0|none | 0|acc |↑ | 0.6727|± |0.0449|
| - security_studies | 0|none | 0|acc |↑ | 0.7347|± |0.0283|
| - social_sciences |N/A |none | 0|acc |↑ | 0.7910|± |0.0072|
| - sociology | 0|none | 0|acc |↑ | 0.8060|± |0.0280|
| - stem |N/A |none | 0|acc |↑ | 0.6581|± |0.0081|
| - us_foreign_policy | 0|none | 0|acc |↑ | 0.8900|± |0.0314|
| - virology | 0|none | 0|acc |↑ | 0.5301|± |0.0389|
| - world_religions | 0|none | 0|acc |↑ | 0.8012|± |0.0306|
|openbookqa | 1|none | 0|acc |↑ | 0.3280|± |0.0210|
| | |none | 0|acc_norm |↑ | 0.4360|± |0.0222|
|piqa | 1|none | 0|acc |↑ | 0.7982|± |0.0094|
| | |none | 0|acc_norm |↑ | 0.8074|± |0.0092|
|truthfulqa |N/A |none | 0|acc |↑ | 0.4746|± |0.0116|
| | |none | 0|bleu_acc |↑ | 0.4700|± |0.0175|
| | |none | 0|bleu_diff |↑ | 0.3214|± |0.6045|
| | |none | 0|bleu_max |↑ |22.5895|± |0.7122|
| | |none | 0|rouge1_acc |↑ | 0.4798|± |0.0175|
| | |none | 0|rouge1_diff |↑ | 0.0846|± |0.7161|
| | |none | 0|rouge1_max |↑ |48.7180|± |0.7833|
| | |none | 0|rouge2_acc |↑ | 0.4149|± |0.0172|
| | |none | 0|rouge2_diff |↑ |-0.4656|± |0.8375|
| | |none | 0|rouge2_max |↑ |34.0585|± |0.8974|
| | |none | 0|rougeL_acc |↑ | 0.4651|± |0.0175|
| | |none | 0|rougeL_diff |↑ |-0.2804|± |0.7217|
| | |none | 0|rougeL_max |↑ |45.2232|± |0.7971|
| - truthfulqa_gen | 3|none | 0|bleu_acc |↑ | 0.4700|± |0.0175|
| | |none | 0|bleu_diff |↑ | 0.3214|± |0.6045|
| | |none | 0|bleu_max |↑ |22.5895|± |0.7122|
| | |none | 0|rouge1_acc |↑ | 0.4798|± |0.0175|
| | |none | 0|rouge1_diff |↑ | 0.0846|± |0.7161|
| | |none | 0|rouge1_max |↑ |48.7180|± |0.7833|
| | |none | 0|rouge2_acc |↑ | 0.4149|± |0.0172|
| | |none | 0|rouge2_diff |↑ |-0.4656|± |0.8375|
| | |none | 0|rouge2_max |↑ |34.0585|± |0.8974|
| | |none | 0|rougeL_acc |↑ | 0.4651|± |0.0175|
| | |none | 0|rougeL_diff |↑ |-0.2804|± |0.7217|
| | |none | 0|rougeL_max |↑ |45.2232|± |0.7971|
| - truthfulqa_mc1 | 2|none | 0|acc |↑ | 0.3905|± |0.0171|
| - truthfulqa_mc2 | 2|none | 0|acc |↑ | 0.5587|± |0.0156|
|winogrande | 1|none | 0|acc |↑ | 0.7388|± |0.0123|

| Groups |Version|Filter|n-shot| Metric | | Value | |Stderr|
|------------------|-------|------|-----:|-----------|---|------:|---|-----:|
|agieval |N/A |none | 0|acc |↑ | 0.5381|± |0.0049|
| | |none | 0|acc_norm |↑ | 0.5715|± |0.0056|
|mmlu |N/A |none | 0|acc |↑ | 0.6942|± |0.0037|
| - humanities |N/A |none | 0|acc |↑ | 0.6323|± |0.0067|
| - other |N/A |none | 0|acc |↑ | 0.7287|± |0.0077|
| - social_sciences|N/A |none | 0|acc |↑ | 0.7910|± |0.0072|
| - stem |N/A |none | 0|acc |↑ | 0.6581|± |0.0081|
|truthfulqa |N/A |none | 0|acc |↑ | 0.4746|± |0.0116|
| | |none | 0|bleu_acc |↑ | 0.4700|± |0.0175|
| | |none | 0|bleu_diff |↑ | 0.3214|± |0.6045|
| | |none | 0|bleu_max |↑ |22.5895|± |0.7122|
| | |none | 0|rouge1_acc |↑ | 0.4798|± |0.0175|
| | |none | 0|rouge1_diff|↑ | 0.0846|± |0.7161|
| | |none | 0|rouge1_max |↑ |48.7180|± |0.7833|
| | |none | 0|rouge2_acc |↑ | 0.4149|± |0.0172|
| | |none | 0|rouge2_diff|↑ |-0.4656|± |0.8375|
| | |none | 0|rouge2_max |↑ |34.0585|± |0.8974|
| | |none | 0|rougeL_acc |↑ | 0.4651|± |0.0175|
| | |none | 0|rougeL_diff|↑ |-0.2804|± |0.7217|
| | |none | 0|rougeL_max |↑ |45.2232|± |0.7971|