---
language:
- en
- de
library_name: transformers
pipeline_tag: text-generation
license: apache-2.0
---

![image/png](https://huggingface.co/datasets/malteos/images/resolve/main/hermeo.medium.png)

_Hermes + Leo = Hermeo_

# Hermeo-7B

A German-English language model merged from [DPOpenHermes-7B-v2](https://huggingface.co/openaccess-ai-collective/DPOpenHermes-7B-v2) and [leo-mistral-hessianai-7b-chat](https://huggingface.co/LeoLM/leo-mistral-hessianai-7b-chat) using [mergekit](https://github.com/cg123/mergekit).
Both base models are fine-tuned versions of [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1).
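
This card does not publish the exact merge configuration. Since both parents share the Mistral-7B-v0.1 architecture and the release credits the Weyaxi SLERP merge as inspiration, a mergekit config for such a merge might look like the sketch below. This is purely illustrative: the merge method, `layer_range`, and all `t` interpolation values are assumptions, not the actual recipe.

```yaml
# Hypothetical mergekit SLERP config (NOT the published Hermeo recipe)
slices:
  - sources:
      - model: openaccess-ai-collective/DPOpenHermes-7B-v2
        layer_range: [0, 32]
      - model: LeoLM/leo-mistral-hessianai-7b-chat
        layer_range: [0, 32]
merge_method: slerp
base_model: openaccess-ai-collective/DPOpenHermes-7B-v2
parameters:
  t:
    # Per-tensor interpolation weights; 0 = first model, 1 = second model.
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5  # default for all remaining tensors
dtype: bfloat16
```

A config like this would be run with mergekit's CLI to produce the merged checkpoint.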
18 |
+
|
19 |
+
|
20 |
+
### Model details
|
21 |
+
|
22 |
+
- **Merged from:** [leo-mistral-hessianai-7b-chat](https://huggingface.co/LeoLM/leo-mistral-hessianai-7b-chat) and [DPOpenHermes-7B-v2](https://huggingface.co/openaccess-ai-collective/DPOpenHermes-7B-v2)
|
23 |
+
- **Model type:** Causal decoder-only transformer language model
|
24 |
+
- **Languages:** English and German
|
25 |
+
- **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0.html)
|
26 |
+
|
27 |
+
### Acknowledgements
|
28 |
+
|
29 |
+
- This model release is heavily inspired by [Weyaxi/OpenHermes-2.5-neural-chat-v3-2-Slerp](https://huggingface.co/Weyaxi/OpenHermes-2.5-neural-chat-v3-2-Slerp)
|
30 |
+
- Thanks to the authors of the base models: [Mistral](https://mistral.ai/), [LAION](https://laion.ai/), [HessianAI](https://hessian.ai/), [Open Access AI Collective](https://huggingface.co/openaccess-ai-collective), [@teknium](https://huggingface.co/teknium), [@bjoernp](https://huggingface.co/bjoernp)
|
31 |
+
- The [German evaluation datasets and scripts](https://github.com/bjoernpl/lm-evaluation-harness-de/tree/mmlu_de) from [@bjoernp](https://huggingface.co/bjoernp) were used.
|
32 |
+
- The computing resources from [DFKI's PEGASUS cluster](https://pegasus.dfki.de/) were used for the evaluation.
|
33 |
+
|
34 |
+
|
35 |
+
## Evaluation
|
36 |
+
|
37 |
+
The evaluation methdology of the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) is followed.
|
38 |
+
|
39 |
+

### German benchmarks

| **Model / Few-shots:**        | **MMLU-DE** _(5 shots)_ | **HellaSwag-DE** _(10 shots)_ | **ARC-DE** _(24 shots)_ |
|-------------------------------|-------------------------|-------------------------------|-------------------------|
| _7B parameters_               |                         |                               |                         |
| llama-2-7b                    | 0.400                   | 0.513                         | 0.381                   |
| leo-hessianai-7b              | 0.400                   | 0.609                         | 0.429                   |
| bloom-6b4-clp-german          | 0.274                   | 0.550                         | 0.351                   |
| mistral-7b                    | **0.524**               | 0.588                         | 0.473                   |
| leo-mistral-hessianai-7b      | 0.481                   | 0.663                         | 0.485                   |
| leo-mistral-hessianai-7b-chat | 0.458                   | 0.617                         | 0.465                   |
| DPOpenHermes-7B-v2            | TBA                     | 0.603                         | 0.515                   |
| hermeo-7b (this model)        | 0.511                   | **0.668**                     | **0.528**               |
| _13B parameters_              |                         |                               |                         |
| llama-2-13b                   | 0.469                   | 0.581                         | 0.468                   |
| leo-hessianai-13b             | **0.486**               | **0.658**                     | **0.509**               |
| _70B parameters_              |                         |                               |                         |
| llama-2-70b                   | 0.597                   | 0.674                         | 0.561                   |
| leo-hessianai-70b             | **0.653**               | **0.721**                     | **0.600**               |

### English benchmarks

TBA

## Prompting / Prompt Template

Prompt dialogue template (ChatML format):

```
<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```

The model input can contain multiple conversation turns between user and assistant, e.g.

```
<|im_start|>user
{prompt 1}<|im_end|>
<|im_start|>assistant
{reply 1}<|im_end|>
<|im_start|>user
{prompt 2}<|im_end|>
<|im_start|>assistant
(...)
```
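
The template above can also be assembled programmatically. A minimal sketch in plain Python (the helper name `build_chatml_prompt` is illustrative, not part of the model's API; with transformers, the tokenizer's built-in chat template may serve the same purpose if one is bundled with the model):

```python
def build_chatml_prompt(system_message, turns):
    """Build a ChatML prompt string from a system message and a list of
    (role, text) conversation turns. The final assistant turn is left
    open so the model continues generating after `<|im_start|>assistant`.
    """
    parts = [f"<|im_start|>system\n{system_message}<|im_end|>"]
    for role, text in turns:
        parts.append(f"<|im_start|>{role}\n{text}<|im_end|>")
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)


prompt = build_chatml_prompt(
    "Du bist ein hilfreicher Assistent.",
    [("user", "Was ist die Hauptstadt von Deutschland?")],
)
print(prompt)
```

The resulting string can be tokenized and passed to the model's `generate` method; generation should stop at the `<|im_end|>` token.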

## License

[Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0.html)