---
language:
- en
- de
library_name: transformers
pipeline_tag: text-generation
license: apache-2.0
---

![image/png](https://huggingface.co/datasets/malteos/images/resolve/main/hermeo.medium.png)

_Hermes + Leo = Hermeo_

# Hermeo-7B

A German-English language model merged from [DPOpenHermes-7B-v2](https://huggingface.co/openaccess-ai-collective/DPOpenHermes-7B-v2) and [leo-mistral-hessianai-7b-chat](https://huggingface.co/LeoLM/leo-mistral-hessianai-7b-chat) using [mergekit](https://github.com/cg123/mergekit).
Both base models are fine-tuned versions of [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1).
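
This card does not publish the exact merge configuration. Since both parents share the Mistral-7B-v0.1 architecture and the release credits the Weyaxi SLERP merge as inspiration, a mergekit config for such a merge might look like the sketch below. This is purely illustrative: the merge method, `layer_range`, and all `t` interpolation values are assumptions, not the actual recipe.

```yaml
# Hypothetical mergekit SLERP config (NOT the published Hermeo recipe)
slices:
  - sources:
      - model: openaccess-ai-collective/DPOpenHermes-7B-v2
        layer_range: [0, 32]
      - model: LeoLM/leo-mistral-hessianai-7b-chat
        layer_range: [0, 32]
merge_method: slerp
base_model: openaccess-ai-collective/DPOpenHermes-7B-v2
parameters:
  t:
    # Per-tensor interpolation weights; 0 = first model, 1 = second model.
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5  # default for all remaining tensors
dtype: bfloat16
```

A config like this would be run with mergekit's CLI to produce the merged checkpoint.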
18 |
+
|
19 |
+
|
20 |
+
### Model details
|
21 |
+
|
22 |
+
- **Merged from:** [leo-mistral-hessianai-7b-chat](https://huggingface.co/LeoLM/leo-mistral-hessianai-7b-chat) and [DPOpenHermes-7B-v2](https://huggingface.co/openaccess-ai-collective/DPOpenHermes-7B-v2)
|
23 |
+
- **Model type:** Causal decoder-only transformer language model
|
24 |
+
- **Languages:** English and German
|
25 |
+
- **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0.html)
|
26 |
+
|
27 |
+
### Acknowledgements
|
28 |
+
|
29 |
+
- This model release is heavily inspired by [Weyaxi/OpenHermes-2.5-neural-chat-v3-2-Slerp](https://huggingface.co/Weyaxi/OpenHermes-2.5-neural-chat-v3-2-Slerp)
|
30 |
+
- Thanks to the authors of the base models: [Mistral](https://mistral.ai/), [LAION](https://laion.ai/), [HessianAI](https://hessian.ai/), [Open Access AI Collective](https://huggingface.co/openaccess-ai-collective), [@teknium](https://huggingface.co/teknium), [@bjoernp](https://huggingface.co/bjoernp)
|
31 |
+
- The [German evaluation datasets and scripts](https://github.com/bjoernpl/lm-evaluation-harness-de/tree/mmlu_de) from [@bjoernp](https://huggingface.co/bjoernp) were used.
|
32 |
+
- The computing resources from [DFKI's PEGASUS cluster](https://pegasus.dfki.de/) were used for the evaluation.
|
33 |
+
|
34 |
+
|
35 |
+
## Evaluation
|
36 |
+
|
37 |
+
The evaluation methdology of the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) is followed.
|
38 |
+
|
39 |
+

### German benchmarks

| **Model / Few-shots:**        | **MMLU-DE** _(5 shots)_ | **HellaSwag-DE** _(10 shots)_ | **ARC-DE** _(24 shots)_ |
|-------------------------------|-------------------------|-------------------------------|-------------------------|
| _7B parameters_               |                         |                               |                         |
| llama-2-7b                    | 0.400                   | 0.513                         | 0.381                   |
| leo-hessianai-7b              | 0.400                   | 0.609                         | 0.429                   |
| bloom-6b4-clp-german          | 0.274                   | 0.550                         | 0.351                   |
| mistral-7b                    | **0.524**               | 0.588                         | 0.473                   |
| leo-mistral-hessianai-7b      | 0.481                   | 0.663                         | 0.485                   |
| leo-mistral-hessianai-7b-chat | 0.458                   | 0.617                         | 0.465                   |
| DPOpenHermes-7B-v2            | TBA                     | 0.603                         | 0.515                   |
| hermeo-7b (this model)        | 0.511                   | **0.668**                     | **0.528**               |
| _13B parameters_              |                         |                               |                         |
| llama-2-13b                   | 0.469                   | 0.581                         | 0.468                   |
| leo-hessianai-13b             | **0.486**               | **0.658**                     | **0.509**               |
| _70B parameters_              |                         |                               |                         |
| llama-2-70b                   | 0.597                   | 0.674                         | 0.561                   |
| leo-hessianai-70b             | **0.653**               | **0.721**                     | **0.600**               |

### English benchmarks

TBA

## Prompting / Prompt Template

Prompt dialogue template (ChatML format):

```
<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```

The model input can contain multiple conversation turns between user and assistant, e.g.

```
<|im_start|>user
{prompt 1}<|im_end|>
<|im_start|>assistant
{reply 1}<|im_end|>
<|im_start|>user
{prompt 2}<|im_end|>
<|im_start|>assistant
(...)
```
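
The template above can also be assembled programmatically. A minimal sketch in plain Python (the helper name `build_chatml_prompt` is illustrative, not part of the model's API; with transformers, the tokenizer's built-in chat template may serve the same purpose if one is bundled with the model):

```python
def build_chatml_prompt(system_message, turns):
    """Build a ChatML prompt string from a system message and a list of
    (role, text) conversation turns. The final assistant turn is left
    open so the model continues generating after `<|im_start|>assistant`.
    """
    parts = [f"<|im_start|>system\n{system_message}<|im_end|>"]
    for role, text in turns:
        parts.append(f"<|im_start|>{role}\n{text}<|im_end|>")
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)


prompt = build_chatml_prompt(
    "Du bist ein hilfreicher Assistent.",
    [("user", "Was ist die Hauptstadt von Deutschland?")],
)
print(prompt)
```

The resulting string can be tokenized and passed to the model's `generate` method; generation should stop at the `<|im_end|>` token.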

## License

[Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0.html)