---
datasets:
- argilla/distilabel-intel-orca-dpo-pairs
library_name: adapter-transformers
---

<img src="https://cdn-uploads.huggingface.co/production/uploads/63ae02ff20176b2d21669dd6/3dKVj-q2MXSc5jfXiVxzC.jpeg" width="500" alt="Description of the image">

# Evangelion-7B

I was just curious to see if something special might happen if one uses:

$$
\text{Evangelion} = \text{high-quality DPO dataset} + \text{merge of a DPO-optimized and a non-DPO-optimized model}
$$

The underlying model I used was `Weyaxi/OpenHermes-2.5-neural-chat-v3-3-Slerp`, a SLERP merge of OpenHermes (not DPO-optimized) and neural-chat (already DPO-optimized).

# Dataset

Dataset: `argilla/distilabel-intel-orca-dpo-pairs`

The dataset is quality over quantity: roughly 3,000 samples, but all of them high quality (according to `chosen_score`).
The following filters were applied to the original dataset:
```python
from datasets import load_dataset

# Load the full preference dataset, then keep only clear, high-quality pairs.
dataset = load_dataset("argilla/distilabel-intel-orca-dpo-pairs", split="train")

dataset = dataset.filter(
    lambda r:
        r["status"] != "tie" and        # drop pairs where chosen and rejected tied
        r["chosen_score"] >= 8 and      # keep only highly rated chosen answers
        not r["in_gsm8k_train"]         # exclude prompts that appear in the GSM8K train split
)
```
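
The training step itself isn't shown in the card, but as a rough illustration, a DPO pass over the filtered `dataset` above with TRL might look like the sketch below. Everything here is an assumption: the hyperparameters are placeholders, an older TRL API (passing `beta` and `tokenizer` directly to `DPOTrainer`) is assumed, and mapping the dataset's `input` column onto the `prompt`/`chosen`/`rejected` schema that `DPOTrainer` expects is a guess at the column names.

```python
# Illustrative DPO sketch only, not the exact recipe used for Evangelion-7B.
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base_id = "Weyaxi/OpenHermes-2.5-neural-chat-v3-3-Slerp"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# Assumption: the question column in this dataset is named "input".
dataset = dataset.map(lambda r: {"prompt": r["input"]})

trainer = DPOTrainer(
    model,
    ref_model=None,      # TRL keeps a frozen copy of the model as the DPO reference
    args=TrainingArguments(
        output_dir="evangelion-7b-dpo",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        learning_rate=5e-6,
        num_train_epochs=1,
    ),
    beta=0.1,            # strength of the KL penalty toward the reference model
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```

`beta` controls how strongly the policy is kept close to the frozen merged reference; 0.1 is simply TRL's default, not a value reported for this model.
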
# Chat Template
I decided to go with the ChatML template, which I also integrated into this model's tokenizer.
```
<|im_start|>system
{system}<|im_end|>
<|im_start|>user
{user}<|im_end|>
<|im_start|>assistant
{assistant}<|im_end|>
```
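
Because the template is stored in the tokenizer, prompts can be built with `apply_chat_template` instead of assembling the ChatML tags by hand. A minimal usage sketch (the model path is a placeholder and the generation settings are arbitrary):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/Evangelion-7B"  # placeholder: substitute the actual repo id or local path
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain DPO in one sentence."},
]

# The tokenizer's ChatML template produces exactly the format shown above.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```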