*This model was trained as part of a series of experiments comparing the performance of pure DPO vs. SFT vs. ORPO fine-tuning, all run with Unsloth and Hugging Face TRL.*

**Benchmarks**

TBA

**Training Details**

Model: https://huggingface.co/unsloth/mistral-7b-v0.2-bnb-4bit

Dataset: https://huggingface.co/datasets/argilla/dpo-mix-7k 

LoRA rank: 8

LoRA alpha: 16

Learning rate: 5e-6

DPO beta: 0.1

Batch size: 8

Epochs: 1

Learning rate scheduler: Linear
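
To make the role of the beta hyperparameter above concrete, here is a minimal, framework-free sketch of the per-example DPO objective (TRL computes this internally; the function name and scalar inputs are illustrative, not part of the actual training code):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log sigmoid(beta * (chosen margin - rejected margin)).

    Each argument is a summed token log-probability of the chosen/rejected
    response under the policy or the frozen reference model.
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    logits = chosen_reward - rejected_reward
    # Numerically this is log(1 + exp(-logits)), i.e. -log(sigmoid(logits)).
    return -math.log(1.0 / (1.0 + math.exp(-logits)))
```

A lower beta (0.1 here) softens the implicit reward margins, letting the policy drift further from the reference model before the loss saturates.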

Prompt Format: ```You are a helpful assistant.<s>[INST] PROMPT [/INST]RESPONSE</s>```
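
A small sketch of applying the template above at inference time (the helper name is hypothetical, not from the training code):

```python
def format_prompt(prompt, response=""):
    """Render the card's prompt template: system text plus Mistral [INST] tags.

    Leave response empty when building an input for generation; pass the
    completion to reproduce a full training example ending in </s>.
    """
    text = f"You are a helpful assistant.<s>[INST] {prompt} [/INST]"
    if response:
        text += f"{response}</s>"
    return text
```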


**WandB Reports**
![image/png](https://cdn-uploads.huggingface.co/production/uploads/65a5c0e82823ba72ed2cee7d/Tg3dknWsTvfqM96Fab2YJ.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/65a5c0e82823ba72ed2cee7d/8DQ0WiypkVIJeK_Y18Wv0.png)

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)