---
license: "apache-2.0"
---
*This model was trained as part of a series of experiments testing the performance of pure DPO vs. SFT vs. ORPO, all supported by Unsloth and Hugging Face TRL.*
**Note:** This model is completely broken. Do not use it.
**Benchmarks**

| Benchmark  | Score |
|------------|------:|
| Average    | 59.52 |
| ARC        | 59.47 |
| HellaSwag  | 82.42 |
| MMLU       | 62.21 |
| TruthfulQA | 40.01 |
| Winogrande | 78.3  |
| GSM8K      | 34.72 |
**Training Details**

- Duration: ~10-12 hours on a single Kaggle T4 with Unsloth
- Model: https://huggingface.co/unsloth/mistral-7b-v0.2-bnb-4bit
- Dataset: https://huggingface.co/datasets/argilla/dpo-mix-7k
- Rank: 8
- Alpha: 16
- Learning rate: 5e-6
- Beta: 0.1
- Batch size: 8
- Epochs: 1
- Learning rate scheduler: Linear
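For reference, here is a minimal sketch of what this setup might look like with Unsloth and TRL. It is a reconstruction from the hyperparameters above, not the actual training script: `max_seq_length`, `target_modules`, and `output_dir` are assumptions, and TRL argument names vary slightly between versions.

```python
# Minimal sketch of the DPO setup described above (not the actual script).
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import DPOConfig, DPOTrainer

# Load the 4-bit base model with Unsloth.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/mistral-7b-v0.2-bnb-4bit",
    max_seq_length=2048,  # assumption: not stated in the card
    load_in_4bit=True,
)

# Attach LoRA adapters with the rank/alpha listed above.
model = FastLanguageModel.get_peft_model(
    model,
    r=8,
    lora_alpha=16,
    target_modules=[  # assumption: typical projection modules for Mistral
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)

dataset = load_dataset("argilla/dpo-mix-7k", split="train")

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(
        beta=0.1,
        learning_rate=5e-6,
        per_device_train_batch_size=8,
        num_train_epochs=1,
        lr_scheduler_type="linear",
        output_dir="outputs",  # assumption
    ),
    train_dataset=dataset,
    tokenizer=tokenizer,  # renamed to processing_class in newer TRL
)
trainer.train()
```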
Prompt Format: ```You are a helpful assistant.<s>[INST] PROMPT [/INST]RESPONSE</s>``` (the `<s>` start token must be added manually; it is not added automatically)
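Sketched below is one way to build a prompt in this format; the `build_prompt` helper is hypothetical, and `add_special_tokens=False` keeps the tokenizer from inserting a second `<s>`.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("unsloth/mistral-7b-v0.2-bnb-4bit")

def build_prompt(user_prompt: str) -> str:
    # <s> is added by hand, since it is not inserted automatically here.
    return f"You are a helpful assistant.<s>[INST] {user_prompt} [/INST]"

text = build_prompt("Explain DPO in one sentence.")
# add_special_tokens=False avoids the tokenizer prepending a second <s>.
inputs = tokenizer(text, add_special_tokens=False, return_tensors="pt")
```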
**WandB Reports**
![image/png](https://cdn-uploads.huggingface.co/production/uploads/65a5c0e82823ba72ed2cee7d/Tg3dknWsTvfqM96Fab2YJ.png)
![image/png](https://cdn-uploads.huggingface.co/production/uploads/65a5c0e82823ba72ed2cee7d/8DQ0WiypkVIJeK_Y18Wv0.png)
[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)