---
license: apache-2.0
library_name: peft
tags:
- trl
- dpo
- generated_from_trainer
base_model: mistralai/Mistral-7B-v0.1
model-index:
- name: zephyr-7b-dpo-qlora
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# zephyr-7b-dpo-qlora

This model is a QLoRA fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1), trained with DPO on an unspecified preference dataset.
It achieves the following results on the evaluation set:
- Loss: 0.4873
- Rewards/chosen: -2.9667
- Rewards/rejected: -4.1000
- Rewards/accuracies: 0.7445
- Rewards/margins: 1.1333
- Logps/rejected: -654.6072
- Logps/chosen: -561.3217
- Logits/rejected: -0.9450
- Logits/chosen: -1.0724
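
This repository contains a PEFT (QLoRA) adapter rather than full model weights, so it must be loaded on top of the base model. Below is a minimal, untested loading sketch; the repository id is a placeholder and should be replaced with the actual Hub id or local path of this adapter.

```python
# Minimal loading sketch -- `adapter_id` is a placeholder, not a documented Hub id.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "zephyr-7b-dpo-qlora"  # replace with the actual Hub id or local path

# AutoPeftModelForCausalLM loads the base model referenced by the adapter config
# (mistralai/Mistral-7B-v0.1) and attaches the LoRA weights on top of it.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

inputs = tokenizer("What is direct preference optimization?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```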

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
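
As a rough illustration only, the sketch below shows how these hyperparameters might map onto a `trl` `DPOTrainer` with a QLoRA setup (circa trl 0.7.x / peft 0.7.1). The preference dataset, LoRA rank/alpha, DPO `beta`, and sequence lengths are assumptions and are not documented in this card.

```python
# Approximate reconstruction of the training setup -- dataset id, LoRA config,
# beta, and max lengths below are assumptions, not values recorded in this card.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from trl import DPOTrainer

base = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # Mistral has no pad token by default

# 4-bit (QLoRA) quantization of the frozen base model
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(base, quantization_config=bnb, device_map="auto")

peft_config = LoraConfig(r=16, lora_alpha=16, lora_dropout=0.05,  # assumed values
                         task_type="CAUSAL_LM")

# The dataset must provide "prompt", "chosen", and "rejected" columns;
# the id below is a placeholder since the card does not name the dataset.
train_ds = load_dataset("path/to/preference-dataset", split="train")

args = TrainingArguments(
    output_dir="zephyr-7b-dpo-qlora",
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,
)

trainer = DPOTrainer(
    model,
    ref_model=None,        # with a PEFT adapter, the frozen base model serves as the reference
    args=args,
    beta=0.1,              # assumed; not recorded in this card
    train_dataset=train_ds,
    tokenizer=tokenizer,
    peft_config=peft_config,
    max_length=1024,       # assumed
    max_prompt_length=512, # assumed
)
trainer.train()
```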

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6819        | 0.03  | 100  | 0.6822          | 0.0500         | 0.0271           | 0.6545             | 0.0230          | -241.9029      | -259.6472    | -1.9565         | -2.0959       |
| 0.6548        | 0.05  | 200  | 0.6500        | -0.1489         | -0.2515      | 0.6780         | 0.1027          | -269.7628          | -279.5373      | -1.9329         | -2.0695          |
| 0.6084        | 0.08  | 300  | 0.6213        | -0.2956         | -0.4998      | 0.6810         | 0.2042          | -294.5921          | -294.2169      | -1.8771         | -2.0114          |
| 0.6237        | 0.1   | 400  | 0.6039        | -0.4538         | -0.7401      | 0.6935         | 0.2863          | -318.6170          | -310.0349      | -1.8367         | -1.9656          |
| 0.5534        | 0.13  | 500  | 0.5692        | -0.9154         | -1.3927      | 0.7050         | 0.4773          | -383.8828          | -356.1946      | -1.5403         | -1.6712          |
| 0.5613        | 0.16  | 600  | 0.5659        | -0.8123         | -1.3218      | 0.7025         | 0.5095          | -376.7896          | -345.8830      | -1.3701         | -1.5049          |
| 0.5139        | 0.18  | 700  | 0.5572        | -2.6368         | -3.4670      | 0.7145         | 0.8302          | -591.3087          | -528.3278      | -0.8924         | -1.0174          |
| 0.5184        | 0.21  | 800  | 0.5374        | -1.4908         | -2.1870      | 0.7160         | 0.6962          | -463.3091          | -413.7339      | -1.1141         | -1.2460          |
| 0.5211        | 0.24  | 900  | 0.5332        | -2.5430         | -3.3947      | 0.7180         | 0.8518          | -584.0806          | -518.9495      | -0.8116         | -0.9341          |
| 0.5553        | 0.26  | 1000 | 0.5178        | -2.1745         | -3.0424      | 0.7315         | 0.8679          | -548.8491          | -482.0993      | -0.8557         | -0.9813          |
| 0.5994        | 0.29  | 1100 | 0.5207        | -2.5002         | -3.3276      | 0.7300         | 0.8275          | -577.3698          | -514.6677      | -0.7615         | -0.8896          |
| 0.5976        | 0.31  | 1200 | 0.5098        | -2.1833         | -2.9905      | 0.7365         | 0.8072          | -543.6604          | -482.9834      | -0.8350         | -0.9596          |
| 0.5237        | 0.34  | 1300 | 0.5166        | -3.0973         | -4.1628      | 0.7350         | 1.0654          | -660.8850          | -574.3862      | -0.7072         | -0.8259          |
| 0.516         | 0.37  | 1400 | 0.5108        | -2.1009         | -3.0663      | 0.7350         | 0.9654          | -551.2367          | -474.7425      | -0.7865         | -0.9128          |
| 0.4593        | 0.39  | 1500 | 0.5174        | -2.3167         | -3.4254      | 0.7305         | 1.1088          | -587.1506          | -496.3185      | -0.8903         | -1.0211          |
| 0.5545        | 0.42  | 1600 | 0.5032        | -2.9938         | -4.0820      | 0.7370         | 1.0882          | -652.8123          | -564.0355      | -0.8801         | -1.0082          |
| 0.5425        | 0.44  | 1700 | 0.4996        | -3.3496         | -4.4061      | 0.7405         | 1.0565          | -685.2187          | -599.6096      | -0.8382         | -0.9686          |
| 0.4825        | 0.47  | 1800 | 0.5037        | -3.0446         | -4.1288      | 0.7380         | 1.0842          | -657.4884          | -569.1091      | -0.8738         | -1.0006          |
| 0.4455        | 0.5   | 1900 | 0.4962        | -3.0223         | -4.1482      | 0.7420         | 1.1259          | -659.4305          | -566.8840      | -0.8910         | -1.0214          |
| 0.4817        | 0.52  | 2000 | 0.4974        | -3.5987         | -4.6648      | 0.7470         | 1.0660          | -711.0853          | -624.5250      | -0.8139         | -0.9428          |
| 0.5079        | 0.55  | 2100 | 0.4923        | -3.1751         | -4.2293      | 0.7520         | 1.0542          | -667.5426          | -582.1657      | -0.8739         | -1.0031          |
| 0.477         | 0.58  | 2200 | 0.4897        | -2.6127         | -3.5713      | 0.7410         | 0.9587          | -601.7402          | -525.9182      | -0.9567         | -1.0880          |
| 0.4829        | 0.6   | 2300 | 0.4887        | -2.9530         | -4.0954      | 0.7485         | 1.1424          | -654.1511          | -559.9558      | -0.9032         | -1.0313          |
| 0.4752        | 0.63  | 2400 | 0.4909        | -3.1480         | -4.2815      | 0.7445         | 1.1335          | -672.7583          | -579.4506      | -0.8495         | -0.9765          |
| 0.5249        | 0.65  | 2500 | 0.4891        | -3.0936         | -4.2029      | 0.7445         | 1.1093          | -664.8962          | -574.0093      | -0.9136         | -1.0435          |
| 0.4596        | 0.68  | 2600 | 0.4939        | -2.9492         | -4.0985      | 0.7400         | 1.1493          | -654.4570          | -559.5698      | -0.9264         | -1.0549          |
| 0.5152        | 0.71  | 2700 | 0.4922        | -3.0197         | -4.1572      | 0.7440         | 1.1375          | -660.3236          | -566.6193      | -0.9249         | -1.0527          |
| 0.4518        | 0.73  | 2800 | 0.4908        | -3.0666         | -4.2342      | 0.7415         | 1.1676          | -668.0294          | -571.3138      | -0.9260         | -1.0535          |
| 0.5018        | 0.76  | 2900 | 0.4877        | -3.0977         | -4.2382      | 0.7465         | 1.1405          | -668.4285          | -574.4260      | -0.9320         | -1.0595          |
| 0.4592        | 0.79  | 3000 | 0.4873        | -2.9934         | -4.1134      | 0.7460         | 1.1200          | -655.9471          | -563.9877      | -0.9510         | -1.0788          |
| 0.4905        | 0.81  | 3100 | 0.4878        | -2.9825         | -4.1198      | 0.7430         | 1.1373          | -656.5853          | -562.9043      | -0.9465         | -1.0741          |
| 0.485         | 0.84  | 3200 | 0.4874        | -2.9459         | -4.0754      | 0.7455         | 1.1296          | -652.1517          | -559.2400      | -0.9531         | -1.0807          |
| 0.5157        | 0.86  | 3300 | 0.4874        | -2.9550         | -4.0838      | 0.7445         | 1.1289          | -652.9912          | -560.1489      | -0.9481         | -1.0755          |
| 0.4474        | 0.89  | 3400 | 0.4871        | -2.9699         | -4.1019      | 0.7435         | 1.1321          | -654.8017          | -561.6381      | -0.9499         | -1.0773          |
| 0.5379        | 0.92  | 3500 | 0.4874        | -2.9663         | -4.0989      | 0.7430         | 1.1326          | -654.5006          | -561.2808      | -0.9468         | -1.0742          |
| 0.464         | 0.94  | 3600 | 0.4874        | -2.9638         | -4.0967      | 0.7425         | 1.1329          | -654.2791          | -561.0286      | -0.9475         | -1.0748          |
| 0.4729        | 0.97  | 3700 | 0.4873        | -2.9666         | -4.0999      | 0.7445         | 1.1333          | -654.6014          | -561.3129      | -0.9495         | -1.0770          |
| 0.5017        | 0.99  | 3800 | 0.4873        | -2.9667         | -4.1000      | 0.7445         | 1.1333          | -654.6072          | -561.3217      | -0.9450         | -1.0724          |


### Framework versions

- PEFT 0.7.1
- Transformers 4.39.0.dev0
- Pytorch 2.2.2+cu121
- Datasets 2.14.6
- Tokenizers 0.15.2