---
base_model: alignment-handbook/zephyr-7b-sft-full
datasets:
- generation/UF
- generation/UFfull2
library_name: peft
license: apache-2.0
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
model-index:
- name: zephyr-dpop-qlora-uf-ours-uffull-5e-7
  results: []
---

# zephyr-dpop-qlora-uf-ours-uffull-5e-7

This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on the generation/UF and the generation/UFfull2 datasets.
It achieves the following results on the evaluation set:
- Loss: 0.6824
- Positive Losses: 0.1476
- Dpo Losses: 0.6646
- Rewards/chosen: 0.1662
- Rewards/rejected: 0.1035
- Rewards/accuracies: 0.6815
- Rewards/margins: 0.0627
- Rewards/margins Max: 0.2718
- Rewards/margins Min: -0.1172
- Rewards/margins Std: 0.1305
- Logps/rejected: -255.5023
- Logps/chosen: -267.8385
- Logits/rejected: -2.7203
- Logits/chosen: -2.7554
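
For reference, the reward metrics above follow the standard DPO definitions (as implemented in trl's `DPOTrainer`), with policy $\pi_\theta$, frozen reference model $\pi_{\mathrm{ref}}$, and temperature $\beta$:

$$
r(x, y) = \beta\,\bigl(\log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x)\bigr),
\qquad
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\bigl(r(x, y_w) - r(x, y_l)\bigr)
$$

*Rewards/chosen* and *Rewards/rejected* are the means of $r$ over chosen ($y_w$) and rejected ($y_l$) completions; *Rewards/margins* is the mean of their difference, and *Rewards/accuracies* is the fraction of pairs where the chosen reward exceeds the rejected one.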

## Model description

This model is a LoRA adapter for [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full), trained with QLoRA (per the run name) against a DPO-style preference objective. The `dpop` in the run name and the *Positive Losses* metric reported above suggest a DPO-Positive (DPOP) variant, which penalizes the policy when its log-probability of the chosen response falls below that of the reference model. A minimal loading sketch follows.
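
The snippet below is a sketch, not an official usage example: the repo id is a placeholder, and it assumes the tokenizer files were pushed alongside the adapter.

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Placeholder repo id: substitute the actual location of this adapter.
adapter_id = "your-org/zephyr-dpop-qlora-uf-ours-uffull-5e-7"

# Loads the base model recorded in the adapter config
# (alignment-handbook/zephyr-7b-sft-full) and applies the LoRA weights.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
# Assumes tokenizer files live in the adapter repo; otherwise load from the base model.
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

messages = [{"role": "user", "content": "What is direct preference optimization?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```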

## Intended uses & limitations

More information needed

## Training and evaluation data

The adapter was trained on the `generation/UF` and `generation/UFfull2` preference datasets listed in the metadata. These identifiers appear to be local or private, so no further detail is available here.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
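
The training script is not included in this card. As a sketch only, the listed hyperparameters map onto trl's `DPOTrainer` roughly as below (assuming a trl version that provides `DPOConfig`). The LoRA rank/alpha, the DPO `beta`, the quantization settings, and the dataset loading are not reported here and are hypothetical placeholders; the sketch also uses the plain DPO loss, whereas the *Positive Losses* metric suggests the actual run added a DPO-Positive penalty term.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import DPOConfig, DPOTrainer

base_id = "alignment-handbook/zephyr-7b-sft-full"

# 4-bit base model, per the "qlora" run name; the exact quant settings are assumptions.
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Per-device values; the reported totals assume 2 GPUs (see note below).
args = DPOConfig(
    output_dir="zephyr-dpop-qlora-uf-ours-uffull-5e-7",
    learning_rate=5e-7,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1,
    seed=42,
    beta=0.1,  # hypothetical: the DPO beta is not reported in this card
)

# Hypothetical LoRA settings; the actual adapter config is not reported here.
peft_config = LoraConfig(r=16, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM")

# Stand-in for the private generation/UF and generation/UFfull2 datasets.
train_dataset = load_dataset("json", data_files="preference_pairs.jsonl")["train"]

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```

With 4 examples per device, 2 devices, and 2 gradient-accumulation steps, the effective train batch size is 4 × 2 × 2 = 16, matching the reported total.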

### Training results

| Training Loss | Epoch | Step | Validation Loss | Positive Losses | Dpo Losses | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:---------------:|:----------:|:--------------:|:----------------:|:------------------:|:---------------:|:-------------------:|:-------------------:|:-------------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.694         | 0.02  | 100  | 0.6937          | 0.0064          | 0.6931     | 0.0049         | 0.0049           | 0.5075             | 0.0001          | 0.0049              | -0.0046             | 0.0032              | -265.3661      | -283.9625    | -2.7648         | -2.8001       |
| 0.6922        | 0.05  | 200  | 0.6930          | 0.0035          | 0.6926     | 0.0082         | 0.0071           | 0.5875             | 0.0011          | 0.0082              | -0.0056             | 0.0046              | -265.1425      | -283.6357    | -2.7650         | -2.8002       |
| 0.692         | 0.07  | 300  | 0.6921          | 0.0052          | 0.6914     | 0.0190         | 0.0154           | 0.6175             | 0.0035          | 0.0195              | -0.0103             | 0.0099              | -264.3096      | -282.5598    | -2.7662         | -2.8012       |
| 0.6914        | 0.1   | 400  | 0.6907          | 0.0081          | 0.6896     | 0.0324         | 0.0252           | 0.6435             | 0.0072          | 0.0364              | -0.0176             | 0.0181              | -263.3349      | -281.2179    | -2.7620         | -2.7972       |
| 0.6867        | 0.12  | 500  | 0.6887          | 0.0124          | 0.6868     | 0.0581         | 0.0451           | 0.6360             | 0.0130          | 0.0654              | -0.0313             | 0.0323              | -261.3455      | -278.6435    | -2.7580         | -2.7932       |
| 0.6903        | 0.14  | 600  | 0.6869          | 0.0213          | 0.6837     | 0.0696         | 0.0499           | 0.6565             | 0.0197          | 0.0949              | -0.0434             | 0.0461              | -260.8595      | -277.4952    | -2.7576         | -2.7926       |
| 0.6828        | 0.17  | 700  | 0.6855          | 0.0302          | 0.6813     | 0.0840         | 0.0592           | 0.6595             | 0.0248          | 0.1199              | -0.0539             | 0.0580              | -259.9324      | -276.0511    | -2.7490         | -2.7843       |
| 0.6758        | 0.19  | 800  | 0.6855          | 0.0526          | 0.6791     | 0.0969         | 0.0672           | 0.6550             | 0.0297          | 0.1423              | -0.0640             | 0.0688              | -259.1296      | -274.7613    | -2.7450         | -2.7804       |
| 0.6811        | 0.22  | 900  | 0.6854          | 0.0594          | 0.6771     | 0.1064         | 0.0725           | 0.6645             | 0.0339          | 0.1596              | -0.0715             | 0.0771              | -258.6040      | -273.8141    | -2.7378         | -2.7726       |
| 0.6803        | 0.24  | 1000 | 0.6845          | 0.0609          | 0.6762     | 0.1167         | 0.0807           | 0.6645             | 0.0360          | 0.1687              | -0.0763             | 0.0818              | -257.7856      | -272.7885    | -2.7285         | -2.7634       |
| 0.6759        | 0.26  | 1100 | 0.6842          | 0.0676          | 0.6750     | 0.1250         | 0.0862           | 0.6610             | 0.0388          | 0.1815              | -0.0829             | 0.0881              | -257.2345      | -271.9526    | -2.7320         | -2.7672       |
| 0.6732        | 0.29  | 1200 | 0.6896          | 0.1405          | 0.6722     | 0.1179         | 0.0727           | 0.6695             | 0.0452          | 0.2076              | -0.0939             | 0.1005              | -258.5845      | -272.6641    | -2.7315         | -2.7664       |
| 0.6748        | 0.31  | 1300 | 0.6835          | 0.0876          | 0.6734     | 0.1391         | 0.0966           | 0.6665             | 0.0425          | 0.1965              | -0.0897             | 0.0954              | -256.1944      | -270.5492    | -2.7357         | -2.7709       |
| 0.6872        | 0.34  | 1400 | 0.6834          | 0.0973          | 0.6721     | 0.1392         | 0.0939           | 0.6670             | 0.0453          | 0.2070              | -0.0930             | 0.1000              | -256.4647      | -270.5385    | -2.7367         | -2.7719       |
| 0.6926        | 0.36  | 1500 | 0.6833          | 0.1058          | 0.6710     | 0.1402         | 0.0925           | 0.6685             | 0.0477          | 0.2165              | -0.0956             | 0.1042              | -256.6026      | -270.4324    | -2.7329         | -2.7681       |
| 0.6862        | 0.38  | 1600 | 0.6891          | 0.1729          | 0.6689     | 0.1322         | 0.0796           | 0.6750             | 0.0526          | 0.2361              | -0.1039             | 0.1134              | -257.8935      | -271.2309    | -2.7292         | -2.7642       |
| 0.6779        | 0.41  | 1700 | 0.6821          | 0.0962          | 0.6698     | 0.1486         | 0.0979           | 0.6705             | 0.0507          | 0.2293              | -0.1016             | 0.1104              | -256.0604      | -269.5961    | -2.7308         | -2.7658       |
| 0.6726        | 0.43  | 1800 | 0.6842          | 0.1209          | 0.6687     | 0.1467         | 0.0934           | 0.6730             | 0.0533          | 0.2380              | -0.1060             | 0.1149              | -256.5087      | -269.7857    | -2.7266         | -2.7615       |
| 0.6688        | 0.45  | 1900 | 0.6834          | 0.1202          | 0.6681     | 0.1483         | 0.0938           | 0.6745             | 0.0545          | 0.2410              | -0.1065             | 0.1162              | -256.4724      | -269.6281    | -2.7300         | -2.7651       |
| 0.6616        | 0.48  | 2000 | 0.6818          | 0.1092          | 0.6681     | 0.1532         | 0.0987           | 0.6720             | 0.0545          | 0.2409              | -0.1069             | 0.1164              | -255.9825      | -269.1367    | -2.7336         | -2.7687       |
| 0.6707        | 0.5   | 2100 | 0.6804          | 0.0930          | 0.6684     | 0.1588         | 0.1049           | 0.6710             | 0.0538          | 0.2405              | -0.1069             | 0.1162              | -255.3586      | -268.5765    | -2.7300         | -2.7651       |
| 0.6796        | 0.53  | 2200 | 0.6849          | 0.1551          | 0.6666     | 0.1500         | 0.0920           | 0.6755             | 0.0580          | 0.2565              | -0.1121             | 0.1234              | -256.6537      | -269.4551    | -2.7228         | -2.7582       |
| 0.6672        | 0.55  | 2300 | 0.6830          | 0.1404          | 0.6668     | 0.1562         | 0.0986           | 0.6725             | 0.0576          | 0.2557              | -0.1114             | 0.1231              | -255.9975      | -268.8366    | -2.7203         | -2.7554       |
| 0.6769        | 0.57  | 2400 | 0.6819          | 0.1252          | 0.6668     | 0.1596         | 0.1019           | 0.6740             | 0.0577          | 0.2565              | -0.1128             | 0.1238              | -255.6599      | -268.4941    | -2.7159         | -2.7508       |
| 0.6725        | 0.6   | 2500 | 0.6903          | 0.2239          | 0.6645     | 0.1488         | 0.0859           | 0.6850             | 0.0630          | 0.2751              | -0.1201             | 0.1325              | -257.2663      | -269.5727    | -2.7161         | -2.7509       |
| 0.6762        | 0.62  | 2600 | 0.6834          | 0.1472          | 0.6655     | 0.1615         | 0.1008           | 0.6760             | 0.0606          | 0.2671              | -0.1166             | 0.1287              | -255.7709      | -268.3081    | -2.7154         | -2.7503       |
| 0.6867        | 0.65  | 2700 | 0.6846          | 0.1619          | 0.6649     | 0.1605         | 0.0985           | 0.6820             | 0.0620          | 0.2708              | -0.1178             | 0.1304              | -256.0078      | -268.4086    | -2.7205         | -2.7554       |
| 0.702         | 0.67  | 2800 | 0.6836          | 0.1510          | 0.6651     | 0.1623         | 0.1007           | 0.6815             | 0.0616          | 0.2697              | -0.1175             | 0.1299              | -255.7832      | -268.2218    | -2.7157         | -2.7510       |
| 0.6822        | 0.69  | 2900 | 0.6818          | 0.1312          | 0.6653     | 0.1655         | 0.1045           | 0.6800             | 0.0610          | 0.2669              | -0.1156             | 0.1282              | -255.4075      | -267.9095    | -2.7201         | -2.7554       |
| 0.6751        | 0.72  | 3000 | 0.6809          | 0.1235          | 0.6656     | 0.1674         | 0.1070           | 0.6745             | 0.0604          | 0.2651              | -0.1144             | 0.1272              | -255.1547      | -267.7156    | -2.7193         | -2.7547       |
| 0.673         | 0.74  | 3100 | 0.6830          | 0.1523          | 0.6648     | 0.1643         | 0.1022           | 0.6815             | 0.0621          | 0.2709              | -0.1168             | 0.1301              | -255.6314      | -268.0210    | -2.7211         | -2.7563       |
| 0.6666        | 0.77  | 3200 | 0.6818          | 0.1381          | 0.6653     | 0.1672         | 0.1062           | 0.6785             | 0.0611          | 0.2675              | -0.1157             | 0.1284              | -255.2344      | -267.7304    | -2.7202         | -2.7554       |
| 0.6619        | 0.79  | 3300 | 0.6829          | 0.1523          | 0.6647     | 0.1652         | 0.1028           | 0.6810             | 0.0624          | 0.2717              | -0.1172             | 0.1304              | -255.5768      | -267.9396    | -2.7207         | -2.7559       |
| 0.6752        | 0.81  | 3400 | 0.6830          | 0.1530          | 0.6647     | 0.1653         | 0.1029           | 0.6805             | 0.0625          | 0.2718              | -0.1177             | 0.1306              | -255.5670      | -267.9222    | -2.7197         | -2.7548       |
| 0.6711        | 0.84  | 3500 | 0.6841          | 0.1663          | 0.6643     | 0.1634         | 0.1000           | 0.6795             | 0.0633          | 0.2740              | -0.1183             | 0.1317              | -255.8493      | -268.1196    | -2.7188         | -2.7540       |
| 0.669         | 0.86  | 3600 | 0.6843          | 0.1689          | 0.6642     | 0.1628         | 0.0992           | 0.6815             | 0.0637          | 0.2755              | -0.1190             | 0.1323              | -255.9366      | -268.1706    | -2.7180         | -2.7533       |
| 0.6563        | 0.89  | 3700 | 0.6835          | 0.1602          | 0.6643     | 0.1642         | 0.1009           | 0.6815             | 0.0633          | 0.2740              | -0.1182             | 0.1316              | -255.7627      | -268.0358    | -2.7189         | -2.7540       |
| 0.6811        | 0.91  | 3800 | 0.6828          | 0.1517          | 0.6646     | 0.1658         | 0.1032           | 0.6820             | 0.0627          | 0.2721              | -0.1176             | 0.1307              | -255.5359      | -267.8722    | -2.7190         | -2.7541       |
| 0.664         | 0.93  | 3900 | 0.6823          | 0.1453          | 0.6647     | 0.1664         | 0.1039           | 0.6780             | 0.0625          | 0.2717              | -0.1171             | 0.1305              | -255.4641      | -267.8119    | -2.7221         | -2.7571       |
| 0.6771        | 0.96  | 4000 | 0.6824          | 0.1453          | 0.6647     | 0.1662         | 0.1037           | 0.6775             | 0.0625          | 0.2716              | -0.1174             | 0.1304              | -255.4852      | -267.8388    | -2.7216         | -2.7566       |
| 0.6644        | 0.98  | 4100 | 0.6825          | 0.1480          | 0.6646     | 0.1662         | 0.1036           | 0.6810             | 0.0626          | 0.2720              | -0.1174             | 0.1305              | -255.4913      | -267.8348    | -2.7189         | -2.7542       |


### Framework versions

- PEFT 0.7.1
- Transformers 4.39.0.dev0
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.15.2