---
license: apache-2.0
base_model: glimmerz/zephyr-7b-sft-full
tags:
- generated_from_trainer
model-index:
- name: zephyr-7b-dpo-full
  results: []
datasets:
- HuggingFaceH4/ultrafeedback_binarized
---

# zephyr-7b-dpo-full

This model is a fine-tuned version of [glimmerz/zephyr-7b-sft-full](https://huggingface.co/glimmerz/zephyr-7b-sft-full) on the [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) dataset.
It achieves the following results on the evaluation set (the reward columns are defined right after this list):
- Loss: 0.7385
- Rewards/chosen: -4.7566
- Rewards/rejected: -8.6166
- Rewards/accuracies: 0.7560
- Rewards/margins: 3.8601
- Logps/rejected: -315.8341
- Logps/chosen: -321.4129
- Logits/rejected: -2.2590
- Logits/chosen: -2.3620
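
For context, the reward columns follow the implicit-reward convention of direct preference optimization. Assuming the standard DPO formulation (the card itself does not spell this out), with $\beta$ the DPO temperature, $\pi_\theta$ the policy, and $\pi_{\mathrm{ref}}$ the frozen SFT reference:

```latex
% Implicit DPO reward of a completion y for prompt x (standard DPO; an assumption here)
r_\theta(x, y) = \beta \left( \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right)

% Rewards/chosen and Rewards/rejected average r_\theta over chosen (y_w) and rejected (y_l) completions;
% Rewards/margins is their gap, and Rewards/accuracies is the fraction of pairs with a positive gap.
\mathrm{margin}(x) = r_\theta(x, y_w) - r_\theta(x, y_l)

% Training pushes this margin up through a logistic loss
\mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(x,\, y_w,\, y_l)}\!\left[ \log \sigma\!\big( r_\theta(x, y_w) - r_\theta(x, y_l) \big) \right]
```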

## Model description

zephyr-7b-dpo-full is a preference-aligned chat model built on [glimmerz/zephyr-7b-sft-full](https://huggingface.co/glimmerz/zephyr-7b-sft-full). The model name and the reward/margin metrics reported above indicate DPO-style training on binarized UltraFeedback preference pairs; beyond that, no further description has been documented.

## Intended uses & limitations

More information needed
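
Pending a fuller description, the sketch below shows one way the model could be queried with Transformers 4.35 (the version listed under framework versions). The repository id `glimmerz/zephyr-7b-dpo-full` and the presence of a Zephyr-style chat template are assumptions inferred from the model name and base model, not confirmed by this card.

```python
# Hypothetical usage sketch: the repo id and chat template are assumptions, not confirmed by this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "glimmerz/zephyr-7b-dpo-full"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain DPO in two sentences."},
]
# apply_chat_template is available in Transformers 4.35.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.95)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```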

## Training and evaluation data

Per the card metadata, training used the [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) preference dataset; details of preprocessing, splits, and evaluation-set construction are not documented.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a hedged configuration sketch follows the list):
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
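
A minimal sketch of how these settings could map onto a TRL `DPOTrainer` run is shown below. Only the values listed above come from this card; the use of TRL itself, the `beta` temperature, and the dataset split names are assumptions. In practice the message-formatted `chosen`/`rejected` columns of UltraFeedback also need to be rendered to plain text (e.g. with the chat template) before being handed to the trainer.

```python
# Hedged reconstruction of the training setup; TRL usage, beta, and split names are assumptions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_id = "glimmerz/zephyr-7b-sft-full"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Preference pairs; chosen/rejected must be flattened to text before training.
dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized")

# Values below are taken directly from the hyperparameter list in this card;
# per-device batch sizes assume the 4-GPU, 2-step gradient-accumulation layout (8 x 2 x 4 = 64).
training_args = TrainingArguments(
    output_dir="zephyr-7b-dpo-full",
    learning_rate=5e-7,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,
    num_train_epochs=3,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    seed=42,
)

trainer = DPOTrainer(
    model,
    ref_model=None,      # DPOTrainer clones the policy as the frozen reference when None
    args=training_args,
    beta=0.1,            # assumed; the card does not state the DPO temperature
    train_dataset=dataset["train_prefs"],
    eval_dataset=dataset["test_prefs"],
    tokenizer=tokenizer,
)
trainer.train()
```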

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.575         | 0.1   | 100  | 0.5309          | -0.0101        | -0.6034          | 0.7460             | 0.5933          | -235.7018      | -273.9487    | -2.6525         | -2.7458       |
| 0.4759        | 0.21  | 200  | 0.4943          | -0.0642        | -1.0829          | 0.75               | 1.0187          | -240.4966      | -274.4892    | -2.7066         | -2.8006       |
| 0.5022        | 0.31  | 300  | 0.4824          | -0.1526        | -1.2517          | 0.7620             | 1.0991          | -242.1845      | -275.3735    | -2.7362         | -2.8225       |
| 0.5282        | 0.41  | 400  | 0.4878          | -0.6794        | -1.9420          | 0.7840             | 1.2626          | -249.0876      | -280.6413    | -2.7023         | -2.7924       |
| 0.5179        | 0.52  | 500  | 0.4805          | -0.2645        | -1.4485          | 0.7760             | 1.1841          | -244.1532      | -276.4918    | -2.6773         | -2.7631       |
| 0.4705        | 0.62  | 600  | 0.4715          | -0.3016        | -1.5766          | 0.7560             | 1.2750          | -245.4337      | -276.8629    | -2.7009         | -2.7838       |
| 0.5038        | 0.72  | 700  | 0.4790          | -0.3119        | -1.5731          | 0.7680             | 1.2612          | -245.3986      | -276.9666    | -2.5409         | -2.6269       |
| 0.4418        | 0.83  | 800  | 0.4665          | -0.4564        | -2.0177          | 0.7800             | 1.5612          | -249.8442      | -278.4113    | -2.4834         | -2.5636       |
| 0.5155        | 0.93  | 900  | 0.4770          | -0.3715        | -1.7079          | 0.7740             | 1.3364          | -246.7468      | -277.5622    | -2.5118         | -2.5927       |
| 0.3463        | 1.03  | 1000 | 0.4755          | -0.5305        | -1.8263          | 0.7680             | 1.2958          | -247.9306      | -279.1520    | -2.6282         | -2.7083       |
| 0.1266        | 1.14  | 1100 | 0.4924          | -1.0131        | -2.8651          | 0.7740             | 1.8519          | -258.3182      | -283.9783    | -2.5584         | -2.6430       |
| 0.0751        | 1.24  | 1200 | 0.5208          | -1.4508        | -3.6646          | 0.7760             | 2.2138          | -266.3139      | -288.3549    | -2.5574         | -2.6450       |
| 0.0306        | 1.34  | 1300 | 0.5779          | -2.1463        | -4.7450          | 0.7580             | 2.5987          | -277.1172      | -295.3102    | -2.4957         | -2.5865       |
| 0.031         | 1.45  | 1400 | 0.5993          | -2.6730        | -5.3111          | 0.7580             | 2.6381          | -282.7792      | -300.5774    | -2.5157         | -2.6051       |
| 0.0535        | 1.55  | 1500 | 0.5731          | -2.1627        | -4.7943          | 0.75               | 2.6316          | -277.6110      | -295.4747    | -2.5616         | -2.6529       |
| 0.063         | 1.65  | 1600 | 0.5433          | -1.9823        | -4.5765          | 0.7580             | 2.5942          | -275.4325      | -293.6702    | -2.5038         | -2.5985       |
| 0.0423        | 1.76  | 1700 | 0.5821          | -2.6553        | -5.4183          | 0.7540             | 2.7630          | -283.8502      | -300.3999    | -2.4636         | -2.5654       |
| 0.0559        | 1.86  | 1800 | 0.5657          | -2.5801        | -5.2643          | 0.7520             | 2.6842          | -282.3106      | -299.6483    | -2.4843         | -2.5741       |
| 0.0468        | 1.96  | 1900 | 0.5759          | -2.4597        | -5.2907          | 0.7480             | 2.8309          | -282.5742      | -298.4443    | -2.4491         | -2.5392       |
| 0.0576        | 2.07  | 2000 | 0.5614          | -2.5997        | -5.3232          | 0.7620             | 2.7235          | -282.8997      | -299.8446    | -2.4132         | -2.5016       |
| 0.0135        | 2.17  | 2100 | 0.6182          | -3.1988        | -6.3849          | 0.7640             | 3.1861          | -293.5166      | -305.8354    | -2.4052         | -2.5040       |
| 0.0149        | 2.27  | 2200 | 0.7075          | -4.5960        | -8.1955          | 0.7420             | 3.5995          | -311.6229      | -319.8072    | -2.3535         | -2.4494       |
| 0.0095        | 2.37  | 2300 | 0.7117          | -4.2102        | -7.7788          | 0.7540             | 3.5686          | -307.4559      | -315.9493    | -2.2943         | -2.3972       |
| 0.0104        | 2.48  | 2400 | 0.7131          | -4.3371        | -7.9252          | 0.7540             | 3.5881          | -308.9199      | -317.2180    | -2.3097         | -2.4097       |
| 0.008         | 2.58  | 2500 | 0.7328          | -4.4361        | -8.1696          | 0.7520             | 3.7335          | -311.3636      | -318.2084    | -2.2756         | -2.3764       |
| 0.0051        | 2.68  | 2600 | 0.7193          | -4.2884        | -7.9892          | 0.7600             | 3.7009          | -309.5601      | -316.7311    | -2.3138         | -2.4185       |
| 0.0089        | 2.79  | 2700 | 0.7388          | -4.8991        | -8.6552          | 0.7660             | 3.7561          | -316.2196      | -322.8380    | -2.2942         | -2.3960       |
| 0.0082        | 2.89  | 2800 | 0.7342          | -4.7984        | -8.6596          | 0.7640             | 3.8612          | -316.2638      | -321.8309    | -2.2620         | -2.3649       |
| 0.0094        | 2.99  | 2900 | 0.7374          | -4.7573        | -8.6168          | 0.7580             | 3.8595          | -315.8361      | -321.4205    | -2.2595         | -2.3625       |


### Framework versions

- Transformers 4.35.2
- Pytorch 2.1.0
- Datasets 2.15.0
- Tokenizers 0.15.0