---
license: mit
base_model: databricks/dolly-v2-7b
tags:
- generated_from_trainer
model-index:
- name: dolly-v2-7b-dpo-full-1-epoch-hydrox-safe
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# dolly-v2-7b-dpo-full-1-epoch-hydrox-safe

This model is a fine-tuned version of [databricks/dolly-v2-7b](https://huggingface.co/databricks/dolly-v2-7b), trained with direct preference optimization (DPO) on an unspecified preference dataset.
It achieves the following results on the evaluation set (the reward metrics are defined after the list):
- Loss: 0.0371
- Rewards/chosen: 4.2799
- Rewards/rejected: -3.8888
- Rewards/accuracies: 0.9857
- Rewards/margins: 8.1686
- Logps/rejected: -598.4040
- Logps/chosen: -377.1240
- Logits/rejected: -1.2002
- Logits/chosen: -1.5171
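
The reward columns above follow the standard DPO logging convention (an assumption based on the metric names; this card does not state the training objective explicitly). Under the usual DPO objective of Rafailov et al. (2023), with inverse temperature $\beta$, the implicit reward of a completion $y$ for prompt $x$ is

$$
r_\theta(x, y) = \beta \left( \log \pi_\theta(y \mid x) - \log \pi_{\text{ref}}(y \mid x) \right),
$$

and the loss on a preference pair $(x, y_w, y_l)$ is

$$
\mathcal{L}_{\text{DPO}} = -\log \sigma\bigl(r_\theta(x, y_w) - r_\theta(x, y_l)\bigr).
$$

Read in these terms, `Rewards/chosen` is $r_\theta(x, y_w)$ averaged over the evaluation set, `Rewards/rejected` is $r_\theta(x, y_l)$, `Rewards/margins` is their difference, and `Rewards/accuracies` is the fraction of pairs with a positive margin.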

## Model description

More information needed

## Intended uses & limitations

More information needed
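
Until the authors complete this section, the sketch below shows one plausible way to run inference. It is a minimal example, not an endorsed recipe: the repository id is assumed from this card's model name (the owning namespace is unknown), and the sampling settings are arbitrary.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id assumed from the model name in this card; prepend the correct namespace.
model_id = "dolly-v2-7b-dpo-full-1-epoch-hydrox-safe"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 7B parameters: roughly 14 GB of weights in bf16
    device_map="auto",
)

prompt = "Explain in two sentences what preference tuning changes about a language model."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128, do_sample=True, top_p=0.9)

# Decode only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```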

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training; a sketch reconstructing them as a `trl` DPO configuration follows the list:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
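
The reward-style metrics logged below are the ones `trl`'s `DPOTrainer` emits, so the run was plausibly launched with a configuration like the following. This is a reconstruction sketch, not the authors' script: only the hyperparameter values come from the list above, `beta` is assumed to be `trl`'s default of 0.1, and the preference dataset is unknown (placeholder files stand in for it).

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "databricks/dolly-v2-7b"
model = AutoModelForCausalLM.from_pretrained(base)
ref_model = AutoModelForCausalLM.from_pretrained(base)  # frozen reference policy
tokenizer = AutoTokenizer.from_pretrained(base)

# Placeholder files: the actual preference data is not documented in this card.
# DPOTrainer expects "prompt", "chosen", and "rejected" columns.
train_dataset = load_dataset("json", data_files="preferences_train.jsonl")["train"]
eval_dataset = load_dataset("json", data_files="preferences_eval.jsonl")["train"]

# Values below are copied from the hyperparameter list in this card.
args = TrainingArguments(
    output_dir="dolly-v2-7b-dpo-full-1-epoch-hydrox-safe",
    learning_rate=5e-7,
    per_device_train_batch_size=8,  # x 8 GPUs = total train batch size 64
    per_device_eval_batch_size=4,   # x 8 GPUs = total eval batch size 32
    num_train_epochs=1,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    seed=42,
)

trainer = DPOTrainer(
    model,
    ref_model,
    args=args,
    beta=0.1,  # assumption: not stated in this card; trl's default
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```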

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.618         | 0.03  | 100  | 0.5642          | 0.6988         | -0.1139          | 0.7424             | 0.8127          | -560.6550      | -412.9344    | -1.1894         | -1.4878       |
| 0.3539        | 0.07  | 200  | 0.3197          | 1.9159         | -0.2730          | 0.8847             | 2.1889          | -562.2463      | -400.7641    | -1.1625         | -1.4800       |
| 0.2287        | 0.1   | 300  | 0.2128          | 2.8057         | -0.5539          | 0.9200             | 3.3596          | -565.0551      | -391.8654    | -1.1361         | -1.4649       |
| 0.158         | 0.14  | 400  | 0.1673          | 3.4556         | -1.0339          | 0.9327             | 4.4895          | -569.8558      | -385.3670    | -1.1300         | -1.4622       |
| 0.1599        | 0.17  | 500  | 0.1397          | 3.7485         | -1.3338          | 0.9461             | 5.0823          | -572.8546      | -382.4376    | -1.1275         | -1.4607       |
| 0.1389        | 0.2   | 600  | 0.1273          | 3.9259         | -1.5111          | 0.9529             | 5.4371          | -574.6277      | -380.6633    | -1.1194         | -1.4519       |
| 0.0778        | 0.24  | 700  | 0.1122          | 4.0699         | -1.8498          | 0.9613             | 5.9197          | -578.0140      | -379.2233    | -1.1302         | -1.4542       |
| 0.0993        | 0.27  | 800  | 0.0975          | 4.2423         | -1.9934          | 0.9663             | 6.2357          | -579.4506      | -377.5001    | -1.1424         | -1.4689       |
| 0.111         | 0.31  | 900  | 0.0907          | 4.3218         | -2.2534          | 0.9697             | 6.5752          | -582.0501      | -376.7048    | -1.1542         | -1.4820       |
| 0.0893        | 0.34  | 1000 | 0.0882          | 4.3878         | -2.2588          | 0.9663             | 6.6466          | -582.1047      | -376.0451    | -1.1497         | -1.4694       |
| 0.079         | 0.37  | 1100 | 0.0840          | 4.4706         | -2.3132          | 0.9689             | 6.7838          | -582.6481      | -375.2164    | -1.1532         | -1.4807       |
| 0.0706        | 0.41  | 1200 | 0.0721          | 4.4319         | -2.6505          | 0.9722             | 7.0824          | -586.0217      | -375.6038    | -1.1667         | -1.4885       |
| 0.0705        | 0.44  | 1300 | 0.0725          | 4.3743         | -2.8717          | 0.9739             | 7.2460          | -588.2330      | -376.1799    | -1.1817         | -1.5001       |
| 0.0537        | 0.48  | 1400 | 0.0648          | 4.3847         | -2.9676          | 0.9756             | 7.3523          | -589.1927      | -376.0760    | -1.1789         | -1.5019       |
| 0.0483        | 0.51  | 1500 | 0.0604          | 4.3761         | -3.2295          | 0.9798             | 7.6056          | -591.8114      | -376.1613    | -1.1923         | -1.5114       |
| 0.0572        | 0.54  | 1600 | 0.0581          | 4.3258         | -3.2641          | 0.9773             | 7.5899          | -592.1575      | -376.6645    | -1.1855         | -1.5042       |
| 0.066         | 0.58  | 1700 | 0.0539          | 4.3270         | -3.3813          | 0.9815             | 7.7083          | -593.3289      | -376.6523    | -1.1886         | -1.5110       |
| 0.0561        | 0.61  | 1800 | 0.0501          | 4.3859         | -3.3980          | 0.9798             | 7.7839          | -593.4964      | -376.0636    | -1.1948         | -1.5144       |
| 0.0538        | 0.65  | 1900 | 0.0504          | 4.4209         | -3.4478          | 0.9815             | 7.8687          | -593.9944      | -375.7137    | -1.2036         | -1.5147       |
| 0.0493        | 0.68  | 2000 | 0.0472          | 4.3835         | -3.5804          | 0.9832             | 7.9639          | -595.3203      | -376.0873    | -1.1925         | -1.5071       |
| 0.0374        | 0.71  | 2100 | 0.0449          | 4.2972         | -3.7998          | 0.9840             | 8.0970          | -597.5147      | -376.9510    | -1.2020         | -1.5166       |
| 0.0475        | 0.75  | 2200 | 0.0442          | 4.3073         | -3.6486          | 0.9840             | 7.9559          | -596.0024      | -376.8494    | -1.1992         | -1.5177       |
| 0.0407        | 0.78  | 2300 | 0.0408          | 4.3011         | -3.7981          | 0.9882             | 8.0992          | -597.4978      | -376.9122    | -1.2078         | -1.5242       |
| 0.0386        | 0.82  | 2400 | 0.0397          | 4.3423         | -3.7314          | 0.9882             | 8.0737          | -596.8302      | -376.4996    | -1.2029         | -1.5133       |
| 0.0504        | 0.85  | 2500 | 0.0390          | 4.3732         | -3.7690          | 0.9857             | 8.1422          | -597.2065      | -376.1912    | -1.2024         | -1.5188       |
| 0.0402        | 0.88  | 2600 | 0.0377          | 4.3358         | -3.8299          | 0.9865             | 8.1656          | -597.8150      | -376.5649    | -1.1977         | -1.5158       |
| 0.038         | 0.92  | 2700 | 0.0397          | 4.3284         | -3.8383          | 0.9891             | 8.1667          | -597.8990      | -376.6386    | -1.2033         | -1.5139       |
| 0.0527        | 0.95  | 2800 | 0.0383          | 4.2985         | -3.8490          | 0.9857             | 8.1475          | -598.0059      | -376.9374    | -1.2037         | -1.5196       |
| 0.0365        | 0.99  | 2900 | 0.0379          | 4.3086         | -3.8349          | 0.9874             | 8.1435          | -597.8653      | -376.8369    | -1.1997         | -1.5156       |


### Framework versions

- Transformers 4.35.0
- Pytorch 2.1.1+cu121
- Datasets 2.14.6
- Tokenizers 0.14.1