---
base_model: rasyosef/phi-2-sft-openhermes-128k-v2-merged
library_name: peft
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: phi-2-openhermes-128k-v2-dpo-combined
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# phi-2-openhermes-128k-v2-dpo-combined

This model is a DPO fine-tuned version of [rasyosef/phi-2-sft-openhermes-128k-v2-merged](https://huggingface.co/rasyosef/phi-2-sft-openhermes-128k-v2-merged); the preference dataset used is not recorded in this card.
It achieves the following results on the evaluation set:
- Loss: 0.5599
- Rewards/chosen: -0.3234
- Rewards/rejected: -0.9542
- Rewards/accuracies: 0.6812
- Rewards/margins: 0.6309
- Logps/rejected: -158.4123
- Logps/chosen: -144.1796
- Logits/rejected: -1.6783
- Logits/chosen: -1.6735

## Model description

Based on the card metadata (`library_name: peft`, tags `trl` and `dpo`), this checkpoint is a PEFT adapter trained with TRL's DPO trainer on top of [rasyosef/phi-2-sft-openhermes-128k-v2-merged](https://huggingface.co/rasyosef/phi-2-sft-openhermes-128k-v2-merged), an instruction-tuned (OpenHermes SFT) variant of Phi-2.
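
A minimal loading sketch follows. The adapter repository id is an assumption derived from the `model-index` name in this card, and the chat-template usage assumes the base tokenizer ships one; adjust both as needed.

```python
import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

# NOTE: assumed adapter repo id (from the model-index name); replace if it differs.
ADAPTER_ID = "rasyosef/phi-2-openhermes-128k-v2-dpo-combined"
BASE_ID = "rasyosef/phi-2-sft-openhermes-128k-v2-merged"

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
model = AutoPeftModelForCausalLM.from_pretrained(
    ADAPTER_ID,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Assumes the base tokenizer defines a chat template (the base is an OpenHermes-style SFT model).
messages = [{"role": "user", "content": "Summarize what DPO fine-tuning changes about a model."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```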

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 2e-06
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 250
- num_epochs: 2
- mixed_precision_training: Native AMP
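
As a rough reconstruction only, the hyperparameters above map onto TRL's `DPOConfig`/`DPOTrainer` approximately as follows. The preference dataset, LoRA settings, and `beta` value are not recorded in this card, so the ones below are placeholders.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig
from trl import DPOConfig, DPOTrainer

BASE_ID = "rasyosef/phi-2-sft-openhermes-128k-v2-merged"

model = AutoModelForCausalLM.from_pretrained(BASE_ID)
tokenizer = AutoTokenizer.from_pretrained(BASE_ID)

# Placeholder preference data: the dataset actually used is not recorded in this card.
dataset = load_dataset("trl-lib/ultrafeedback_binarized")

# Values taken from the hyperparameter list above; beta is TRL's default and an assumption here.
training_args = DPOConfig(
    output_dir="phi-2-openhermes-128k-v2-dpo-combined",
    learning_rate=2e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=4,   # effective batch size 4 x 4 = 16
    num_train_epochs=2,
    lr_scheduler_type="cosine",
    warmup_steps=250,
    seed=42,
    fp16=True,                       # "Native AMP" mixed precision
    beta=0.1,                        # assumed (TRL default), not stated in the card
    eval_strategy="steps",
    eval_steps=100,
    logging_steps=100,
)

# Assumed LoRA settings; the actual PEFT configuration is not recorded in this card.
peft_config = LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32, lora_dropout=0.05)

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```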

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6927        | 0.0583 | 100  | 0.6927          | -0.0007        | -0.0020          | 0.4976             | 0.0012          | -148.8894      | -140.9533    | -1.7645         | -1.7622       |
| 0.6903        | 0.1166 | 200  | 0.6848          | -0.0085        | -0.0260          | 0.5556             | 0.0175          | -149.1299      | -141.0305    | -1.7667         | -1.7644       |
| 0.6757        | 0.1749 | 300  | 0.6530          | -0.0338        | -0.1263          | 0.6618             | 0.0924          | -150.1323      | -141.2841    | -1.7686         | -1.7658       |
| 0.6457        | 0.2332 | 400  | 0.6189          | -0.0854        | -0.2869          | 0.7053             | 0.2015          | -151.7387      | -141.7998    | -1.7678         | -1.7649       |
| 0.6231        | 0.2915 | 500  | 0.5994          | -0.1345        | -0.4309          | 0.6908             | 0.2964          | -153.1783      | -142.2908    | -1.7660         | -1.7625       |
| 0.6001        | 0.3499 | 600  | 0.5882          | -0.1854        | -0.5670          | 0.7041             | 0.3816          | -154.5396      | -142.7997    | -1.7626         | -1.7594       |
| 0.6071        | 0.4082 | 700  | 0.5832          | -0.2023        | -0.6173          | 0.7126             | 0.4149          | -155.0424      | -142.9693    | -1.7564         | -1.7533       |
| 0.6114        | 0.4665 | 800  | 0.5801          | -0.2174        | -0.6640          | 0.7017             | 0.4466          | -155.5101      | -143.1204    | -1.7551         | -1.7514       |
| 0.5963        | 0.5248 | 900  | 0.5749          | -0.2216        | -0.6958          | 0.7198             | 0.4742          | -155.8275      | -143.1621    | -1.7411         | -1.7376       |
| 0.5958        | 0.5831 | 1000 | 0.5739          | -0.2352        | -0.7314          | 0.7077             | 0.4961          | -156.1834      | -143.2981    | -1.7384         | -1.7346       |
| 0.5883        | 0.6414 | 1100 | 0.5719          | -0.2631        | -0.7884          | 0.6920             | 0.5253          | -156.7536      | -143.5765    | -1.7338         | -1.7297       |
| 0.5821        | 0.6997 | 1200 | 0.5712          | -0.2920        | -0.8496          | 0.6993             | 0.5575          | -157.3655      | -143.8663    | -1.7305         | -1.7266       |
| 0.6037        | 0.7580 | 1300 | 0.5691          | -0.2837        | -0.8327          | 0.6993             | 0.5490          | -157.1967      | -143.7830    | -1.7239         | -1.7196       |
| 0.5781        | 0.8163 | 1400 | 0.5680          | -0.3013        | -0.8689          | 0.6920             | 0.5676          | -157.5589      | -143.9591    | -1.7173         | -1.7132       |
| 0.5985        | 0.8746 | 1500 | 0.5685          | -0.2801        | -0.8286          | 0.7005             | 0.5485          | -157.1556      | -143.7466    | -1.7099         | -1.7055       |
| 0.5925        | 0.9329 | 1600 | 0.5677          | -0.2742        | -0.8259          | 0.7005             | 0.5516          | -157.1285      | -143.6882    | -1.7002         | -1.6959       |
| 0.6039        | 0.9913 | 1700 | 0.5658          | -0.2697        | -0.8189          | 0.7005             | 0.5492          | -157.0589      | -143.6426    | -1.6978         | -1.6936       |
| 0.5883        | 1.0496 | 1800 | 0.5648          | -0.2695        | -0.8269          | 0.7029             | 0.5574          | -157.1392      | -143.6413    | -1.6960         | -1.6915       |
| 0.5844        | 1.1079 | 1900 | 0.5644          | -0.2821        | -0.8480          | 0.6920             | 0.5659          | -157.3497      | -143.7664    | -1.6906         | -1.6863       |
| 0.5606        | 1.1662 | 2000 | 0.5646          | -0.3007        | -0.8863          | 0.6993             | 0.5856          | -157.7325      | -143.9527    | -1.6925         | -1.6878       |
| 0.5835        | 1.2245 | 2100 | 0.5631          | -0.3071        | -0.8997          | 0.6957             | 0.5926          | -157.8670      | -144.0166    | -1.6917         | -1.6875       |
| 0.5801        | 1.2828 | 2200 | 0.5622          | -0.3144        | -0.9213          | 0.6884             | 0.6069          | -158.0828      | -144.0901    | -1.6850         | -1.6805       |
| 0.6022        | 1.3411 | 2300 | 0.5637          | -0.3096        | -0.9078          | 0.6993             | 0.5982          | -157.9474      | -144.0419    | -1.6837         | -1.6793       |
| 0.5694        | 1.3994 | 2400 | 0.5618          | -0.3143        | -0.9225          | 0.6884             | 0.6082          | -158.0945      | -144.0888    | -1.6834         | -1.6790       |
| 0.5703        | 1.4577 | 2500 | 0.5612          | -0.3125        | -0.9247          | 0.6957             | 0.6121          | -158.1165      | -144.0712    | -1.6803         | -1.6758       |
| 0.5732        | 1.5160 | 2600 | 0.5590          | -0.3150        | -0.9377          | 0.6957             | 0.6228          | -158.2469      | -144.0954    | -1.6801         | -1.6750       |
| 0.5584        | 1.5743 | 2700 | 0.5603          | -0.3206        | -0.9441          | 0.6848             | 0.6235          | -158.3112      | -144.1520    | -1.6796         | -1.6749       |
| 0.5677        | 1.6327 | 2800 | 0.5605          | -0.3233        | -0.9494          | 0.6884             | 0.6260          | -158.3634      | -144.1790    | -1.6800         | -1.6752       |
| 0.575         | 1.6910 | 2900 | 0.5609          | -0.3235        | -0.9500          | 0.6920             | 0.6265          | -158.3701      | -144.1811    | -1.6788         | -1.6741       |
| 0.5752        | 1.7493 | 3000 | 0.5604          | -0.3242        | -0.9528          | 0.6920             | 0.6286          | -158.3975      | -144.1876    | -1.6782         | -1.6734       |
| 0.57          | 1.8076 | 3100 | 0.5609          | -0.3242        | -0.9536          | 0.6896             | 0.6295          | -158.4062      | -144.1877    | -1.6779         | -1.6727       |
| 0.5759        | 1.8659 | 3200 | 0.5608          | -0.3244        | -0.9537          | 0.6884             | 0.6293          | -158.4068      | -144.1899    | -1.6783         | -1.6734       |
| 0.5789        | 1.9242 | 3300 | 0.5600          | -0.3228        | -0.9558          | 0.6884             | 0.6330          | -158.4273      | -144.1738    | -1.6778         | -1.6727       |
| 0.5622        | 1.9825 | 3400 | 0.5599          | -0.3234        | -0.9542          | 0.6812             | 0.6309          | -158.4123      | -144.1796    | -1.6783         | -1.6735       |
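
For reference, the `Rewards/*` columns above follow TRL's DPO convention: the implicit reward of a completion $y$ for prompt $x$ is

$$
r_\theta(x, y) = \beta \left( \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right),
$$

averaged over the evaluation set for the chosen and rejected responses respectively. `Rewards/margins` is the mean of $r_\theta(x, y_{\text{chosen}}) - r_\theta(x, y_{\text{rejected}})$, and `Rewards/accuracies` is the fraction of pairs where that margin is positive. In the final row, $-0.3234 - (-0.9542) \approx 0.6309$, matching the reported margin.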


### Framework versions

- PEFT 0.12.0
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1