---
library_name: transformers
tags:
- trl
- dpo
- alignment-handbook
- generated_from_trainer
model-index:
- name: OpenELM-1_1B-DPO-full-max-6-reward
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# OpenELM-1_1B-DPO-full-max-6-reward

This model was trained with DPO via TRL, following the Alignment Handbook recipes (per the tags above); the base checkpoint and preference dataset were not recorded by the Trainer.
It achieves the following results on the evaluation set:
- Loss: 1.8090
- Rewards/chosen: -15.25
- Rewards/rejected: -16.875
- Rewards/accuracies: 0.5859
- Rewards/margins: 1.6719
- Logps/rejected: -1984.0
- Logps/chosen: -1840.0
- Logits/rejected: 0.0815
- Logits/chosen: -1.9688
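
The rewards above are the implicit DPO rewards, i.e. the β-scaled log-probability ratio between the policy and the reference model, and the margin is the gap between the chosen and rejected rewards (here roughly −15.25 − (−16.875) ≈ 1.6, consistent with the reported 1.6719 up to per-example averaging and the rounding of the logged values). As a sketch of the standard definitions, where β is the DPO beta hyperparameter, which is not recorded in this card:

$$
r_\theta(x, y) = \beta \left[ \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right],
\qquad
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\!\left( r_\theta(x, y_w) - r_\theta(x, y_l) \right)
$$

Here $y_w$ and $y_l$ are the chosen and rejected completions, the reported margin is $r_\theta(x, y_w) - r_\theta(x, y_l)$, `Logps/*` are the summed completion log-probabilities under the policy, and `Rewards/accuracies` is the fraction of pairs where the chosen reward exceeds the rejected one.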

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 16
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
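
A hedged sketch of how these hyperparameters map onto a TRL `DPOConfig` (recent TRL versions; the output directory, the precision and logging settings, and the trainer wiring below are assumptions not recorded in this card, and the 4-GPU distribution is handled by the launcher, e.g. `accelerate launch`):

```python
from trl import DPOConfig

# Sketch only: values mirror the hyperparameter list above; anything marked
# "assumption" is not recorded in this model card.
training_args = DPOConfig(
    output_dir="OpenELM-1_1B-DPO-full-max-6-reward",  # assumption: output path
    learning_rate=5e-5,
    per_device_train_batch_size=8,   # train_batch_size
    per_device_eval_batch_size=16,   # eval_batch_size
    gradient_accumulation_steps=2,   # 8 per device x 4 GPUs x 2 steps = 64 total
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,                       # assumption: typical for this recipe
    eval_strategy="steps",
    eval_steps=100,                  # matches the 100-step evaluation cadence below
)

# Trainer wiring is version-dependent and not recorded here; roughly:
# trainer = DPOTrainer(model=model, ref_model=ref_model, args=training_args,
#                      train_dataset=train_ds, eval_dataset=eval_ds,
#                      tokenizer=tokenizer)
# trainer.train()
```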

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.5933        | 0.1047 | 100  | 0.6757          | -1.1562        | -1.3125          | 0.5801             | 0.1611          | -420.0         | -434.0       | -11.5625        | -11.875       |
| 0.5649        | 0.2094 | 200  | 0.6957          | -2.125         | -2.375           | 0.6074             | 0.2451          | -524.0         | -532.0       | -10.125         | -10.5625      |
| 0.5247        | 0.3141 | 300  | 0.7605          | -4.4688        | -4.9688          | 0.6094             | 0.4980          | -784.0         | -764.0       | -7.6875         | -8.4375       |
| 0.5128        | 0.4188 | 400  | 0.7887          | -3.7188        | -4.25            | 0.5977             | 0.5195          | -712.0         | -688.0       | -13.25          | -13.875       |
| 0.5048        | 0.5236 | 500  | 0.7560          | -4.25          | -4.8438          | 0.6309             | 0.6055          | -772.0         | -744.0       | -10.6875        | -11.8125      |
| 0.4935        | 0.6283 | 600  | 0.7500          | -4.6562        | -5.0938          | 0.5781             | 0.4473          | -800.0         | -784.0       | -14.1875        | -14.875       |
| 0.4879        | 0.7330 | 700  | 0.7732          | -5.0938        | -5.7812          | 0.6230             | 0.6797          | -868.0         | -828.0       | -12.5           | -13.8125      |
| 0.4911        | 0.8377 | 800  | 0.7706          | -5.0           | -5.625           | 0.625              | 0.6406          | -852.0         | -816.0       | -13.375         | -14.25        |
| 0.4586        | 0.9424 | 900  | 0.9273          | -7.5312        | -8.3125          | 0.6113             | 0.7773          | -1120.0        | -1072.0      | -9.0            | -10.6875      |
| 0.1423        | 1.0471 | 1000 | 1.1068          | -8.75          | -9.6875          | 0.5879             | 0.9609          | -1256.0        | -1192.0      | -7.0625         | -9.125        |
| 0.1457        | 1.1518 | 1100 | 1.1011          | -8.125         | -9.0625          | 0.5801             | 0.9141          | -1192.0        | -1128.0      | -10.75          | -12.375       |
| 0.1344        | 1.2565 | 1200 | 1.0089          | -8.375         | -9.375           | 0.5996             | 0.9883          | -1224.0        | -1152.0      | -5.3438         | -7.4062       |
| 0.1369        | 1.3613 | 1300 | 1.0540          | -9.4375        | -10.625          | 0.6016             | 1.1797          | -1352.0        | -1264.0      | -5.5312         | -7.5938       |
| 0.1225        | 1.4660 | 1400 | 1.1049          | -9.5625        | -10.625          | 0.6035             | 1.0859          | -1352.0        | -1272.0      | -5.5938         | -7.375        |
| 0.1276        | 1.5707 | 1500 | 1.1785          | -11.0          | -12.25           | 0.6074             | 1.2344          | -1512.0        | -1416.0      | -1.0625         | -3.0625       |
| 0.1177        | 1.6754 | 1600 | 1.1486          | -9.5           | -10.75           | 0.6094             | 1.25            | -1368.0        | -1272.0      | -3.8594         | -5.9062       |
| 0.1007        | 1.7801 | 1700 | 1.1275          | -9.5           | -10.5625         | 0.5840             | 1.0625          | -1344.0        | -1272.0      | -7.75           | -9.3125       |
| 0.1186        | 1.8848 | 1800 | 1.1385          | -9.9375        | -11.0            | 0.5703             | 1.0547          | -1392.0        | -1312.0      | -5.7188         | -7.4375       |
| 0.1098        | 1.9895 | 1900 | 1.2803          | -11.9375       | -13.25           | 0.5879             | 1.3359          | -1616.0        | -1512.0      | -2.7031         | -4.6875       |
| 0.0179        | 2.0942 | 2000 | 1.7014          | -14.5          | -16.0            | 0.5820             | 1.5938          | -1896.0        | -1768.0      | -1.5078         | -3.6406       |
| 0.0165        | 2.1990 | 2100 | 1.7262          | -14.4375       | -16.125          | 0.5801             | 1.6797          | -1904.0        | -1760.0      | -1.9531         | -4.0625       |
| 0.0158        | 2.3037 | 2200 | 1.7524          | -14.25         | -15.8125         | 0.5762             | 1.5703          | -1872.0        | -1744.0      | -1.2344         | -3.3594       |
| 0.0199        | 2.4084 | 2300 | 1.7305          | -14.4375       | -15.9375         | 0.5840             | 1.5391          | -1888.0        | -1760.0      | -0.6211         | -2.6875       |
| 0.0172        | 2.5131 | 2400 | 1.7391          | -14.5625       | -16.125          | 0.5820             | 1.6016          | -1904.0        | -1776.0      | -0.3164         | -2.3906       |
| 0.0162        | 2.6178 | 2500 | 1.8456          | -15.5          | -17.25           | 0.5898             | 1.7031          | -2008.0        | -1872.0      | 0.1270          | -1.9219       |
| 0.0128        | 2.7225 | 2600 | 1.7974          | -15.0625       | -16.75           | 0.5879             | 1.6797          | -1960.0        | -1824.0      | -0.1289         | -2.2031       |
| 0.0168        | 2.8272 | 2700 | 1.8012          | -15.1875       | -16.875          | 0.5879             | 1.6719          | -1976.0        | -1840.0      | 0.0459          | -2.0156       |
| 0.0171        | 2.9319 | 2800 | 1.8090          | -15.25         | -16.875          | 0.5859             | 1.6719          | -1984.0        | -1840.0      | 0.0815          | -1.9688       |


### Framework versions

- Transformers 4.45.1
- Pytorch 2.3.0
- Datasets 3.0.1
- Tokenizers 0.20.0
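
A minimal check that a local environment matches these versions (assuming the standard PyPI distributions; nearby versions will usually still load the model):

```python
import datasets
import tokenizers
import torch
import transformers

# Versions reported in this card.
expected = {
    transformers: "4.45.1",
    torch: "2.3.0",
    datasets: "3.0.1",
    tokenizers: "0.20.0",
}
for module, version in expected.items():
    status = "OK" if module.__version__.startswith(version) else "differs"
    print(f"{module.__name__}: found {module.__version__}, expected {version} ({status})")
```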