---
library_name: transformers
tags:
- trl
- dpo
- alignment-handbook
- generated_from_trainer
model-index:
- name: OpenELM-1_1B-DPO-full-max-6-reward
  results: []
---

# OpenELM-1_1B-DPO-full-max-6-reward

This model was trained with Direct Preference Optimization (DPO) via TRL; the Trainer did not record the base checkpoint or the training dataset.
It achieves the following results on the evaluation set:
- Loss: 1.8412
- Rewards/chosen: -15.5625
- Rewards/rejected: -17.375
- Rewards/accuracies: 0.6035
- Rewards/margins: 1.7891
- Logps/rejected: -2024.0
- Logps/chosen: -1872.0
- Logits/rejected: 1.625
- Logits/chosen: -0.2451
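
For context: in DPO, the reward columns are implicit rewards derived from the policy and the frozen reference model rather than from a separate reward model. A sketch of the standard definitions (Rafailov et al., 2023), where $\beta$ is the DPO temperature:

$$
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)},
\qquad
\text{Rewards/margins} = r_\theta(x, y_{\mathrm{chosen}}) - r_\theta(x, y_{\mathrm{rejected}})
$$

Rewards/accuracies is the fraction of evaluation pairs in which the chosen response receives the higher implicit reward, and the Logps/* columns are the summed token log-probabilities of each completion under the policy.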

## Model description

More information needed

## Intended uses & limitations

More information needed
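
Pending more detail, below is a minimal inference sketch with `transformers`. The repository id is an assumption (substitute the actual Hub path), and OpenELM checkpoints typically require `trust_remote_code=True`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "OpenELM-1_1B-DPO-full-max-6-reward"  # assumption: replace with the real Hub path

# OpenELM ships custom modeling code, hence trust_remote_code=True.
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

prompt = "Explain direct preference optimization in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```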

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 16
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
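
These settings map onto TRL's `DPOConfig` roughly as sketched below. The base checkpoint, preference dataset, and DPO beta are assumptions (none of them are recorded in this card), and the `distributed_type: multi-GPU` / `num_devices: 4` entries come from launching with `accelerate` rather than from the config itself:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Assumptions: the card does not record the base model or dataset.
base_model = "apple/OpenELM-1_1B"  # hypothetical
train_dataset = load_dataset(
    "HuggingFaceH4/ultrafeedback_binarized", split="train_prefs"  # hypothetical
)

tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(base_model, trust_remote_code=True)

# Mirrors the hyperparameter list above: 8 per-device x 4 GPUs x 2 accumulation steps = 64.
args = DPOConfig(
    output_dir="OpenELM-1_1B-DPO-full-max-6-reward",
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=2,
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,       # DPOTrainer clones the policy as the frozen reference
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,  # recent TRL releases rename this to processing_class
)
trainer.train()
```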

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.5954        | 0.1047 | 100  | 0.6902          | -1.1172        | -1.2578          | 0.5742             | 0.1416          | -414.0         | -430.0       | -10.75          | -11.0625      |
| 0.5396        | 0.2094 | 200  | 0.6928          | -2.5156        | -2.8594          | 0.6387             | 0.3438          | -576.0         | -572.0       | -9.6875         | -10.25        |
| 0.5587        | 0.3141 | 300  | 0.7014          | -3.7188        | -4.1562          | 0.6094             | 0.4355          | -704.0         | -692.0       | -8.5625         | -9.4375       |
| 0.5067        | 0.4188 | 400  | 0.7869          | -3.875         | -4.375           | 0.5996             | 0.4961          | -724.0         | -708.0       | -14.0           | -14.9375      |
| 0.525         | 0.5236 | 500  | 0.8500          | -4.625         | -5.3438          | 0.6094             | 0.7109          | -824.0         | -780.0       | -6.8438         | -8.25         |
| 0.5173        | 0.6283 | 600  | 0.7292          | -6.25          | -6.7188          | 0.5723             | 0.4688          | -960.0         | -944.0       | -8.875          | -10.0         |
| 0.4944        | 0.7330 | 700  | 0.7881          | -5.1562        | -5.7812          | 0.6035             | 0.6445          | -868.0         | -832.0       | -6.4062         | -8.0          |
| 0.5113        | 0.8377 | 800  | 0.7106          | -4.7812        | -5.3438          | 0.6113             | 0.5586          | -824.0         | -796.0       | -9.4375         | -10.8125      |
| 0.4589        | 0.9424 | 900  | 0.8807          | -7.4375        | -8.1875          | 0.6094             | 0.7656          | -1112.0        | -1064.0      | -6.1875         | -8.125        |
| 0.1368        | 1.0471 | 1000 | 1.1006          | -8.375         | -9.4375          | 0.5879             | 1.0547          | -1232.0        | -1160.0      | -3.4531         | -5.375        |
| 0.138         | 1.1518 | 1100 | 1.0286          | -8.375         | -9.375           | 0.5977             | 0.9531          | -1224.0        | -1160.0      | -4.0938         | -5.9688       |
| 0.1376        | 1.2565 | 1200 | 1.0962          | -8.6875        | -9.75            | 0.6035             | 1.0312          | -1264.0        | -1192.0      | -1.2266         | -3.0781       |
| 0.1434        | 1.3613 | 1300 | 1.1220          | -9.375         | -10.5            | 0.5801             | 1.1172          | -1336.0        | -1256.0      | -3.7031         | -5.6875       |
| 0.1386        | 1.4660 | 1400 | 1.0638          | -9.4375        | -10.375          | 0.6230             | 0.9570          | -1328.0        | -1256.0      | -3.5            | -5.4688       |
| 0.1258        | 1.5707 | 1500 | 1.1923          | -10.5          | -11.75           | 0.6016             | 1.1953          | -1464.0        | -1368.0      | -2.4062         | -4.5625       |
| 0.1269        | 1.6754 | 1600 | 1.2009          | -9.4375        | -10.625          | 0.6074             | 1.1562          | -1352.0        | -1264.0      | -2.8438         | -5.2188       |
| 0.0967        | 1.7801 | 1700 | 1.1723          | -10.0          | -11.125          | 0.5996             | 1.0859          | -1400.0        | -1320.0      | -1.6328         | -3.6406       |
| 0.112         | 1.8848 | 1800 | 1.0807          | -9.75          | -10.75           | 0.5898             | 0.9805          | -1360.0        | -1296.0      | -2.5            | -4.5          |
| 0.1158        | 1.9895 | 1900 | 1.1470          | -10.875        | -12.0625         | 0.5938             | 1.2109          | -1496.0        | -1400.0      | -1.5391         | -3.5625       |
| 0.0172        | 2.0942 | 2000 | 1.6192          | -14.1875       | -15.6875         | 0.6055             | 1.5078          | -1864.0        | -1736.0      | 0.8438          | -1.1172       |
| 0.012         | 2.1990 | 2100 | 1.7070          | -14.6875       | -16.375          | 0.6016             | 1.6953          | -1928.0        | -1792.0      | 0.5117          | -1.4688       |
| 0.0145        | 2.3037 | 2200 | 1.6657          | -14.0625       | -15.625          | 0.5957             | 1.5547          | -1856.0        | -1728.0      | 0.6875          | -1.2891       |
| 0.0161        | 2.4084 | 2300 | 1.8217          | -15.5625       | -17.25           | 0.6035             | 1.7344          | -2016.0        | -1872.0      | 1.0             | -0.9141       |
| 0.0161        | 2.5131 | 2400 | 1.7852          | -15.0          | -16.625          | 0.6055             | 1.6641          | -1960.0        | -1824.0      | 1.6328          | -0.2471       |
| 0.0182        | 2.6178 | 2500 | 1.9600          | -16.25         | -18.125          | 0.5957             | 1.8125          | -2096.0        | -1952.0      | 1.7578          | -0.1089       |
| 0.0121        | 2.7225 | 2600 | 1.8076          | -15.125        | -16.875          | 0.6113             | 1.7656          | -1976.0        | -1832.0      | 1.4922          | -0.4238       |
| 0.016         | 2.8272 | 2700 | 1.8344          | -15.5          | -17.25           | 0.6055             | 1.7891          | -2016.0        | -1872.0      | 1.6016          | -0.2773       |
| 0.0144        | 2.9319 | 2800 | 1.8412          | -15.5625       | -17.375          | 0.6035             | 1.7891          | -2024.0        | -1872.0      | 1.625           | -0.2451       |


### Framework versions

- Transformers 4.45.1
- Pytorch 2.3.0
- Datasets 3.0.1
- Tokenizers 0.20.0