---
library_name: transformers
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: OpenELM-1_1B-DPO-full-max-10-reward
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# OpenELM-1_1B-DPO-full-max-10-reward

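This model was trained with Direct Preference Optimization (DPO) using TRL, as indicated by the card's tags. The Trainer did not record the base checkpoint or the preference dataset; the model name suggests an OpenELM 1.1B base, but that is not confirmed here.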
It achieves the following results on the evaluation set:
- Loss: 1.3938
- Rewards/chosen: -10.0625
- Rewards/rejected: -12.0625
- Rewards/accuracies: 0.6152
- Rewards/margins: 2.0156
- Logps/rejected: -1496.0
- Logps/chosen: -1328.0
- Logits/rejected: 1.9375
- Logits/chosen: 0.4395
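
The reward figures above can be related to each other with a little arithmetic. The sketch below assumes the standard sigmoid DPO loss, where the per-pair loss is `-log(sigmoid(reward_chosen - reward_rejected))`; the card does not record which loss type was used, so treat this as an illustration rather than a reproduction of the run. The function name `dpo_sigmoid_loss` is hypothetical.

```python
import math

def dpo_sigmoid_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Per-pair sigmoid DPO loss, -log(sigmoid(margin)), computed stably."""
    margin = reward_chosen - reward_rejected
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# Mean rewards reported on the evaluation set (copied from the list above).
mean_chosen = -10.0625
mean_rejected = -12.0625

# Difference of the mean rewards; the logged Rewards/margins value is 2.0156.
margin_of_means = mean_chosen - mean_rejected

# Loss a single pair would have at that margin (assumption: sigmoid loss).
loss_at_mean_margin = dpo_sigmoid_loss(mean_chosen, mean_rejected)

print(f"margin of means: {margin_of_means:.4f}")
print(f"loss at that margin: {loss_at_mean_margin:.4f}")
```

Under this assumption, a pair sitting exactly at the mean margin would have a loss far below the reported 1.3938. The reported loss is an average over individual pairs, and with Rewards/accuracies at 0.6152 roughly 38% of pairs are ranked the wrong way round, so pairs with large negative margins can plausibly dominate that average.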

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 16
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
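
The batch-size and scheduler entries above fit together as follows. This is a minimal sketch in plain Python, not the Trainer's own code: the total step count of 2865 is an estimate inferred from the results table (step 100 is logged at epoch 0.1047, so about 955 steps per epoch), and the helper `lr_at` is a hypothetical name for a linear-warmup, cosine-decay schedule.

```python
import math

# Values copied from the hyperparameter list above.
train_batch_size = 8            # per device
num_devices = 4
gradient_accumulation_steps = 2
learning_rate = 5e-05
warmup_ratio = 0.1

# Effective batch size per optimizer step.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps

# Assumption: about 955 steps per epoch over 3 epochs, estimated from the table.
estimated_total_steps = 2865

def lr_at(step: int, total_steps: int = estimated_total_steps) -> float:
    """Linear warmup to the peak learning rate, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))

print(total_train_batch_size)
for step in (0, 287, 1576, 2865):
    print(step, lr_at(step))
```

With these numbers the warmup covers roughly the first 287 optimizer steps, so the first two or three evaluations in the results table would fall inside the warmup phase.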

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.3669        | 0.1047 | 100  | 0.6720          | -1.4219        | -1.75            | 0.5977             | 0.3301          | -464.0         | -460.0       | -13.0           | -13.25        |
| 0.3019        | 0.2094 | 200  | 0.7079          | -2.1875        | -2.5469          | 0.5801             | 0.3477          | -544.0         | -536.0       | -8.25           | -8.875        |
| 0.2872        | 0.3141 | 300  | 0.9193          | -4.5938        | -5.125           | 0.5508             | 0.5195          | -800.0         | -776.0       | -10.125         | -10.8125      |
| 0.2766        | 0.4188 | 400  | 0.7222          | -3.75          | -4.25            | 0.6074             | 0.5156          | -716.0         | -692.0       | -8.0625         | -8.8125       |
| 0.2443        | 0.5236 | 500  | 0.8614          | -5.1875        | -6.0938          | 0.6055             | 0.8906          | -896.0         | -836.0       | -4.9375         | -5.875        |
| 0.2505        | 0.6283 | 600  | 0.8266          | -4.5           | -5.1875          | 0.5957             | 0.6719          | -808.0         | -768.0       | -4.5938         | -5.6562       |
| 0.2305        | 0.7330 | 700  | 0.7984          | -5.375         | -6.25            | 0.6289             | 0.8594          | -912.0         | -856.0       | -3.7031         | -5.0625       |
| 0.2384        | 0.8377 | 800  | 0.9506          | -5.875         | -6.625           | 0.5723             | 0.7578          | -952.0         | -904.0       | -3.8281         | -5.0312       |
| 0.2003        | 0.9424 | 900  | 0.9553          | -6.8438        | -7.8125          | 0.5938             | 0.9883          | -1072.0        | -1000.0      | -2.5            | -3.75         |
| 0.0478        | 1.0471 | 1000 | 1.2033          | -8.1875        | -9.3125          | 0.5996             | 1.1641          | -1224.0        | -1136.0      | -1.9453         | -3.5156       |
| 0.0626        | 1.1518 | 1100 | 1.1790          | -8.1875        | -9.6875          | 0.5918             | 1.5156          | -1256.0        | -1136.0      | -1.5781         | -3.2031       |
| 0.0518        | 1.2565 | 1200 | 1.1558          | -8.3125        | -9.5             | 0.6016             | 1.2031          | -1240.0        | -1144.0      | -0.2715         | -1.8516       |
| 0.0627        | 1.3613 | 1300 | 1.2760          | -8.0625        | -9.4375          | 0.5918             | 1.3672          | -1232.0        | -1120.0      | -0.9414         | -2.4531       |
| 0.067         | 1.4660 | 1400 | 1.1144          | -7.625         | -9.0             | 0.6113             | 1.3516          | -1184.0        | -1080.0      | 1.1875          | -0.4336       |
| 0.057         | 1.5707 | 1500 | 1.2384          | -8.8125        | -10.25           | 0.5781             | 1.4453          | -1312.0        | -1200.0      | 1.4453          | -0.0266       |
| 0.0549        | 1.6754 | 1600 | 1.1039          | -7.875         | -9.1875          | 0.6016             | 1.3047          | -1208.0        | -1104.0      | 1.4922          | -0.0466       |
| 0.065         | 1.7801 | 1700 | 1.2125          | -8.1875        | -9.8125          | 0.6055             | 1.6016          | -1272.0        | -1136.0      | 1.5391          | -0.0018       |
| 0.0477        | 1.8848 | 1800 | 1.2242          | -8.4375        | -10.0            | 0.6035             | 1.5469          | -1288.0        | -1160.0      | 2.0469          | 0.5508        |
| 0.0232        | 1.9895 | 1900 | 1.1594          | -8.125         | -9.6875          | 0.6152             | 1.5938          | -1256.0        | -1128.0      | 1.9297          | 0.4180        |
| 0.0025        | 2.0942 | 2000 | 1.2469          | -9.1875        | -11.0            | 0.6035             | 1.8438          | -1392.0        | -1232.0      | 2.0938          | 0.5664        |
| 0.0064        | 2.1990 | 2100 | 1.3712          | -10.1875       | -12.1875         | 0.6055             | 1.9844          | -1504.0        | -1336.0      | 2.3281          | 0.8320        |
| 0.0068        | 2.3037 | 2200 | 1.2939          | -9.5625        | -11.4375         | 0.6094             | 1.8359          | -1432.0        | -1280.0      | 2.1094          | 0.6328        |
| 0.0106        | 2.4084 | 2300 | 1.3934          | -10.375        | -12.375          | 0.6074             | 1.9766          | -1528.0        | -1360.0      | 2.2344          | 0.7539        |
| 0.0074        | 2.5131 | 2400 | 1.4226          | -10.4375       | -12.4375         | 0.6152             | 2.0312          | -1536.0        | -1360.0      | 2.125           | 0.6367        |
| 0.0055        | 2.6178 | 2500 | 1.4319          | -10.5625       | -12.625          | 0.6152             | 2.0625          | -1552.0        | -1376.0      | 2.1094          | 0.6211        |
| 0.0094        | 2.7225 | 2600 | 1.3983          | -10.125        | -12.125          | 0.6152             | 2.0156          | -1504.0        | -1328.0      | 1.9375          | 0.4336        |
| 0.0045        | 2.8272 | 2700 | 1.3869          | -10.0          | -12.0            | 0.6133             | 2.0156          | -1488.0        | -1320.0      | 1.9297          | 0.4238        |
| 0.0065        | 2.9319 | 2800 | 1.3938          | -10.0625       | -12.0625         | 0.6152             | 2.0156          | -1496.0        | -1328.0      | 1.9375          | 0.4395        |
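
Reading the table, validation loss is lowest at the first evaluation (step 100) and rises to 1.3938 by step 2800, while training loss falls to well under 0.01 and Rewards/accuracies peaks at 0.6289 at step 700. Which checkpoint counts as "best" therefore depends on the metric chosen. The sketch below shows one way to compare checkpoints from rows pasted out of the table; only four rows are included here for brevity, and the helper `parse_rows` is hypothetical.

```python
# Four rows copied from the table above, trimmed to three columns:
# step, validation loss, rewards/accuracies.
ROWS = """
| 100  | 0.6720 | 0.5977 |
| 700  | 0.7984 | 0.6289 |
| 1900 | 1.1594 | 0.6152 |
| 2800 | 1.3938 | 0.6152 |
"""

def parse_rows(text):
    """Turn pipe-delimited rows into (step, val_loss, accuracy) tuples."""
    rows = []
    for line in text.strip().splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        rows.append((int(cells[0]), float(cells[1]), float(cells[2])))
    return rows

rows = parse_rows(ROWS)

# Checkpoint with the lowest validation loss among the pasted rows.
best_by_loss = min(rows, key=lambda row: row[1])
# Checkpoint with the highest preference accuracy among the pasted rows.
best_by_accuracy = max(rows, key=lambda row: row[2])

print("lowest validation loss at step", best_by_loss[0])
print("highest rewards/accuracies at step", best_by_accuracy[0])
```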


### Framework versions

- Transformers 4.44.2
- PyTorch 2.3.0
- Datasets 3.0.0
- Tokenizers 0.19.1