metadata
library_name: transformers
tags:
  - trl
  - dpo
  - alignment-handbook
  - generated_from_trainer
model-index:
  - name: OpenELM-1_1B-DPO-full-max-6-reward
    results: []

OpenELM-1_1B-DPO-full-max-6-reward

This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set (the values match the final logged evaluation, at step 2800):

  • Loss: 1.8090
  • Rewards/chosen: -15.25
  • Rewards/rejected: -16.875
  • Rewards/accuracies: 0.5859
  • Rewards/margins: 1.6719
  • Logps/rejected: -1984.0
  • Logps/chosen: -1840.0
  • Logits/rejected: 0.0815
  • Logits/chosen: -1.9688
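The reward metrics above follow the usual DPO convention, in which a "reward" is the scaled log-probability gap between the trained policy and a frozen reference model. The sketch below shows how such per-example metrics are commonly derived, assuming the standard sigmoid DPO loss. The `beta` value and the log-probabilities are hypothetical illustrations, not values from this run, and `dpo_metrics` is a name made up for this example.

```python
import math


def dpo_metrics(policy_chosen_logp, policy_rejected_logp,
                ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO metrics under the standard sigmoid loss (illustrative)."""
    # Implicit rewards: scaled log-prob gap between policy and reference.
    reward_chosen = beta * (policy_chosen_logp - ref_chosen_logp)
    reward_rejected = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = reward_chosen - reward_rejected

    # loss = -log(sigmoid(margin)) = softplus(-margin), computed stably.
    loss = max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

    return {
        "reward_chosen": reward_chosen,
        "reward_rejected": reward_rejected,
        "margin": margin,
        # Counts as "accurate" when the chosen response gets the higher reward.
        "accuracy": 1.0 if reward_chosen > reward_rejected else 0.0,
        "loss": loss,
    }


# Hypothetical sequence log-probabilities for one preference pair.
example = dpo_metrics(
    policy_chosen_logp=-120.0, policy_rejected_logp=-150.0,
    ref_chosen_logp=-100.0, ref_rejected_logp=-110.0,
    beta=0.1,
)
print(example)
```

Under this convention, negative values for both Rewards/chosen and Rewards/rejected mean the policy assigns lower log-probability than the reference to both responses; the margin is positive whenever the rejected response is pushed down further than the chosen one.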

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 64
  • total_eval_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
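The total batch sizes listed above follow from the per-device sizes, the device count, and gradient accumulation. The sketch below reproduces that arithmetic and illustrates the shape of a cosine schedule with linear warmup. The helper names are hypothetical, and `total_steps=2865` is only an estimate (about 955 steps per epoch for 3 epochs, inferred from the Step and Epoch columns logged for this run), not a value stated in the card.

```python
import math


def effective_batch_size(per_device, num_devices, grad_accum_steps=1):
    """Examples consumed per optimizer step (or per eval step when accum is 1)."""
    return per_device * num_devices * grad_accum_steps


def cosine_lr(step, total_steps, peak_lr=5e-05, warmup_ratio=0.1):
    """Linear warmup to peak_lr, then cosine decay to zero (illustrative sketch)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


train_total = effective_batch_size(8, 4, 2)  # 8 per device * 4 devices * 2 accumulation steps
eval_total = effective_batch_size(16, 4)     # 16 per device * 4 devices
print(train_total, eval_total)

total_steps = 2865  # assumption: estimated, see the note above
for step in (0, 100, 287, 1500, 2865):
    print(step, cosine_lr(step, total_steps))
```

With a warmup ratio of 0.1, roughly the first tenth of the optimizer steps ramp the learning rate up to 5e-05 before the cosine decay begins.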

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.5933 | 0.1047 | 100 | 0.6757 | -1.1562 | -1.3125 | 0.5801 | 0.1611 | -420.0 | -434.0 | -11.5625 | -11.875 |
| 0.5649 | 0.2094 | 200 | 0.6957 | -2.125 | -2.375 | 0.6074 | 0.2451 | -524.0 | -532.0 | -10.125 | -10.5625 |
| 0.5247 | 0.3141 | 300 | 0.7605 | -4.4688 | -4.9688 | 0.6094 | 0.4980 | -784.0 | -764.0 | -7.6875 | -8.4375 |
| 0.5128 | 0.4188 | 400 | 0.7887 | -3.7188 | -4.25 | 0.5977 | 0.5195 | -712.0 | -688.0 | -13.25 | -13.875 |
| 0.5048 | 0.5236 | 500 | 0.7560 | -4.25 | -4.8438 | 0.6309 | 0.6055 | -772.0 | -744.0 | -10.6875 | -11.8125 |
| 0.4935 | 0.6283 | 600 | 0.7500 | -4.6562 | -5.0938 | 0.5781 | 0.4473 | -800.0 | -784.0 | -14.1875 | -14.875 |
| 0.4879 | 0.7330 | 700 | 0.7732 | -5.0938 | -5.7812 | 0.6230 | 0.6797 | -868.0 | -828.0 | -12.5 | -13.8125 |
| 0.4911 | 0.8377 | 800 | 0.7706 | -5.0 | -5.625 | 0.625 | 0.6406 | -852.0 | -816.0 | -13.375 | -14.25 |
| 0.4586 | 0.9424 | 900 | 0.9273 | -7.5312 | -8.3125 | 0.6113 | 0.7773 | -1120.0 | -1072.0 | -9.0 | -10.6875 |
| 0.1423 | 1.0471 | 1000 | 1.1068 | -8.75 | -9.6875 | 0.5879 | 0.9609 | -1256.0 | -1192.0 | -7.0625 | -9.125 |
| 0.1457 | 1.1518 | 1100 | 1.1011 | -8.125 | -9.0625 | 0.5801 | 0.9141 | -1192.0 | -1128.0 | -10.75 | -12.375 |
| 0.1344 | 1.2565 | 1200 | 1.0089 | -8.375 | -9.375 | 0.5996 | 0.9883 | -1224.0 | -1152.0 | -5.3438 | -7.4062 |
| 0.1369 | 1.3613 | 1300 | 1.0540 | -9.4375 | -10.625 | 0.6016 | 1.1797 | -1352.0 | -1264.0 | -5.5312 | -7.5938 |
| 0.1225 | 1.4660 | 1400 | 1.1049 | -9.5625 | -10.625 | 0.6035 | 1.0859 | -1352.0 | -1272.0 | -5.5938 | -7.375 |
| 0.1276 | 1.5707 | 1500 | 1.1785 | -11.0 | -12.25 | 0.6074 | 1.2344 | -1512.0 | -1416.0 | -1.0625 | -3.0625 |
| 0.1177 | 1.6754 | 1600 | 1.1486 | -9.5 | -10.75 | 0.6094 | 1.25 | -1368.0 | -1272.0 | -3.8594 | -5.9062 |
| 0.1007 | 1.7801 | 1700 | 1.1275 | -9.5 | -10.5625 | 0.5840 | 1.0625 | -1344.0 | -1272.0 | -7.75 | -9.3125 |
| 0.1186 | 1.8848 | 1800 | 1.1385 | -9.9375 | -11.0 | 0.5703 | 1.0547 | -1392.0 | -1312.0 | -5.7188 | -7.4375 |
| 0.1098 | 1.9895 | 1900 | 1.2803 | -11.9375 | -13.25 | 0.5879 | 1.3359 | -1616.0 | -1512.0 | -2.7031 | -4.6875 |
| 0.0179 | 2.0942 | 2000 | 1.7014 | -14.5 | -16.0 | 0.5820 | 1.5938 | -1896.0 | -1768.0 | -1.5078 | -3.6406 |
| 0.0165 | 2.1990 | 2100 | 1.7262 | -14.4375 | -16.125 | 0.5801 | 1.6797 | -1904.0 | -1760.0 | -1.9531 | -4.0625 |
| 0.0158 | 2.3037 | 2200 | 1.7524 | -14.25 | -15.8125 | 0.5762 | 1.5703 | -1872.0 | -1744.0 | -1.2344 | -3.3594 |
| 0.0199 | 2.4084 | 2300 | 1.7305 | -14.4375 | -15.9375 | 0.5840 | 1.5391 | -1888.0 | -1760.0 | -0.6211 | -2.6875 |
| 0.0172 | 2.5131 | 2400 | 1.7391 | -14.5625 | -16.125 | 0.5820 | 1.6016 | -1904.0 | -1776.0 | -0.3164 | -2.3906 |
| 0.0162 | 2.6178 | 2500 | 1.8456 | -15.5 | -17.25 | 0.5898 | 1.7031 | -2008.0 | -1872.0 | 0.1270 | -1.9219 |
| 0.0128 | 2.7225 | 2600 | 1.7974 | -15.0625 | -16.75 | 0.5879 | 1.6797 | -1960.0 | -1824.0 | -0.1289 | -2.2031 |
| 0.0168 | 2.8272 | 2700 | 1.8012 | -15.1875 | -16.875 | 0.5879 | 1.6719 | -1976.0 | -1840.0 | 0.0459 | -2.0156 |
| 0.0171 | 2.9319 | 2800 | 1.8090 | -15.25 | -16.875 | 0.5859 | 1.6719 | -1984.0 | -1840.0 | 0.0815 | -1.9688 |
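The logged results show training loss falling steadily while validation loss rises, a pattern commonly read as overfitting. A minimal sketch for locating the strongest logged checkpoints is given below; the numbers are copied from the results above, and the variable names are hypothetical. The card does not say whether intermediate checkpoints were saved, so this only identifies which logged steps look best.

```python
# (step, training_loss, validation_loss, rewards_accuracy), copied from the logged results.
rows = [
    (100, 0.5933, 0.6757, 0.5801), (200, 0.5649, 0.6957, 0.6074),
    (300, 0.5247, 0.7605, 0.6094), (400, 0.5128, 0.7887, 0.5977),
    (500, 0.5048, 0.7560, 0.6309), (600, 0.4935, 0.7500, 0.5781),
    (700, 0.4879, 0.7732, 0.6230), (800, 0.4911, 0.7706, 0.6250),
    (900, 0.4586, 0.9273, 0.6113), (1000, 0.1423, 1.1068, 0.5879),
    (1100, 0.1457, 1.1011, 0.5801), (1200, 0.1344, 1.0089, 0.5996),
    (1300, 0.1369, 1.0540, 0.6016), (1400, 0.1225, 1.1049, 0.6035),
    (1500, 0.1276, 1.1785, 0.6074), (1600, 0.1177, 1.1486, 0.6094),
    (1700, 0.1007, 1.1275, 0.5840), (1800, 0.1186, 1.1385, 0.5703),
    (1900, 0.1098, 1.2803, 0.5879), (2000, 0.0179, 1.7014, 0.5820),
    (2100, 0.0165, 1.7262, 0.5801), (2200, 0.0158, 1.7524, 0.5762),
    (2300, 0.0199, 1.7305, 0.5840), (2400, 0.0172, 1.7391, 0.5820),
    (2500, 0.0162, 1.8456, 0.5898), (2600, 0.0128, 1.7974, 0.5879),
    (2700, 0.0168, 1.8012, 0.5879), (2800, 0.0171, 1.8090, 0.5859),
]

# Logged step with the lowest validation loss.
best_by_loss = min(rows, key=lambda r: r[2])
# Logged step with the highest preference accuracy.
best_by_accuracy = max(rows, key=lambda r: r[3])
# Gap between validation and training loss at the final logged step.
final_step, final_train, final_val, _ = rows[-1]
final_gap = final_val - final_train

print("lowest validation loss:", best_by_loss[0], best_by_loss[2])
print("highest rewards accuracy:", best_by_accuracy[0], best_by_accuracy[3])
print("final validation/training gap:", round(final_gap, 4))
```

By these two criteria the best logged evaluations occur within the first epoch (steps 100 and 500), well before the final step reported in the summary.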

Framework versions

  • Transformers 4.45.1
  • PyTorch 2.3.0
  • Datasets 3.0.1
  • Tokenizers 0.20.0