---
library_name: transformers
tags:
  - trl
  - dpo
  - alignment-handbook
  - generated_from_trainer
model-index:
  - name: OpenELM-1_1B-DPO-full-max-6-reward
    results: []
---

# OpenELM-1_1B-DPO-full-max-6-reward

This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 1.8412
- Rewards/chosen: -15.5625
- Rewards/rejected: -17.375
- Rewards/accuracies: 0.6035
- Rewards/margins: 1.7891
- Logps/rejected: -2024.0
- Logps/chosen: -1872.0
- Logits/rejected: 1.625
- Logits/chosen: -0.2451
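In DPO, the reported reward columns are implicit rewards: the policy-vs-reference log-probability ratio scaled by beta, and the margin is simply chosen minus rejected. A minimal sketch of how these quantities relate (the log-probabilities and beta below are illustrative values, not taken from this run):

```python
import math

def dpo_rewards_and_loss(logp_chosen, logp_rejected,
                         ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Implicit DPO rewards and the sigmoid DPO loss (Rafailov et al., 2023)."""
    # Implicit reward: beta * (policy log-prob - reference log-prob)
    reward_chosen = beta * (logp_chosen - ref_logp_chosen)
    reward_rejected = beta * (logp_rejected - ref_logp_rejected)
    margin = reward_chosen - reward_rejected
    # -log sigmoid(margin): small when the policy clearly prefers the chosen response
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))
    return reward_chosen, reward_rejected, margin, loss
```

Under this definition, Rewards/margins is Rewards/chosen minus Rewards/rejected; for the final evaluation above, -15.5625 - (-17.375) = 1.8125, consistent with the reported 1.7891 up to low-precision rounding of the per-batch averages.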

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 16
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
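The effective batch size and the cosine schedule with warmup follow directly from the hyperparameters above. A small self-contained sketch (the total step count is illustrative; this mirrors, but does not call, the behavior of `transformers`' `get_cosine_schedule_with_warmup`):

```python
import math

# Effective batch size: per-device batch * grad-accumulation steps * num GPUs
train_batch_size = 8
gradient_accumulation_steps = 2
num_devices = 4
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
assert total_train_batch_size == 64  # matches the value reported above

def cosine_lr_with_warmup(step, total_steps, base_lr=5e-5, warmup_ratio=0.1):
    """Linear warmup over the first warmup_ratio of steps, then cosine decay to 0."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

The learning rate peaks at 5e-5 when warmup ends (10% of training) and decays to zero by the final step.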

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.5954 | 0.1047 | 100 | 0.6902 | -1.1172 | -1.2578 | 0.5742 | 0.1416 | -414.0 | -430.0 | -10.75 | -11.0625 |
| 0.5396 | 0.2094 | 200 | 0.6928 | -2.5156 | -2.8594 | 0.6387 | 0.3438 | -576.0 | -572.0 | -9.6875 | -10.25 |
| 0.5587 | 0.3141 | 300 | 0.7014 | -3.7188 | -4.1562 | 0.6094 | 0.4355 | -704.0 | -692.0 | -8.5625 | -9.4375 |
| 0.5067 | 0.4188 | 400 | 0.7869 | -3.875 | -4.375 | 0.5996 | 0.4961 | -724.0 | -708.0 | -14.0 | -14.9375 |
| 0.525 | 0.5236 | 500 | 0.8500 | -4.625 | -5.3438 | 0.6094 | 0.7109 | -824.0 | -780.0 | -6.8438 | -8.25 |
| 0.5173 | 0.6283 | 600 | 0.7292 | -6.25 | -6.7188 | 0.5723 | 0.4688 | -960.0 | -944.0 | -8.875 | -10.0 |
| 0.4944 | 0.7330 | 700 | 0.7881 | -5.1562 | -5.7812 | 0.6035 | 0.6445 | -868.0 | -832.0 | -6.4062 | -8.0 |
| 0.5113 | 0.8377 | 800 | 0.7106 | -4.7812 | -5.3438 | 0.6113 | 0.5586 | -824.0 | -796.0 | -9.4375 | -10.8125 |
| 0.4589 | 0.9424 | 900 | 0.8807 | -7.4375 | -8.1875 | 0.6094 | 0.7656 | -1112.0 | -1064.0 | -6.1875 | -8.125 |
| 0.1368 | 1.0471 | 1000 | 1.1006 | -8.375 | -9.4375 | 0.5879 | 1.0547 | -1232.0 | -1160.0 | -3.4531 | -5.375 |
| 0.138 | 1.1518 | 1100 | 1.0286 | -8.375 | -9.375 | 0.5977 | 0.9531 | -1224.0 | -1160.0 | -4.0938 | -5.9688 |
| 0.1376 | 1.2565 | 1200 | 1.0962 | -8.6875 | -9.75 | 0.6035 | 1.0312 | -1264.0 | -1192.0 | -1.2266 | -3.0781 |
| 0.1434 | 1.3613 | 1300 | 1.1220 | -9.375 | -10.5 | 0.5801 | 1.1172 | -1336.0 | -1256.0 | -3.7031 | -5.6875 |
| 0.1386 | 1.4660 | 1400 | 1.0638 | -9.4375 | -10.375 | 0.6230 | 0.9570 | -1328.0 | -1256.0 | -3.5 | -5.4688 |
| 0.1258 | 1.5707 | 1500 | 1.1923 | -10.5 | -11.75 | 0.6016 | 1.1953 | -1464.0 | -1368.0 | -2.4062 | -4.5625 |
| 0.1269 | 1.6754 | 1600 | 1.2009 | -9.4375 | -10.625 | 0.6074 | 1.1562 | -1352.0 | -1264.0 | -2.8438 | -5.2188 |
| 0.0967 | 1.7801 | 1700 | 1.1723 | -10.0 | -11.125 | 0.5996 | 1.0859 | -1400.0 | -1320.0 | -1.6328 | -3.6406 |
| 0.112 | 1.8848 | 1800 | 1.0807 | -9.75 | -10.75 | 0.5898 | 0.9805 | -1360.0 | -1296.0 | -2.5 | -4.5 |
| 0.1158 | 1.9895 | 1900 | 1.1470 | -10.875 | -12.0625 | 0.5938 | 1.2109 | -1496.0 | -1400.0 | -1.5391 | -3.5625 |
| 0.0172 | 2.0942 | 2000 | 1.6192 | -14.1875 | -15.6875 | 0.6055 | 1.5078 | -1864.0 | -1736.0 | 0.8438 | -1.1172 |
| 0.012 | 2.1990 | 2100 | 1.7070 | -14.6875 | -16.375 | 0.6016 | 1.6953 | -1928.0 | -1792.0 | 0.5117 | -1.4688 |
| 0.0145 | 2.3037 | 2200 | 1.6657 | -14.0625 | -15.625 | 0.5957 | 1.5547 | -1856.0 | -1728.0 | 0.6875 | -1.2891 |
| 0.0161 | 2.4084 | 2300 | 1.8217 | -15.5625 | -17.25 | 0.6035 | 1.7344 | -2016.0 | -1872.0 | 1.0 | -0.9141 |
| 0.0161 | 2.5131 | 2400 | 1.7852 | -15.0 | -16.625 | 0.6055 | 1.6641 | -1960.0 | -1824.0 | 1.6328 | -0.2471 |
| 0.0182 | 2.6178 | 2500 | 1.9600 | -16.25 | -18.125 | 0.5957 | 1.8125 | -2096.0 | -1952.0 | 1.7578 | -0.1089 |
| 0.0121 | 2.7225 | 2600 | 1.8076 | -15.125 | -16.875 | 0.6113 | 1.7656 | -1976.0 | -1832.0 | 1.4922 | -0.4238 |
| 0.016 | 2.8272 | 2700 | 1.8344 | -15.5 | -17.25 | 0.6055 | 1.7891 | -2016.0 | -1872.0 | 1.6016 | -0.2773 |
| 0.0144 | 2.9319 | 2800 | 1.8412 | -15.5625 | -17.375 | 0.6035 | 1.7891 | -2024.0 | -1872.0 | 1.625 | -0.2451 |

### Framework versions

- Transformers 4.45.1
- Pytorch 2.3.0
- Datasets 3.0.1
- Tokenizers 0.20.0