
OpenELM-1_1B-DPO-full-max-reward-least-similar

This model was trained with DPO on an unspecified dataset; the base model is not documented here, though the name suggests it is based on OpenELM-1_1B. It achieves the following results on the evaluation set (metric definitions are sketched after the list):

  • Loss: 1.2775
  • Rewards/chosen: -5.2812
  • Rewards/rejected: -5.5938
  • Rewards/accuracies: 0.5039
  • Rewards/margins: 0.3301
  • Logps/rejected: -848.0
  • Logps/chosen: -844.0
  • Logits/rejected: -13.25
  • Logits/chosen: -14.0
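
The reward metrics above follow the naming used by TRL's DPOTrainer. For reference, here is a minimal sketch, not the original training code, of how the standard DPO loss and these reward statistics are computed from policy and reference log-probabilities; the beta value is illustrative, since the card does not state it.

```python
import torch
import torch.nn.functional as F

def dpo_stats(policy_chosen_logps, policy_rejected_logps,
              ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Sketch of the standard DPO loss and the reward statistics reported above.

    All inputs are per-example sequence log-probabilities of shape [batch].
    beta=0.1 is an assumed value; the card does not state it.
    """
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)      # Rewards/chosen
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)  # Rewards/rejected
    margins = chosen_rewards - rejected_rewards                            # Rewards/margins
    loss = -F.logsigmoid(margins).mean()                                   # Loss
    return {
        "loss": loss,
        "rewards/chosen": chosen_rewards.mean(),
        "rewards/rejected": rejected_rewards.mean(),
        "rewards/margins": margins.mean(),
        "rewards/accuracies": (chosen_rewards > rejected_rewards).float().mean(),
    }
```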

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a reconstruction as a TrainingArguments object follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 64
  • total_eval_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
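
For reference, the list above maps onto a Hugging Face TrainingArguments configuration roughly as follows. This is a reconstruction, not the original training script: output_dir and bf16 are assumptions, and the multi-GPU setup (4 devices, total batch size 64) is configured by the launcher (e.g. accelerate or torchrun) rather than by these arguments.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="openelm-1_1b-dpo",     # placeholder, not from the card
    learning_rate=5e-05,
    per_device_train_batch_size=8,     # 8 per device * 4 GPUs * 2 accumulation = 64 total
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=2,
    seed=42,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=3,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    bf16=True,                          # assumed, consistent with a BF16 checkpoint
)
```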

Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---------------|--------|------|-----------------|----------------|------------------|--------------------|-----------------|----------------|--------------|-----------------|---------------|
| 0.0747        | 0.1047 | 100  | 0.7319          | -1.2344        | -1.3906          | 0.5195             | 0.1494          | -428.0         | -442.0       | -9.875          | -10.1875      |
| 0.0623        | 0.2094 | 200  | 0.7326          | -1.1641        | -1.3125          | 0.5137             | 0.1494          | -420.0         | -434.0       | -13.3125        | -13.625       |
| 0.0804        | 0.3141 | 300  | 1.1385          | -4.375         | -4.5938          | 0.4863             | 0.2275          | -748.0         | -756.0       | -10.9375        | -11.3125      |
| 0.1502        | 0.4188 | 400  | 0.9801          | -3.2031        | -3.3125          | 0.4844             | 0.0991          | -620.0         | -640.0       | -11.375         | -11.9375      |
| 0.0464        | 0.5236 | 500  | 0.9622          | -2.6875        | -2.7969          | 0.4805             | 0.1074          | -568.0         | -588.0       | -13.125         | -13.5625      |
| 0.0636        | 0.6283 | 600  | 1.0378          | -2.4062        | -2.4375          | 0.4727             | 0.0264          | -532.0         | -560.0       | -13.6875        | -14.0         |
| 0.0638        | 0.7330 | 700  | 0.8978          | -2.1562        | -2.1562          | 0.5039             | 0.0037          | -504.0         | -532.0       | -13.6875        | -13.875       |
| 0.0552        | 0.8377 | 800  | 0.9712          | -3.4375        | -3.4688          | 0.4980             | 0.0332          | -636.0         | -664.0       | -13.0625        | -13.5625      |
| 0.0459        | 0.9424 | 900  | 1.0447          | -4.7188        | -4.9688          | 0.5117             | 0.2490          | -784.0         | -788.0       | -12.625         | -13.0         |
| 0.0041        | 1.0471 | 1000 | 1.3027          | -4.9688        | -5.1875          | 0.4785             | 0.2383          | -808.0         | -816.0       | -14.8125        | -14.9375      |
| 0.0032        | 1.1518 | 1100 | 1.1521          | -4.5           | -4.625           | 0.5098             | 0.1455          | -752.0         | -768.0       | -15.0           | -15.3125      |
| 0.0068        | 1.2565 | 1200 | 0.9612          | -4.5           | -4.75            | 0.5312             | 0.2617          | -764.0         | -768.0       | -8.5            | -9.625        |
| 0.0038        | 1.3613 | 1300 | 1.0891          | -3.6094        | -3.8438          | 0.5332             | 0.2471          | -672.0         | -680.0       | -16.75          | -16.75        |
| 0.0036        | 1.4660 | 1400 | 1.0725          | -3.6875        | -3.875           | 0.5254             | 0.1885          | -676.0         | -688.0       | -15.6875        | -15.75        |
| 0.0067        | 1.5707 | 1500 | 1.0607          | -3.9531        | -4.1562          | 0.5117             | 0.2158          | -704.0         | -712.0       | -14.625         | -14.875       |
| 0.0068        | 1.6754 | 1600 | 1.1896          | -4.5938        | -4.9062          | 0.5137             | 0.3164          | -780.0         | -776.0       | -15.1875        | -15.5         |
| 0.0042        | 1.7801 | 1700 | 1.1288          | -4.4062        | -4.6562          | 0.5273             | 0.2676          | -756.0         | -760.0       | -15.375         | -15.875       |
| 0.0003        | 1.8848 | 1800 | 1.3009          | -5.3125        | -5.625           | 0.5059             | 0.3203          | -852.0         | -848.0       | -14.6875        | -15.1875      |
| 0.002         | 1.9895 | 1900 | 1.2142          | -4.8438        | -5.125           | 0.5156             | 0.2871          | -800.0         | -804.0       | -13.625         | -14.3125      |
| 0.0004        | 2.0942 | 2000 | 1.2300          | -4.8438        | -5.1562          | 0.5137             | 0.2969          | -804.0         | -804.0       | -13.4375        | -14.125       |
| 0.0148        | 2.1990 | 2100 | 1.2569          | -5.0625        | -5.375           | 0.5137             | 0.3223          | -828.0         | -824.0       | -13.25          | -13.9375      |
| 0.0009        | 2.3037 | 2200 | 1.2545          | -5.3125        | -5.625           | 0.5059             | 0.3184          | -852.0         | -848.0       | -12.8125        | -13.5625      |
| 0.0006        | 2.4084 | 2300 | 1.2550          | -5.25          | -5.5312          | 0.5098             | 0.3008          | -840.0         | -840.0       | -12.9375        | -13.6875      |
| 0.0002        | 2.5131 | 2400 | 1.2758          | -5.2812        | -5.625           | 0.5098             | 0.3223          | -848.0         | -848.0       | -13.1875        | -13.875       |
| 0.0004        | 2.6178 | 2500 | 1.2774          | -5.2812        | -5.5938          | 0.5039             | 0.3242          | -848.0         | -844.0       | -13.1875        | -13.875       |
| 0.0002        | 2.7225 | 2600 | 1.2790          | -5.2812        | -5.5938          | 0.5039             | 0.3281          | -848.0         | -844.0       | -13.25          | -13.9375      |
| 0.0003        | 2.8272 | 2700 | 1.2763          | -5.2812        | -5.5938          | 0.5020             | 0.3320          | -848.0         | -844.0       | -13.25          | -13.9375      |
| 0.0002        | 2.9319 | 2800 | 1.2775          | -5.2812        | -5.5938          | 0.5039             | 0.3301          | -848.0         | -844.0       | -13.25          | -14.0         |

Framework versions

  • Transformers 4.45.1
  • Pytorch 2.3.0
  • Datasets 3.0.1
  • Tokenizers 0.20.0
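
A hedged usage sketch under these framework versions. The repository namespace is not shown on this card, so the model id below is a placeholder; OpenELM-based checkpoints typically ship custom modeling code, hence trust_remote_code=True, and the bfloat16 dtype is an assumption.

```python
import torch
from transformers import AutoModelForCausalLM

# Placeholder repo id: replace <namespace> with the actual user or organization.
model = AutoModelForCausalLM.from_pretrained(
    "<namespace>/OpenELM-1_1B-DPO-full-max-reward-least-similar",
    torch_dtype=torch.bfloat16,   # assumed dtype; not stated in this card
    trust_remote_code=True,       # OpenELM variants generally require custom modeling code
)
```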