
zephyr-7b-dpo-qlora

This model (geonmin-kim/zephyr-7b-dpo-qlora) is a QLoRA adapter for mistralai/Mistral-7B-v0.1, fine-tuned with Direct Preference Optimization (DPO); the training dataset is not specified in this card. It achieves the following results on the evaluation set (a short note on these DPO metrics follows the list):

  • Loss: 0.4873
  • Rewards/chosen: -2.9667
  • Rewards/rejected: -4.1000
  • Rewards/accuracies: 0.7445
  • Rewards/margins: 1.1333
  • Logps/rejected: -654.6072
  • Logps/chosen: -561.3217
  • Logits/rejected: -0.9450
  • Logits/chosen: -1.0724
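
These are the standard metrics logged by a DPO training run (likely TRL's DPOTrainer, though the card does not say): the rewards are scaled log-probability ratios between the fine-tuned policy and the frozen reference model, Rewards/margins is the chosen reward minus the rejected reward, and Rewards/accuracies is the fraction of preference pairs where the chosen response gets the higher reward. A quick arithmetic check against the numbers above:

```python
# Sanity check on the reported evaluation metrics (values copied from this card).
rewards_chosen = -2.9667
rewards_rejected = -4.1000

# In DPO the margin is simply the chosen reward minus the rejected reward.
margin = rewards_chosen - rewards_rejected
print(round(margin, 4))  # 1.1333 -> matches the reported Rewards/margins
```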

Model description

More information needed

Intended uses & limitations

More information needed
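
The card does not document intended usage. As a rough starting point, the sketch below shows one way to load this adapter on top of the base model with transformers and peft; it is untested, and the prompt is purely illustrative.

```python
# Hypothetical usage sketch: load the QLoRA adapter on top of the base model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "mistralai/Mistral-7B-v0.1"            # base model named in this card
adapter_id = "geonmin-kim/zephyr-7b-dpo-qlora"   # this adapter repository

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)

prompt = "Explain what DPO fine-tuning does in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```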

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
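
For readers who want to reproduce the setup, here is a minimal sketch of the same settings expressed as transformers.TrainingArguments. The original training script is not part of this card, so the output directory and precision flag are assumptions, and a DPO trainer (e.g. TRL's DPOTrainer) would still need to be wired around these arguments.

```python
# Hedged sketch: the hyperparameters listed above as transformers.TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="zephyr-7b-dpo-qlora",   # assumption: output path is not stated in the card
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=4,      # 4 x 4 gives the total train batch size of 16 listed above
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1,
    optim="adamw_torch",                # Adam(W) with betas=(0.9, 0.999), epsilon=1e-08
    bf16=True,                          # assumption: precision is not stated in the card
)
```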

Training results

Training Loss Epoch Step Validation Loss Rewards/chosen Rewards/rejected Rewards/accuracies Rewards/margins Logps/rejected Logps/chosen Logits/rejected Logits/chosen
0.6819 0.03 100 0.6822 0.0500 0.0271 0.6545 0.0230 -241.9029 -259.6472 -1.9565 -2.0959
0.6548 0.05 200 0.6500 -0.1489 -0.2515 0.6780 0.1027 -269.7628 -279.5373 -1.9329 -2.0695
0.6084 0.08 300 0.6213 -0.2956 -0.4998 0.6810 0.2042 -294.5921 -294.2169 -1.8771 -2.0114
0.6237 0.1 400 0.6039 -0.4538 -0.7401 0.6935 0.2863 -318.6170 -310.0349 -1.8367 -1.9656
0.5534 0.13 500 0.5692 -0.9154 -1.3927 0.7050 0.4773 -383.8828 -356.1946 -1.5403 -1.6712
0.5613 0.16 600 0.5659 -0.8123 -1.3218 0.7025 0.5095 -376.7896 -345.8830 -1.3701 -1.5049
0.5139 0.18 700 0.5572 -2.6368 -3.4670 0.7145 0.8302 -591.3087 -528.3278 -0.8924 -1.0174
0.5184 0.21 800 0.5374 -1.4908 -2.1870 0.7160 0.6962 -463.3091 -413.7339 -1.1141 -1.2460
0.5211 0.24 900 0.5332 -2.5430 -3.3947 0.7180 0.8518 -584.0806 -518.9495 -0.8116 -0.9341
0.5553 0.26 1000 0.5178 -2.1745 -3.0424 0.7315 0.8679 -548.8491 -482.0993 -0.8557 -0.9813
0.5994 0.29 1100 0.5207 -2.5002 -3.3276 0.7300 0.8275 -577.3698 -514.6677 -0.7615 -0.8896
0.5976 0.31 1200 0.5098 -2.1833 -2.9905 0.7365 0.8072 -543.6604 -482.9834 -0.8350 -0.9596
0.5237 0.34 1300 0.5166 -3.0973 -4.1628 0.7350 1.0654 -660.8850 -574.3862 -0.7072 -0.8259
0.516 0.37 1400 0.5108 -2.1009 -3.0663 0.7350 0.9654 -551.2367 -474.7425 -0.7865 -0.9128
0.4593 0.39 1500 0.5174 -2.3167 -3.4254 0.7305 1.1088 -587.1506 -496.3185 -0.8903 -1.0211
0.5545 0.42 1600 0.5032 -2.9938 -4.0820 0.7370 1.0882 -652.8123 -564.0355 -0.8801 -1.0082
0.5425 0.44 1700 0.4996 -3.3496 -4.4061 0.7405 1.0565 -685.2187 -599.6096 -0.8382 -0.9686
0.4825 0.47 1800 0.5037 -3.0446 -4.1288 0.7380 1.0842 -657.4884 -569.1091 -0.8738 -1.0006
0.4455 0.5 1900 0.4962 -3.0223 -4.1482 0.7420 1.1259 -659.4305 -566.8840 -0.8910 -1.0214
0.4817 0.52 2000 0.4974 -3.5987 -4.6648 0.7470 1.0660 -711.0853 -624.5250 -0.8139 -0.9428
0.5079 0.55 2100 0.4923 -3.1751 -4.2293 0.7520 1.0542 -667.5426 -582.1657 -0.8739 -1.0031
0.477 0.58 2200 0.4897 -2.6127 -3.5713 0.7410 0.9587 -601.7402 -525.9182 -0.9567 -1.0880
0.4829 0.6 2300 0.4887 -2.9530 -4.0954 0.7485 1.1424 -654.1511 -559.9558 -0.9032 -1.0313
0.4752 0.63 2400 0.4909 -3.1480 -4.2815 0.7445 1.1335 -672.7583 -579.4506 -0.8495 -0.9765
0.5249 0.65 2500 0.4891 -3.0936 -4.2029 0.7445 1.1093 -664.8962 -574.0093 -0.9136 -1.0435
0.4596 0.68 2600 0.4939 -2.9492 -4.0985 0.7400 1.1493 -654.4570 -559.5698 -0.9264 -1.0549
0.5152 0.71 2700 0.4922 -3.0197 -4.1572 0.7440 1.1375 -660.3236 -566.6193 -0.9249 -1.0527
0.4518 0.73 2800 0.4908 -3.0666 -4.2342 0.7415 1.1676 -668.0294 -571.3138 -0.9260 -1.0535
0.5018 0.76 2900 0.4877 -3.0977 -4.2382 0.7465 1.1405 -668.4285 -574.4260 -0.9320 -1.0595
0.4592 0.79 3000 0.4873 -2.9934 -4.1134 0.7460 1.1200 -655.9471 -563.9877 -0.9510 -1.0788
0.4905 0.81 3100 0.4878 -2.9825 -4.1198 0.7430 1.1373 -656.5853 -562.9043 -0.9465 -1.0741
0.485 0.84 3200 0.4874 -2.9459 -4.0754 0.7455 1.1296 -652.1517 -559.2400 -0.9531 -1.0807
0.5157 0.86 3300 0.4874 -2.9550 -4.0838 0.7445 1.1289 -652.9912 -560.1489 -0.9481 -1.0755
0.4474 0.89 3400 0.4871 -2.9699 -4.1019 0.7435 1.1321 -654.8017 -561.6381 -0.9499 -1.0773
0.5379 0.92 3500 0.4874 -2.9663 -4.0989 0.7430 1.1326 -654.5006 -561.2808 -0.9468 -1.0742
0.464 0.94 3600 0.4874 -2.9638 -4.0967 0.7425 1.1329 -654.2791 -561.0286 -0.9475 -1.0748
0.4729 0.97 3700 0.4873 -2.9666 -4.0999 0.7445 1.1333 -654.6014 -561.3129 -0.9495 -1.0770
0.5017 0.99 3800 0.4873 -2.9667 -4.1000 0.7445 1.1333 -654.6072 -561.3217 -0.9450 -1.0724

Framework versions

  • PEFT 0.7.1
  • Transformers 4.39.0.dev0
  • Pytorch 2.2.2+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.2