
zephyr-7b-dpo-qlora

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-qlora on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4877
  • Rewards/chosen: -2.5728
  • Rewards/rejected: -3.6607
  • Rewards/accuracies: 0.7510
  • Rewards/margins: 1.0879
  • Logps/rejected: -614.8131
  • Logps/chosen: -522.7775
  • Logits/rejected: -1.0677
  • Logits/chosen: -1.1961
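These metrics follow TRL's DPO logging convention: Rewards/margins is the mean difference between the chosen and rejected rewards, and Rewards/accuracies is the fraction of pairs in which the chosen reward exceeds the rejected one. A quick arithmetic check against the values above (a sketch, not part of the original card):

```python
# Hedged sanity check: in TRL's DPO metrics, rewards/margins is the mean of
# (rewards/chosen - rewards/rejected). Using the final evaluation values above:
chosen, rejected = -2.5728, -3.6607
margin = chosen - rejected
print(f"margin = {margin:.4f}")  # 1.0879, matching Rewards/margins
```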

Model description

More information needed

Intended uses & limitations

More information needed
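Pending more detail, note that this repository contains a LoRA adapter rather than full model weights, so it is loaded on top of the base model via PEFT. The snippet below is a minimal usage sketch, assuming the adapter id junweiliao/zephyr-7b-dpo-qlora, a tokenizer shipped with the repo (otherwise load it from the base model), and illustrative 4-bit loading and generation settings:

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer, BitsAndBytesConfig

# Minimal sketch, not an official usage recipe. Quantization and generation
# settings are illustrative; adjust to your hardware.
model_id = "junweiliao/zephyr-7b-dpo-qlora"
model = AutoPeftModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),  # QLoRA-style 4-bit load
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)  # assumes the repo includes a tokenizer

messages = [{"role": "user", "content": "Explain DPO in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```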

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 1
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 16
  • total_eval_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
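For reference, these values map onto transformers.TrainingArguments as consumed by TRL's DPOTrainer in the alignment-handbook recipes. The sketch below is an approximation: output_dir and the bf16 flag are assumptions, while the 100-step evaluation cadence matches the results table below.

```python
from transformers import TrainingArguments

# Approximate reconstruction of the listed hyperparameters; not the original
# training script.
training_args = TrainingArguments(
    output_dir="zephyr-7b-dpo-qlora",   # assumed output path
    learning_rate=5e-6,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=8,      # 1 sample x 8 steps x 2 GPUs = total batch 16
    seed=42,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",                # Adam(W) with betas=(0.9, 0.999), eps=1e-08
    evaluation_strategy="steps",
    eval_steps=100,                     # evaluation every 100 steps, as in the table below
    bf16=True,                          # assumed precision, not stated in the card
)
```

In the alignment-handbook setup, an object like this is passed to trl's DPOTrainer together with the SFT adapter and the UltraFeedback preference pairs; the exact script is not reproduced here.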

Training results

Training Loss Epoch Step Validation Loss Rewards/chosen Rewards/rejected Rewards/accuracies Rewards/margins Logps/rejected Logps/chosen Logits/rejected Logits/chosen
0.6864 0.0262 100 0.6863 0.0245 0.0109 0.6560 0.0137 -247.6524 -263.0416 -2.2125 -2.3103
0.6536 0.0523 200 0.6562 -0.0059 -0.0864 0.6680 0.0805 -257.3772 -266.0850 -2.1777 -2.2752
0.6047 0.0785 300 0.6286 -0.1438 -0.3204 0.6660 0.1767 -280.7805 -279.8720 -2.1601 -2.2542
0.6299 0.1047 400 0.6084 -0.3193 -0.5734 0.6850 0.2541 -306.0758 -297.4266 -2.0907 -2.1881
0.5709 0.1309 500 0.5789 -0.7471 -1.1867 0.7000 0.4396 -367.4122 -340.2105 -2.0692 -2.1605
0.5488 0.1570 600 0.5658 -0.7001 -1.1923 0.7100 0.4921 -367.9675 -335.5099 -2.0030 -2.0983
0.5568 0.1832 700 0.5678 -1.3595 -2.0541 0.7080 0.6947 -454.1522 -401.4426 -1.8573 -1.9509
0.5047 0.2094 800 0.5371 -1.2892 -1.9528 0.7240 0.6636 -444.0185 -394.4196 -1.9046 -1.9916
0.5053 0.2355 900 0.5388 -1.5032 -2.2420 0.7260 0.7388 -472.9430 -415.8180 -1.8678 -1.9410
0.5438 0.2617 1000 0.5343 -1.5270 -2.2670 0.7400 0.7400 -475.4426 -418.1995 -1.8710 -1.9472
0.595 0.2879 1100 0.5290 -1.4070 -2.1205 0.7370 0.7135 -460.7867 -406.1953 -1.6012 -1.6936
0.5628 0.3141 1200 0.5159 -1.2461 -1.9645 0.7430 0.7183 -445.1867 -390.1104 -1.4961 -1.5992
0.5334 0.3402 1300 0.5106 -1.5548 -2.3857 0.7410 0.8309 -487.3135 -420.9798 -1.4528 -1.5555
0.5324 0.3664 1400 0.5133 -1.4606 -2.3185 0.7300 0.8579 -480.5880 -411.5592 -1.6116 -1.6971
0.4708 0.3926 1500 0.5117 -1.5267 -2.4780 0.7460 0.9513 -496.5367 -418.1663 -1.6359 -1.7246
0.567 0.4188 1600 0.5051 -1.5586 -2.4438 0.7360 0.8851 -493.1144 -421.3598 -1.5723 -1.6655
0.5167 0.4449 1700 0.5078 -1.8167 -2.7043 0.7350 0.8876 -519.1691 -447.1625 -1.5701 -1.6681
0.4877 0.4711 1800 0.5059 -1.6146 -2.5493 0.7450 0.9347 -503.6712 -426.9594 -1.5519 -1.6424
0.4667 0.4973 1900 0.5021 -1.8349 -2.8150 0.7400 0.9801 -530.2404 -448.9849 -1.3739 -1.4795
0.4689 0.5234 2000 0.4990 -2.4178 -3.3735 0.7420 0.9557 -586.0923 -507.2770 -1.1223 -1.2484
0.5027 0.5496 2100 0.4956 -2.3322 -3.3229 0.7400 0.9908 -581.0334 -498.7141 -1.1468 -1.2691
0.4786 0.5758 2200 0.4934 -2.2149 -3.1817 0.7520 0.9668 -566.9105 -486.9841 -1.1241 -1.2533
0.4833 0.6020 2300 0.4928 -2.4249 -3.4764 0.7520 1.0515 -596.3792 -507.9904 -1.0953 -1.2229
0.4706 0.6281 2400 0.4934 -2.3828 -3.4151 0.7450 1.0323 -590.2535 -503.7771 -1.0842 -1.2077
0.5112 0.6543 2500 0.4928 -2.3750 -3.4387 0.7440 1.0637 -592.6089 -502.9985 -1.1090 -1.2373
0.4721 0.6805 2600 0.4987 -2.3590 -3.4594 0.7520 1.1004 -594.6805 -501.3951 -1.1359 -1.2595
0.4788 0.7066 2700 0.4924 -2.6480 -3.7521 0.7480 1.1041 -623.9493 -530.2946 -1.0600 -1.1861
0.4664 0.7328 2800 0.4912 -2.7089 -3.8484 0.7460 1.1395 -633.5744 -536.3848 -1.0451 -1.1713
0.499 0.7590 2900 0.4879 -2.5879 -3.6683 0.7500 1.0804 -615.5711 -524.2902 -1.0599 -1.1874
0.4689 0.7852 3000 0.4874 -2.5919 -3.6653 0.7490 1.0734 -615.2720 -524.6861 -1.0534 -1.1823
0.498 0.8113 3100 0.4879 -2.6250 -3.7096 0.7510 1.0846 -619.6946 -527.9911 -1.0592 -1.1882
0.502 0.8375 3200 0.4876 -2.5741 -3.6583 0.7520 1.0842 -614.5652 -522.9026 -1.0700 -1.1979
0.5091 0.8637 3300 0.4877 -2.5605 -3.6430 0.7500 1.0825 -613.0379 -521.5475 -1.0677 -1.1962
0.4601 0.8898 3400 0.4878 -2.5736 -3.6608 0.7490 1.0871 -614.8157 -522.8585 -1.0632 -1.1921
0.5339 0.9160 3500 0.4877 -2.5733 -3.6612 0.7520 1.0880 -614.8598 -522.8210 -1.0661 -1.1946
0.4651 0.9422 3600 0.4877 -2.5730 -3.6606 0.7510 1.0876 -614.7937 -522.7916 -1.0655 -1.1942
0.4743 0.9684 3700 0.4877 -2.5733 -3.6613 0.7510 1.0881 -614.8724 -522.8242 -1.0678 -1.1962
0.5193 0.9945 3800 0.4875 -2.5729 -3.6609 0.7500 1.0880 -614.8296 -522.7888 -1.0677 -1.1961

Framework versions

  • PEFT 0.7.1
  • Transformers 4.40.1
  • Pytorch 2.1.2
  • Datasets 2.19.0
  • Tokenizers 0.19.1
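
To compare a local environment against these versions, a small helper (package names below are the usual PyPI distribution names; this is not part of the original card):

```python
from importlib.metadata import PackageNotFoundError, version

# Framework versions listed above; "Pytorch" is distributed on PyPI as "torch".
expected = {
    "peft": "0.7.1",
    "transformers": "4.40.1",
    "torch": "2.1.2",
    "datasets": "2.19.0",
    "tokenizers": "0.19.1",
}

for pkg, want in expected.items():
    try:
        have = version(pkg)
    except PackageNotFoundError:
        have = "not installed"
    note = "" if have == want else "  <- differs from the card"
    print(f"{pkg}: installed {have}, card lists {want}{note}")
```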