# zephyr-7b-dpo-qlora
This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-qlora](https://huggingface.co/alignment-handbook/zephyr-7b-sft-qlora) on the [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) dataset. It achieves the following results on the evaluation set:
- Loss: 0.4877
- Rewards/chosen: -2.5728
- Rewards/rejected: -3.6607
- Rewards/accuracies: 0.7510
- Rewards/margins: 1.0879
- Logps/rejected: -614.8131
- Logps/chosen: -522.7775
- Logits/rejected: -1.0677
- Logits/chosen: -1.1961
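For context, these metrics follow trl's `DPOTrainer` conventions: `Rewards/chosen` and `Rewards/rejected` are the implicit DPO rewards (the β-scaled log-probability ratio of the policy against the frozen SFT reference), `Rewards/margins` is their difference, and `Rewards/accuracies` is the fraction of pairs where the chosen reward exceeds the rejected one; `Logps/*` are the summed completion log-probabilities under the policy. As a sketch (β is the DPO temperature, which is not recorded in this card):

```latex
% DPO loss on a preference pair (x, y_w, y_r), with policy \pi_\theta
% and frozen reference \pi_{ref} (here, the SFT model):
\mathcal{L}_{\mathrm{DPO}}
  = -\log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
    - \beta \log \frac{\pi_\theta(y_r \mid x)}{\pi_{\mathrm{ref}}(y_r \mid x)}
    \right)

% Logged reward metrics:
r_w = \beta\big(\log \pi_\theta(y_w \mid x) - \log \pi_{\mathrm{ref}}(y_w \mid x)\big),
\qquad
r_r = \beta\big(\log \pi_\theta(y_r \mid x) - \log \pi_{\mathrm{ref}}(y_r \mid x)\big)

\text{rewards/margins} = r_w - r_r,
\qquad
\text{rewards/accuracies} = \Pr[\, r_w > r_r \,]
```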
## Model description

This is a QLoRA (LoRA over a 4-bit base) adapter trained with direct preference optimization (DPO), following the alignment-handbook Zephyr recipe: supervised fine-tuning of mistralai/Mistral-7B-v0.1 (the zephyr-7b-sft-qlora stage) followed by preference tuning on binarized UltraFeedback data.
## Intended uses & limitations

Like other Zephyr-style models, this adapter is intended for chat/assistant-style text generation using the chat template inherited from the SFT stage. It has not been separately evaluated for safety here; preference tuning on UltraFeedback reduces, but does not eliminate, incorrect or otherwise problematic outputs.
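A minimal inference sketch, assuming the tokenizer (with Zephyr's chat template) is saved alongside the adapter as the alignment-handbook scripts do; the prompt text is illustrative:

```python
# Minimal inference sketch: load the base model plus this DPO LoRA adapter.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "junweiliao/zephyr-7b-dpo-qlora"

# Resolves the base model from the adapter config and attaches the adapter.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id, torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

messages = [{"role": "user", "content": "Explain DPO in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    out = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```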
## Training and evaluation data

Training and evaluation both use HuggingFaceH4/ultrafeedback_binarized, a binarized version of UltraFeedback in which each prompt is paired with one chosen and one rejected completion (the handbook recipe uses its train_prefs and test_prefs splits).
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 1
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 8
- total_train_batch_size: 16
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
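For reference, these hyperparameters map onto a trl `DPOTrainer` run roughly as sketched below. This is a reconstruction, not the exact training script: the DPO β, LoRA rank/targets, sequence lengths, and precision flag are not recorded in this card and appear with typical alignment-handbook values as labeled placeholders, and the preference formatting is simplified relative to the handbook's chat-template preprocessing.

```python
# Rough reconstruction of the run from the hyperparameters listed above.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from trl import DPOTrainer

sft_id = "alignment-handbook/zephyr-7b-sft-qlora"
tokenizer = AutoTokenizer.from_pretrained(sft_id)

# 4-bit (QLoRA) load of the SFT checkpoint that DPO starts from.
model = AutoModelForCausalLM.from_pretrained(
    sft_id,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
)

raw = load_dataset("HuggingFaceH4/ultrafeedback_binarized")

def to_text(ex):
    # chosen/rejected are [user, assistant] message lists; keep the assistant
    # turn as the completion (simplified vs. the handbook's full chat-template
    # formatting).
    return {"prompt": ex["prompt"],
            "chosen": ex["chosen"][-1]["content"],
            "rejected": ex["rejected"][-1]["content"]}

train_ds = raw["train_prefs"].map(to_text, remove_columns=raw["train_prefs"].column_names)
eval_ds = raw["test_prefs"].map(to_text, remove_columns=raw["test_prefs"].column_names)

args = TrainingArguments(
    output_dir="zephyr-7b-dpo-qlora",
    learning_rate=5e-6,
    per_device_train_batch_size=1,   # x 2 GPUs x 8 accumulation = 16 effective
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,                       # assumption: precision is not listed above
)

trainer = DPOTrainer(
    model,
    ref_model=None,                  # with PEFT, adapters-off weights act as the reference
    args=args,
    beta=0.01,                       # assumption: typical handbook value, not in this card
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    tokenizer=tokenizer,
    peft_config=LoraConfig(          # placeholder LoRA config
        r=16, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM"),
    max_length=1024,                 # assumption
    max_prompt_length=512,           # assumption
)
trainer.train()
```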
### Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|
0.6864 | 0.0262 | 100 | 0.6863 | 0.0245 | 0.0109 | 0.6560 | 0.0137 | -247.6524 | -263.0416 | -2.2125 | -2.3103 |
0.6536 | 0.0523 | 200 | 0.6562 | -0.0059 | -0.0864 | 0.6680 | 0.0805 | -257.3772 | -266.0850 | -2.1777 | -2.2752 |
0.6047 | 0.0785 | 300 | 0.6286 | -0.1438 | -0.3204 | 0.6660 | 0.1767 | -280.7805 | -279.8720 | -2.1601 | -2.2542 |
0.6299 | 0.1047 | 400 | 0.6084 | -0.3193 | -0.5734 | 0.6850 | 0.2541 | -306.0758 | -297.4266 | -2.0907 | -2.1881 |
0.5709 | 0.1309 | 500 | 0.5789 | -0.7471 | -1.1867 | 0.7000 | 0.4396 | -367.4122 | -340.2105 | -2.0692 | -2.1605 |
0.5488 | 0.1570 | 600 | 0.5658 | -0.7001 | -1.1923 | 0.7100 | 0.4921 | -367.9675 | -335.5099 | -2.0030 | -2.0983 |
0.5568 | 0.1832 | 700 | 0.5678 | -1.3595 | -2.0541 | 0.7080 | 0.6947 | -454.1522 | -401.4426 | -1.8573 | -1.9509 |
0.5047 | 0.2094 | 800 | 0.5371 | -1.2892 | -1.9528 | 0.7240 | 0.6636 | -444.0185 | -394.4196 | -1.9046 | -1.9916 |
0.5053 | 0.2355 | 900 | 0.5388 | -1.5032 | -2.2420 | 0.7260 | 0.7388 | -472.9430 | -415.8180 | -1.8678 | -1.9410 |
0.5438 | 0.2617 | 1000 | 0.5343 | -1.5270 | -2.2670 | 0.7400 | 0.7400 | -475.4426 | -418.1995 | -1.8710 | -1.9472 |
0.595 | 0.2879 | 1100 | 0.5290 | -1.4070 | -2.1205 | 0.7370 | 0.7135 | -460.7867 | -406.1953 | -1.6012 | -1.6936 |
0.5628 | 0.3141 | 1200 | 0.5159 | -1.2461 | -1.9645 | 0.7430 | 0.7183 | -445.1867 | -390.1104 | -1.4961 | -1.5992 |
0.5334 | 0.3402 | 1300 | 0.5106 | -1.5548 | -2.3857 | 0.7410 | 0.8309 | -487.3135 | -420.9798 | -1.4528 | -1.5555 |
0.5324 | 0.3664 | 1400 | 0.5133 | -1.4606 | -2.3185 | 0.7300 | 0.8579 | -480.5880 | -411.5592 | -1.6116 | -1.6971 |
0.4708 | 0.3926 | 1500 | 0.5117 | -1.5267 | -2.4780 | 0.7460 | 0.9513 | -496.5367 | -418.1663 | -1.6359 | -1.7246 |
0.567 | 0.4188 | 1600 | 0.5051 | -1.5586 | -2.4438 | 0.7360 | 0.8851 | -493.1144 | -421.3598 | -1.5723 | -1.6655 |
0.5167 | 0.4449 | 1700 | 0.5078 | -1.8167 | -2.7043 | 0.7350 | 0.8876 | -519.1691 | -447.1625 | -1.5701 | -1.6681 |
0.4877 | 0.4711 | 1800 | 0.5059 | -1.6146 | -2.5493 | 0.7450 | 0.9347 | -503.6712 | -426.9594 | -1.5519 | -1.6424 |
0.4667 | 0.4973 | 1900 | 0.5021 | -1.8349 | -2.8150 | 0.7400 | 0.9801 | -530.2404 | -448.9849 | -1.3739 | -1.4795 |
0.4689 | 0.5234 | 2000 | 0.4990 | -2.4178 | -3.3735 | 0.7420 | 0.9557 | -586.0923 | -507.2770 | -1.1223 | -1.2484 |
0.5027 | 0.5496 | 2100 | 0.4956 | -2.3322 | -3.3229 | 0.7400 | 0.9908 | -581.0334 | -498.7141 | -1.1468 | -1.2691 |
0.4786 | 0.5758 | 2200 | 0.4934 | -2.2149 | -3.1817 | 0.7520 | 0.9668 | -566.9105 | -486.9841 | -1.1241 | -1.2533 |
0.4833 | 0.6020 | 2300 | 0.4928 | -2.4249 | -3.4764 | 0.7520 | 1.0515 | -596.3792 | -507.9904 | -1.0953 | -1.2229 |
0.4706 | 0.6281 | 2400 | 0.4934 | -2.3828 | -3.4151 | 0.7450 | 1.0323 | -590.2535 | -503.7771 | -1.0842 | -1.2077 |
0.5112 | 0.6543 | 2500 | 0.4928 | -2.3750 | -3.4387 | 0.7440 | 1.0637 | -592.6089 | -502.9985 | -1.1090 | -1.2373 |
0.4721 | 0.6805 | 2600 | 0.4987 | -2.3590 | -3.4594 | 0.7520 | 1.1004 | -594.6805 | -501.3951 | -1.1359 | -1.2595 |
0.4788 | 0.7066 | 2700 | 0.4924 | -2.6480 | -3.7521 | 0.7480 | 1.1041 | -623.9493 | -530.2946 | -1.0600 | -1.1861 |
0.4664 | 0.7328 | 2800 | 0.4912 | -2.7089 | -3.8484 | 0.7460 | 1.1395 | -633.5744 | -536.3848 | -1.0451 | -1.1713 |
0.499 | 0.7590 | 2900 | 0.4879 | -2.5879 | -3.6683 | 0.7500 | 1.0804 | -615.5711 | -524.2902 | -1.0599 | -1.1874 |
0.4689 | 0.7852 | 3000 | 0.4874 | -2.5919 | -3.6653 | 0.7490 | 1.0734 | -615.2720 | -524.6861 | -1.0534 | -1.1823 |
0.498 | 0.8113 | 3100 | 0.4879 | -2.6250 | -3.7096 | 0.7510 | 1.0846 | -619.6946 | -527.9911 | -1.0592 | -1.1882 |
0.502 | 0.8375 | 3200 | 0.4876 | -2.5741 | -3.6583 | 0.7520 | 1.0842 | -614.5652 | -522.9026 | -1.0700 | -1.1979 |
0.5091 | 0.8637 | 3300 | 0.4877 | -2.5605 | -3.6430 | 0.7500 | 1.0825 | -613.0379 | -521.5475 | -1.0677 | -1.1962 |
0.4601 | 0.8898 | 3400 | 0.4878 | -2.5736 | -3.6608 | 0.7490 | 1.0871 | -614.8157 | -522.8585 | -1.0632 | -1.1921 |
0.5339 | 0.9160 | 3500 | 0.4877 | -2.5733 | -3.6612 | 0.7520 | 1.0880 | -614.8598 | -522.8210 | -1.0661 | -1.1946 |
0.4651 | 0.9422 | 3600 | 0.4877 | -2.5730 | -3.6606 | 0.7510 | 1.0876 | -614.7937 | -522.7916 | -1.0655 | -1.1942 |
0.4743 | 0.9684 | 3700 | 0.4877 | -2.5733 | -3.6613 | 0.7510 | 1.0881 | -614.8724 | -522.8242 | -1.0678 | -1.1962 |
0.5193 | 0.9945 | 3800 | 0.4875 | -2.5729 | -3.6609 | 0.7500 | 1.0880 | -614.8296 | -522.7888 | -1.0677 | -1.1961 |
### Framework versions
- PEFT 0.7.1
- Transformers 4.40.1
- PyTorch 2.1.2
- Datasets 2.19.0
- Tokenizers 0.19.1
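Because this is a LoRA adapter, it can also be folded into the base weights for standalone deployment without peft at inference time; a minimal sketch:

```python
# Merge the DPO LoRA adapter into the base model and save a standalone copy.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

repo = "junweiliao/zephyr-7b-dpo-qlora"

model = AutoPeftModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16)
merged = model.merge_and_unload()    # folds the LoRA deltas into the base weights
merged.save_pretrained("zephyr-7b-dpo-merged")

AutoTokenizer.from_pretrained(repo).save_pretrained("zephyr-7b-dpo-merged")
```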