# zephyr-7b-dpo-qlora
This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-qlora](https://huggingface.co/alignment-handbook/zephyr-7b-sft-qlora) on the [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) dataset. It achieves the following results on the evaluation set:
- Loss: 0.4877
- Rewards/chosen: -2.5728
- Rewards/rejected: -3.6607
- Rewards/accuracies: 0.7510
- Rewards/margins: 1.0879
- Logps/rejected: -614.8131
- Logps/chosen: -522.7775
- Logits/rejected: -1.0677
- Logits/chosen: -1.1961
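For context, these metrics follow trl's `DPOTrainer` conventions: `Rewards/chosen` and `Rewards/rejected` are the implicit DPO rewards (the β-scaled log-probability ratio of the policy against the frozen SFT reference), `Rewards/margins` is their difference, and `Rewards/accuracies` is the fraction of pairs where the chosen reward exceeds the rejected one; `Logps/*` are the summed completion log-probabilities under the policy. As a sketch (β is the DPO temperature, which is not recorded in this card):

```latex
% DPO loss on a preference pair (x, y_w, y_r), with policy \pi_\theta
% and frozen reference \pi_{ref} (here, the SFT model):
\mathcal{L}_{\mathrm{DPO}}
  = -\log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
    - \beta \log \frac{\pi_\theta(y_r \mid x)}{\pi_{\mathrm{ref}}(y_r \mid x)}
    \right)

% Logged reward metrics:
r_w = \beta\big(\log \pi_\theta(y_w \mid x) - \log \pi_{\mathrm{ref}}(y_w \mid x)\big),
\qquad
r_r = \beta\big(\log \pi_\theta(y_r \mid x) - \log \pi_{\mathrm{ref}}(y_r \mid x)\big)

\text{rewards/margins} = r_w - r_r,
\qquad
\text{rewards/accuracies} = \Pr[\, r_w > r_r \,]
```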
## Model description

This is a QLoRA (LoRA over a 4-bit base) adapter trained with direct preference optimization (DPO), following the alignment-handbook Zephyr recipe: supervised fine-tuning of mistralai/Mistral-7B-v0.1 (the zephyr-7b-sft-qlora stage) followed by preference tuning on binarized UltraFeedback data.
## Intended uses & limitations

Like other Zephyr-style models, this adapter is intended for chat/assistant-style text generation using the chat template inherited from the SFT stage. It has not been separately evaluated for safety here; preference tuning on UltraFeedback reduces, but does not eliminate, incorrect or otherwise problematic outputs.
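A minimal inference sketch, assuming the tokenizer (with Zephyr's chat template) is saved alongside the adapter as the alignment-handbook scripts do; the prompt text is illustrative:

```python
# Minimal inference sketch: load the base model plus this DPO LoRA adapter.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "junweiliao/zephyr-7b-dpo-qlora"

# Resolves the base model from the adapter config and attaches the adapter.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id, torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

messages = [{"role": "user", "content": "Explain DPO in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    out = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```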
## Training and evaluation data

Training and evaluation both use HuggingFaceH4/ultrafeedback_binarized, a binarized version of UltraFeedback in which each prompt is paired with one chosen and one rejected completion (the handbook recipe uses its train_prefs and test_prefs splits).
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 1
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 8
- total_train_batch_size: 16
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
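For reference, these hyperparameters map onto a trl `DPOTrainer` run roughly as sketched below. This is a reconstruction, not the exact training script: the DPO β, LoRA rank/targets, sequence lengths, and precision flag are not recorded in this card and appear with typical alignment-handbook values as labeled placeholders, and the preference formatting is simplified relative to the handbook's chat-template preprocessing.

```python
# Rough reconstruction of the run from the hyperparameters listed above.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from trl import DPOTrainer

sft_id = "alignment-handbook/zephyr-7b-sft-qlora"
tokenizer = AutoTokenizer.from_pretrained(sft_id)

# 4-bit (QLoRA) load of the SFT checkpoint that DPO starts from.
model = AutoModelForCausalLM.from_pretrained(
    sft_id,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
)

raw = load_dataset("HuggingFaceH4/ultrafeedback_binarized")

def to_text(ex):
    # chosen/rejected are [user, assistant] message lists; keep the assistant
    # turn as the completion (simplified vs. the handbook's full chat-template
    # formatting).
    return {"prompt": ex["prompt"],
            "chosen": ex["chosen"][-1]["content"],
            "rejected": ex["rejected"][-1]["content"]}

train_ds = raw["train_prefs"].map(to_text, remove_columns=raw["train_prefs"].column_names)
eval_ds = raw["test_prefs"].map(to_text, remove_columns=raw["test_prefs"].column_names)

args = TrainingArguments(
    output_dir="zephyr-7b-dpo-qlora",
    learning_rate=5e-6,
    per_device_train_batch_size=1,   # x 2 GPUs x 8 accumulation = 16 effective
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,                       # assumption: precision is not listed above
)

trainer = DPOTrainer(
    model,
    ref_model=None,                  # with PEFT, adapters-off weights act as the reference
    args=args,
    beta=0.01,                       # assumption: typical handbook value, not in this card
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    tokenizer=tokenizer,
    peft_config=LoraConfig(          # placeholder LoRA config
        r=16, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM"),
    max_length=1024,                 # assumption
    max_prompt_length=512,           # assumption
)
trainer.train()
```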
### Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|
0.6864 | 0.0262 | 100 | 0.6863 | 0.0245 | 0.0109 | 0.6560 | 0.0137 | -247.6524 | -263.0416 | -2.2125 | -2.3103 |
0.6536 | 0.0523 | 200 | 0.6562 | -0.0059 | -0.0864 | 0.6680 | 0.0805 | -257.3772 | -266.0850 | -2.1777 | -2.2752 |
0.6047 | 0.0785 | 300 | 0.6286 | -0.1438 | -0.3204 | 0.6660 | 0.1767 | -280.7805 | -279.8720 | -2.1601 | -2.2542 |
0.6299 | 0.1047 | 400 | 0.6084 | -0.3193 | -0.5734 | 0.6850 | 0.2541 | -306.0758 | -297.4266 | -2.0907 | -2.1881 |
0.5709 | 0.1309 | 500 | 0.5789 | -0.7471 | -1.1867 | 0.7000 | 0.4396 | -367.4122 | -340.2105 | -2.0692 | -2.1605 |
0.5488 | 0.1570 | 600 | 0.5658 | -0.7001 | -1.1923 | 0.7100 | 0.4921 | -367.9675 | -335.5099 | -2.0030 | -2.0983 |
0.5568 | 0.1832 | 700 | 0.5678 | -1.3595 | -2.0541 | 0.7080 | 0.6947 | -454.1522 | -401.4426 | -1.8573 | -1.9509 |
0.5047 | 0.2094 | 800 | 0.5371 | -1.2892 | -1.9528 | 0.7240 | 0.6636 | -444.0185 | -394.4196 | -1.9046 | -1.9916 |
0.5053 | 0.2355 | 900 | 0.5388 | -1.5032 | -2.2420 | 0.7260 | 0.7388 | -472.9430 | -415.8180 | -1.8678 | -1.9410 |
0.5438 | 0.2617 | 1000 | 0.5343 | -1.5270 | -2.2670 | 0.7400 | 0.7400 | -475.4426 | -418.1995 | -1.8710 | -1.9472 |
0.595 | 0.2879 | 1100 | 0.5290 | -1.4070 | -2.1205 | 0.7370 | 0.7135 | -460.7867 | -406.1953 | -1.6012 | -1.6936 |
0.5628 | 0.3141 | 1200 | 0.5159 | -1.2461 | -1.9645 | 0.7430 | 0.7183 | -445.1867 | -390.1104 | -1.4961 | -1.5992 |
0.5334 | 0.3402 | 1300 | 0.5106 | -1.5548 | -2.3857 | 0.7410 | 0.8309 | -487.3135 | -420.9798 | -1.4528 | -1.5555 |
0.5324 | 0.3664 | 1400 | 0.5133 | -1.4606 | -2.3185 | 0.7300 | 0.8579 | -480.5880 | -411.5592 | -1.6116 | -1.6971 |
0.4708 | 0.3926 | 1500 | 0.5117 | -1.5267 | -2.4780 | 0.7460 | 0.9513 | -496.5367 | -418.1663 | -1.6359 | -1.7246 |
0.567 | 0.4188 | 1600 | 0.5051 | -1.5586 | -2.4438 | 0.7360 | 0.8851 | -493.1144 | -421.3598 | -1.5723 | -1.6655 |
0.5167 | 0.4449 | 1700 | 0.5078 | -1.8167 | -2.7043 | 0.7350 | 0.8876 | -519.1691 | -447.1625 | -1.5701 | -1.6681 |
0.4877 | 0.4711 | 1800 | 0.5059 | -1.6146 | -2.5493 | 0.7450 | 0.9347 | -503.6712 | -426.9594 | -1.5519 | -1.6424 |
0.4667 | 0.4973 | 1900 | 0.5021 | -1.8349 | -2.8150 | 0.7400 | 0.9801 | -530.2404 | -448.9849 | -1.3739 | -1.4795 |
0.4689 | 0.5234 | 2000 | 0.4990 | -2.4178 | -3.3735 | 0.7420 | 0.9557 | -586.0923 | -507.2770 | -1.1223 | -1.2484 |
0.5027 | 0.5496 | 2100 | 0.4956 | -2.3322 | -3.3229 | 0.7400 | 0.9908 | -581.0334 | -498.7141 | -1.1468 | -1.2691 |
0.4786 | 0.5758 | 2200 | 0.4934 | -2.2149 | -3.1817 | 0.7520 | 0.9668 | -566.9105 | -486.9841 | -1.1241 | -1.2533 |
0.4833 | 0.6020 | 2300 | 0.4928 | -2.4249 | -3.4764 | 0.7520 | 1.0515 | -596.3792 | -507.9904 | -1.0953 | -1.2229 |
0.4706 | 0.6281 | 2400 | 0.4934 | -2.3828 | -3.4151 | 0.7450 | 1.0323 | -590.2535 | -503.7771 | -1.0842 | -1.2077 |
0.5112 | 0.6543 | 2500 | 0.4928 | -2.3750 | -3.4387 | 0.7440 | 1.0637 | -592.6089 | -502.9985 | -1.1090 | -1.2373 |
0.4721 | 0.6805 | 2600 | 0.4987 | -2.3590 | -3.4594 | 0.7520 | 1.1004 | -594.6805 | -501.3951 | -1.1359 | -1.2595 |
0.4788 | 0.7066 | 2700 | 0.4924 | -2.6480 | -3.7521 | 0.7480 | 1.1041 | -623.9493 | -530.2946 | -1.0600 | -1.1861 |
0.4664 | 0.7328 | 2800 | 0.4912 | -2.7089 | -3.8484 | 0.7460 | 1.1395 | -633.5744 | -536.3848 | -1.0451 | -1.1713 |
0.499 | 0.7590 | 2900 | 0.4879 | -2.5879 | -3.6683 | 0.7500 | 1.0804 | -615.5711 | -524.2902 | -1.0599 | -1.1874 |
0.4689 | 0.7852 | 3000 | 0.4874 | -2.5919 | -3.6653 | 0.7490 | 1.0734 | -615.2720 | -524.6861 | -1.0534 | -1.1823 |
0.498 | 0.8113 | 3100 | 0.4879 | -2.6250 | -3.7096 | 0.7510 | 1.0846 | -619.6946 | -527.9911 | -1.0592 | -1.1882 |
0.502 | 0.8375 | 3200 | 0.4876 | -2.5741 | -3.6583 | 0.7520 | 1.0842 | -614.5652 | -522.9026 | -1.0700 | -1.1979 |
0.5091 | 0.8637 | 3300 | 0.4877 | -2.5605 | -3.6430 | 0.7500 | 1.0825 | -613.0379 | -521.5475 | -1.0677 | -1.1962 |
0.4601 | 0.8898 | 3400 | 0.4878 | -2.5736 | -3.6608 | 0.7490 | 1.0871 | -614.8157 | -522.8585 | -1.0632 | -1.1921 |
0.5339 | 0.9160 | 3500 | 0.4877 | -2.5733 | -3.6612 | 0.7520 | 1.0880 | -614.8598 | -522.8210 | -1.0661 | -1.1946 |
0.4651 | 0.9422 | 3600 | 0.4877 | -2.5730 | -3.6606 | 0.7510 | 1.0876 | -614.7937 | -522.7916 | -1.0655 | -1.1942 |
0.4743 | 0.9684 | 3700 | 0.4877 | -2.5733 | -3.6613 | 0.7510 | 1.0881 | -614.8724 | -522.8242 | -1.0678 | -1.1962 |
0.5193 | 0.9945 | 3800 | 0.4875 | -2.5729 | -3.6609 | 0.7500 | 1.0880 | -614.8296 | -522.7888 | -1.0677 | -1.1961 |
### Framework versions
- PEFT 0.7.1
- Transformers 4.40.1
- PyTorch 2.1.2
- Datasets 2.19.0
- Tokenizers 0.19.1
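Because this is a LoRA adapter, it can also be folded into the base weights for standalone deployment without peft at inference time; a minimal sketch:

```python
# Merge the DPO LoRA adapter into the base model and save a standalone copy.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

repo = "junweiliao/zephyr-7b-dpo-qlora"

model = AutoPeftModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16)
merged = model.merge_and_unload()    # folds the LoRA deltas into the base weights
merged.save_pretrained("zephyr-7b-dpo-merged")

AutoTokenizer.from_pretrained(repo).save_pretrained("zephyr-7b-dpo-merged")
```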