zephyr-dpo-qlora-uf-ours-uffull-5e-6

This model is a version of alignment-handbook/zephyr-7b-sft-full fine-tuned on the generation/UF and generation/UFfull2 datasets. It achieves the following results on the evaluation set:

Loss: 0.4948
Rewards/chosen: -1.7888
Rewards/rejected: -2.8835
Rewards/accuracies: 0.7485
Rewards/margins: 1.0946
Rewards/margins Max: 3.5873
Rewards/margins Min: -0.9701
Rewards/margins Std: 1.5436
Logps/rejected: -554.2000
Logps/chosen: -463.3372
Logits/rejected: -1.5538
Logits/chosen: -1.6206
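
As a quick sanity check on how these numbers relate, the sketch below recomputes the reward margin from the chosen and rejected rewards. It assumes the usual DPO logging convention, in which the margin is the chosen reward minus the rejected reward; the card itself does not state this, and the variable names are illustrative only.

```python
# Values copied from the evaluation results above.
rewards_chosen = -1.7888
rewards_rejected = -2.8835
reported_margin = 1.0946

# Assumption: margin = chosen reward - rejected reward (common DPO convention).
margin = rewards_chosen - rewards_rejected

# The recomputed value should match the reported one up to rounding
# in the fourth decimal place.
difference = abs(margin - reported_margin)
print(f"recomputed margin: {margin:.4f}, difference from reported: {difference:.4f}")
```

Both rewards are negative, so the positive margin comes from the rejected responses being penalized more strongly than the chosen ones.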

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-06
train_batch_size: 4
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 2
gradient_accumulation_steps: 2
total_train_batch_size: 16
total_eval_batch_size: 16
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 1
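
The sketch below shows how the listed values combine: the total batch sizes follow from the per-device sizes, the device count, and gradient accumulation, and the learning rate follows a linear warmup and then a cosine decay. The `lr_at` helper is hypothetical and only mirrors the usual shape of a cosine schedule with warmup; the total number of optimizer steps is not stated in this card, so it is left as a parameter.

```python
import math

# Values copied from the hyperparameter list above.
learning_rate = 5e-06
train_batch_size = 4            # per device
eval_batch_size = 8             # per device
num_devices = 2
gradient_accumulation_steps = 2
warmup_ratio = 0.1

# Effective batch sizes across devices (and accumulation steps for training).
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices


def lr_at(step, total_steps):
    """Hypothetical helper: learning rate at a given optimizer step.

    Assumes linear warmup over the first warmup_ratio of training,
    followed by a half-cosine decay to zero.
    """
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


print(total_train_batch_size, total_eval_batch_size)  # 16 16
```

Under these assumptions, the learning rate reaches its peak of 5e-06 once 10% of the steps have run and decays to zero by the final step.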

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Rewards/margins Max	Rewards/margins Min	Rewards/margins Std	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.6903	0.02	100	0.6905	0.0096	0.0042	0.6635	0.0055	0.0279	-0.0135	0.0138	-265.4348	-283.4918	-2.7667	-2.8015
0.6668	0.05	200	0.6714	0.0249	-0.0232	0.6645	0.0481	0.2299	-0.1105	0.1130	-268.1768	-281.9665	-2.7343	-2.7676
0.6136	0.07	300	0.6388	-0.2723	-0.4201	0.6695	0.1478	0.6956	-0.3145	0.3388	-307.8617	-311.6826	-2.6777	-2.7086
0.6224	0.1	400	0.6072	-0.4408	-0.7266	0.6825	0.2858	1.2193	-0.5526	0.5951	-338.5125	-328.5356	-2.5218	-2.5541
0.5913	0.12	500	0.5700	-0.6299	-1.0928	0.6975	0.4629	1.7719	-0.6554	0.8141	-375.1356	-347.4472	-2.1793	-2.2226
0.5721	0.14	600	0.5595	-1.1081	-1.7353	0.7145	0.6271	2.2934	-0.8628	1.0597	-439.3786	-395.2698	-2.0549	-2.1036
0.4888	0.17	700	0.5546	-1.4460	-2.1425	0.7085	0.6965	2.5873	-0.9396	1.1811	-480.1024	-429.0589	-1.7782	-1.8362
0.4774	0.19	800	0.5258	-1.2110	-1.9801	0.7270	0.7691	2.5889	-0.8329	1.1591	-463.8646	-405.5573	-1.9074	-1.9645
0.521	0.22	900	0.5286	-1.4043	-2.2106	0.7355	0.8063	2.8030	-0.8890	1.2406	-486.9130	-424.8805	-1.5390	-1.5999
0.4871	0.24	1000	0.5354	-1.0617	-1.8924	0.7250	0.8307	2.9996	-0.8983	1.3137	-455.0902	-390.6243	-1.7795	-1.8273
0.5574	0.26	1100	0.5379	-1.2560	-2.0556	0.7205	0.7996	3.0463	-0.8879	1.3085	-471.4182	-410.0581	-1.6403	-1.6951
0.5017	0.29	1200	0.5261	-1.3320	-2.1724	0.7295	0.8404	2.9985	-0.8951	1.3031	-483.0894	-417.6535	-1.7025	-1.7570
0.4478	0.31	1300	0.5277	-1.7254	-2.6499	0.7230	0.9245	3.2834	-1.0237	1.4394	-530.8426	-456.9910	-1.7244	-1.7779
0.4919	0.34	1400	0.5189	-1.1742	-2.0426	0.7365	0.8684	3.0337	-0.9052	1.3302	-470.1158	-401.8751	-1.5533	-1.6223
0.4792	0.36	1500	0.5205	-1.3947	-2.3310	0.7340	0.9364	3.1265	-0.9863	1.3913	-498.9553	-423.9220	-1.6972	-1.7596
0.4952	0.38	1600	0.5316	-1.8397	-2.8176	0.7290	0.9779	3.2675	-1.0997	1.4769	-547.6121	-468.4282	-1.8293	-1.8827
0.5084	0.41	1700	0.5285	-2.4336	-3.4484	0.7295	1.0147	3.4046	-1.1112	1.5199	-610.6892	-527.8181	-1.5473	-1.6112
0.4676	0.43	1800	0.5162	-1.8360	-2.7043	0.7370	0.8683	2.8969	-0.9280	1.2953	-536.2840	-468.0518	-1.5045	-1.5680
0.4588	0.45	1900	0.5073	-1.5345	-2.4614	0.7435	0.9269	3.0227	-0.9141	1.3341	-511.9908	-437.9078	-1.3109	-1.3855
0.4826	0.48	2000	0.5104	-1.6277	-2.6050	0.7385	0.9773	3.2595	-0.9829	1.4282	-526.3553	-447.2241	-1.3208	-1.3956
0.4925	0.5	2100	0.5079	-1.6078	-2.5256	0.7355	0.9178	2.9879	-0.9518	1.3324	-518.4150	-445.2356	-1.5277	-1.5931
0.546	0.53	2200	0.5100	-1.7097	-2.6882	0.7370	0.9785	3.1492	-1.0011	1.4117	-534.6687	-455.4216	-1.4247	-1.4938
0.4958	0.55	2300	0.5047	-1.4824	-2.3935	0.7385	0.9111	2.9984	-0.8454	1.2951	-505.2043	-432.6925	-1.6758	-1.7328
0.4757	0.57	2400	0.5021	-1.6699	-2.6304	0.7380	0.9605	3.1590	-0.8924	1.3656	-528.8900	-451.4436	-1.4670	-1.5347
0.4539	0.6	2500	0.5025	-1.7424	-2.7890	0.7400	1.0466	3.4316	-1.0034	1.5001	-544.7556	-458.6970	-1.5551	-1.6231
0.4612	0.62	2600	0.4991	-1.7503	-2.8124	0.7415	1.0621	3.4721	-0.9695	1.5041	-547.0907	-459.4844	-1.4927	-1.5622
0.5267	0.65	2700	0.4989	-1.5988	-2.5869	0.7410	0.9881	3.2210	-0.9401	1.4114	-524.5454	-444.3344	-1.5476	-1.6161
0.4999	0.67	2800	0.4974	-1.6001	-2.5954	0.7470	0.9953	3.2272	-0.8964	1.3973	-525.3958	-444.4690	-1.5260	-1.5935
0.4589	0.69	2900	0.4977	-1.7829	-2.8625	0.7415	1.0796	3.5812	-0.9488	1.5304	-552.1008	-462.7464	-1.5484	-1.6154
0.4433	0.72	3000	0.4995	-1.7820	-2.8827	0.7395	1.1007	3.6468	-0.9945	1.5727	-554.1236	-462.6560	-1.5922	-1.6589
0.4908	0.74	3100	0.4970	-1.7323	-2.7993	0.7415	1.0669	3.5268	-0.9553	1.5148	-545.7810	-457.6894	-1.6165	-1.6807
0.4325	0.77	3200	0.4972	-1.3958	-2.4076	0.75	1.0117	3.3475	-0.9045	1.4383	-506.6104	-424.0385	-1.6999	-1.7600
0.4645	0.79	3300	0.4970	-1.7218	-2.8037	0.7485	1.0819	3.5295	-0.9807	1.5290	-546.2211	-456.6324	-1.5845	-1.6505
0.4612	0.81	3400	0.4980	-1.8787	-2.9919	0.7445	1.1132	3.6640	-1.0013	1.5776	-565.0459	-472.3241	-1.4980	-1.5678
0.4023	0.84	3500	0.4987	-2.0641	-3.1949	0.7410	1.1308	3.7331	-1.0134	1.6034	-585.3400	-490.8608	-1.4923	-1.5625
0.4564	0.86	3600	0.4952	-1.8890	-2.9834	0.7445	1.0943	3.5913	-0.9690	1.5435	-564.1885	-473.3587	-1.5268	-1.5955
0.4337	0.89	3700	0.4948	-1.7899	-2.8791	0.7480	1.0892	3.5650	-0.9671	1.5348	-553.7646	-463.4457	-1.5501	-1.6174
0.4687	0.91	3800	0.4949	-1.7971	-2.8908	0.7475	1.0937	3.5845	-0.9702	1.5427	-554.9319	-464.1627	-1.5573	-1.6238
0.4624	0.93	3900	0.4946	-1.7588	-2.8495	0.7480	1.0908	3.5789	-0.9633	1.5386	-550.8040	-460.3306	-1.5625	-1.6288
0.4744	0.96	4000	0.4948	-1.7812	-2.8753	0.7470	1.0941	3.5851	-0.9685	1.5428	-553.3815	-462.5721	-1.5573	-1.6239
0.4294	0.98	4100	0.4950	-1.7859	-2.8799	0.7480	1.0940	3.5863	-0.9706	1.5436	-553.8444	-463.0418	-1.5527	-1.6196

Framework versions

PEFT 0.7.1
Transformers 4.39.0.dev0
Pytorch 2.1.2+cu121
Datasets 2.14.6
Tokenizers 0.15.2
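
Because PEFT appears among the framework versions and the model name mentions QLoRA, the repository most likely hosts an adapter rather than full model weights. The following is a minimal loading sketch under that assumption. The adapter repository id is inferred from the model page, and `load_model` is a hypothetical helper, not something this card provides.

```python
# Base model named in this card.
BASE_MODEL_ID = "alignment-handbook/zephyr-7b-sft-full"
# Assumption: the adapter is published under this repository id.
ADAPTER_ID = "just1nseo/zephyr-dpo-qlora-uf-ours-uffull-5e-6"


def load_model(base_id=BASE_MODEL_ID, adapter_id=ADAPTER_ID):
    """Load the base model and attach the adapter on top of it.

    Calling this downloads several gigabytes of weights.
    """
    # Imported lazily so that defining the helper does not require
    # torch, transformers, or peft to be installed.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
    # Assumption: the repository contains PEFT adapter weights.
    model = PeftModel.from_pretrained(base, adapter_id)
    return tokenizer, model
```

Calling `tokenizer, model = load_model()` would then give a model ready for generation. If the repository turns out to contain merged full weights instead, loading it directly with `AutoModelForCausalLM.from_pretrained` would be the simpler route.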

Model repository: just1nseo/zephyr-dpo-qlora-uf-ours-uffull-5e-6
