
zephyr-7b-dpo-qlora

This model (geonmin-kim/zephyr-7b-dpo-qlora) is a QLoRA adapter for mistralai/Mistral-7B-v0.1, fine-tuned with Direct Preference Optimization (DPO); the training dataset is not specified in this card. It achieves the following results on the evaluation set (a short note on these DPO metrics follows the list):

  • Loss: 0.4873
  • Rewards/chosen: -2.9667
  • Rewards/rejected: -4.1000
  • Rewards/accuracies: 0.7445
  • Rewards/margins: 1.1333
  • Logps/rejected: -654.6072
  • Logps/chosen: -561.3217
  • Logits/rejected: -0.9450
  • Logits/chosen: -1.0724
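
These are the standard metrics logged by a DPO training run (likely TRL's DPOTrainer, though the card does not say): the rewards are scaled log-probability ratios between the fine-tuned policy and the frozen reference model, Rewards/margins is the chosen reward minus the rejected reward, and Rewards/accuracies is the fraction of preference pairs where the chosen response gets the higher reward. A quick arithmetic check against the numbers above:

```python
# Sanity check on the reported evaluation metrics (values copied from this card).
rewards_chosen = -2.9667
rewards_rejected = -4.1000

# In DPO the margin is simply the chosen reward minus the rejected reward.
margin = rewards_chosen - rewards_rejected
print(round(margin, 4))  # 1.1333 -> matches the reported Rewards/margins
```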

Model description

More information needed

Intended uses & limitations

More information needed
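
The card does not document intended usage. As a rough starting point, the sketch below shows one way to load this adapter on top of the base model with transformers and peft; it is untested, and the prompt is purely illustrative.

```python
# Hypothetical usage sketch: load the QLoRA adapter on top of the base model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "mistralai/Mistral-7B-v0.1"            # base model named in this card
adapter_id = "geonmin-kim/zephyr-7b-dpo-qlora"   # this adapter repository

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)

prompt = "Explain what DPO fine-tuning does in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```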

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
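
For readers who want to reproduce the setup, here is a minimal sketch of the same settings expressed as transformers.TrainingArguments. The original training script is not part of this card, so the output directory and precision flag are assumptions, and a DPO trainer (e.g. TRL's DPOTrainer) would still need to be wired around these arguments.

```python
# Hedged sketch: the hyperparameters listed above as transformers.TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="zephyr-7b-dpo-qlora",   # assumption: output path is not stated in the card
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=4,      # 4 x 4 gives the total train batch size of 16 listed above
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1,
    optim="adamw_torch",                # Adam(W) with betas=(0.9, 0.999), epsilon=1e-08
    bf16=True,                          # assumption: precision is not stated in the card
)
```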

Training results

Training Loss Epoch Step Validation Loss Rewards/chosen Rewards/rejected Rewards/accuracies Rewards/margins Logps/rejected Logps/chosen Logits/rejected Logits/chosen
0.6819 0.03 100 0.6822 0.0500 0.0271 0.6545 0.0230 -241.9029 -259.6472 -1.9565 -2.0959
0.6548 0.05 200 0.6500 -0.1489 -0.2515 0.6780 0.1027 -269.7628 -279.5373 -1.9329 -2.0695
0.6084 0.08 300 0.6213 -0.2956 -0.4998 0.6810 0.2042 -294.5921 -294.2169 -1.8771 -2.0114
0.6237 0.1 400 0.6039 -0.4538 -0.7401 0.6935 0.2863 -318.6170 -310.0349 -1.8367 -1.9656
0.5534 0.13 500 0.5692 -0.9154 -1.3927 0.7050 0.4773 -383.8828 -356.1946 -1.5403 -1.6712
0.5613 0.16 600 0.5659 -0.8123 -1.3218 0.7025 0.5095 -376.7896 -345.8830 -1.3701 -1.5049
0.5139 0.18 700 0.5572 -2.6368 -3.4670 0.7145 0.8302 -591.3087 -528.3278 -0.8924 -1.0174
0.5184 0.21 800 0.5374 -1.4908 -2.1870 0.7160 0.6962 -463.3091 -413.7339 -1.1141 -1.2460
0.5211 0.24 900 0.5332 -2.5430 -3.3947 0.7180 0.8518 -584.0806 -518.9495 -0.8116 -0.9341
0.5553 0.26 1000 0.5178 -2.1745 -3.0424 0.7315 0.8679 -548.8491 -482.0993 -0.8557 -0.9813
0.5994 0.29 1100 0.5207 -2.5002 -3.3276 0.7300 0.8275 -577.3698 -514.6677 -0.7615 -0.8896
0.5976 0.31 1200 0.5098 -2.1833 -2.9905 0.7365 0.8072 -543.6604 -482.9834 -0.8350 -0.9596
0.5237 0.34 1300 0.5166 -3.0973 -4.1628 0.7350 1.0654 -660.8850 -574.3862 -0.7072 -0.8259
0.516 0.37 1400 0.5108 -2.1009 -3.0663 0.7350 0.9654 -551.2367 -474.7425 -0.7865 -0.9128
0.4593 0.39 1500 0.5174 -2.3167 -3.4254 0.7305 1.1088 -587.1506 -496.3185 -0.8903 -1.0211
0.5545 0.42 1600 0.5032 -2.9938 -4.0820 0.7370 1.0882 -652.8123 -564.0355 -0.8801 -1.0082
0.5425 0.44 1700 0.4996 -3.3496 -4.4061 0.7405 1.0565 -685.2187 -599.6096 -0.8382 -0.9686
0.4825 0.47 1800 0.5037 -3.0446 -4.1288 0.7380 1.0842 -657.4884 -569.1091 -0.8738 -1.0006
0.4455 0.5 1900 0.4962 -3.0223 -4.1482 0.7420 1.1259 -659.4305 -566.8840 -0.8910 -1.0214
0.4817 0.52 2000 0.4974 -3.5987 -4.6648 0.7470 1.0660 -711.0853 -624.5250 -0.8139 -0.9428
0.5079 0.55 2100 0.4923 -3.1751 -4.2293 0.7520 1.0542 -667.5426 -582.1657 -0.8739 -1.0031
0.477 0.58 2200 0.4897 -2.6127 -3.5713 0.7410 0.9587 -601.7402 -525.9182 -0.9567 -1.0880
0.4829 0.6 2300 0.4887 -2.9530 -4.0954 0.7485 1.1424 -654.1511 -559.9558 -0.9032 -1.0313
0.4752 0.63 2400 0.4909 -3.1480 -4.2815 0.7445 1.1335 -672.7583 -579.4506 -0.8495 -0.9765
0.5249 0.65 2500 0.4891 -3.0936 -4.2029 0.7445 1.1093 -664.8962 -574.0093 -0.9136 -1.0435
0.4596 0.68 2600 0.4939 -2.9492 -4.0985 0.7400 1.1493 -654.4570 -559.5698 -0.9264 -1.0549
0.5152 0.71 2700 0.4922 -3.0197 -4.1572 0.7440 1.1375 -660.3236 -566.6193 -0.9249 -1.0527
0.4518 0.73 2800 0.4908 -3.0666 -4.2342 0.7415 1.1676 -668.0294 -571.3138 -0.9260 -1.0535
0.5018 0.76 2900 0.4877 -3.0977 -4.2382 0.7465 1.1405 -668.4285 -574.4260 -0.9320 -1.0595
0.4592 0.79 3000 0.4873 -2.9934 -4.1134 0.7460 1.1200 -655.9471 -563.9877 -0.9510 -1.0788
0.4905 0.81 3100 0.4878 -2.9825 -4.1198 0.7430 1.1373 -656.5853 -562.9043 -0.9465 -1.0741
0.485 0.84 3200 0.4874 -2.9459 -4.0754 0.7455 1.1296 -652.1517 -559.2400 -0.9531 -1.0807
0.5157 0.86 3300 0.4874 -2.9550 -4.0838 0.7445 1.1289 -652.9912 -560.1489 -0.9481 -1.0755
0.4474 0.89 3400 0.4871 -2.9699 -4.1019 0.7435 1.1321 -654.8017 -561.6381 -0.9499 -1.0773
0.5379 0.92 3500 0.4874 -2.9663 -4.0989 0.7430 1.1326 -654.5006 -561.2808 -0.9468 -1.0742
0.464 0.94 3600 0.4874 -2.9638 -4.0967 0.7425 1.1329 -654.2791 -561.0286 -0.9475 -1.0748
0.4729 0.97 3700 0.4873 -2.9666 -4.0999 0.7445 1.1333 -654.6014 -561.3129 -0.9495 -1.0770
0.5017 0.99 3800 0.4873 -2.9667 -4.1000 0.7445 1.1333 -654.6072 -561.3217 -0.9450 -1.0724

Framework versions

  • PEFT 0.7.1
  • Transformers 4.39.0.dev0
  • Pytorch 2.2.2+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.2