
dpo_with_se

This model is a DPO fine-tuned version of microsoft/Phi-3-mini-4k-instruct; the training dataset is not specified in the auto-generated card. It achieves the following results on the evaluation set (the Rewards/* metrics are explained after the list):

  • Loss: 0.6194
  • Rewards/chosen: -0.6699
  • Rewards/rejected: -1.1107
  • Rewards/accuracies: 0.6458
  • Rewards/margins: 0.4407
  • Logps/rejected: -422.9081
  • Logps/chosen: -458.9963
  • Logits/rejected: 0.0509
  • Logits/chosen: 0.1892
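
These metrics follow the standard DPO formulation (an assumption based on the trainer's output format; the card itself does not define them). The implicit reward is the β-scaled log-ratio between the policy and the frozen reference model,

$$
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)},
$$

and the training loss is the negative log-sigmoid of the chosen-minus-rejected reward margin:

$$
\mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\!\left[\log \sigma\big(r_\theta(x, y_w) - r_\theta(x, y_l)\big)\right].
$$

Under this reading, Rewards/margins is the mean of $r_\theta(x, y_w) - r_\theta(x, y_l)$ over the evaluation set, and Rewards/accuracies is the fraction of pairs for which the chosen response receives the higher reward.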

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an illustrative configuration sketch follows the list):

  • learning_rate: 2e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 64
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 2
  • mixed_precision_training: Native AMP
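
The card does not name the training framework, but the DPO metrics and the PEFT dependency suggest a TRL `DPOTrainer` setup. The sketch below is illustrative only, not the author's verified script: the dataset, LoRA shape, and `beta` are assumptions, while the `TrainingArguments` mirror the hyperparameters listed above.

```python
# Illustrative sketch, NOT the author's verified script: assumes TRL's DPOTrainer,
# a LoRA adapter, and placeholder preference data.
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base_id = "microsoft/Phi-3-mini-4k-instruct"
model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Tiny placeholder preference dataset; the real training data is undocumented.
train_dataset = Dataset.from_dict({
    "prompt":   ["What does DPO optimize?"],
    "chosen":   ["It optimizes a policy directly on preference pairs."],
    "rejected": ["No idea."],
})

# Mirrors the hyperparameters above. With 4 GPUs (launched via accelerate or
# torchrun), 8 per-device x 2 accumulation steps x 4 devices = total batch size 64.
training_args = TrainingArguments(
    output_dir="dpo_with_se",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,
    num_train_epochs=2,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    warmup_steps=100,  # when nonzero, this overrides warmup_ratio
    seed=42,
    fp16=True,         # "Native AMP" mixed precision
)

peft_config = LoraConfig(
    task_type="CAUSAL_LM",
    r=16,
    lora_alpha=32,
    target_modules=["qkv_proj", "o_proj"],  # Phi-3 attention projections; an assumption
)

trainer = DPOTrainer(
    model,
    ref_model=None,        # with a PEFT model, TRL reuses the base weights as the reference
    beta=0.1,              # assumed; the card does not state beta
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=train_dataset,  # placeholder; use a held-out split in practice
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```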

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.7121 | 0.0622 | 50 | 0.7078 | 1.9859 | 1.9118 | 0.5694 | 0.0741 | -392.6837 | -432.4385 | 0.1883 | 0.3317 |
| 0.672 | 0.1244 | 100 | 0.6718 | 0.4213 | 0.2008 | 0.5972 | 0.2204 | -409.7933 | -448.0844 | 0.1330 | 0.2722 |
| 0.6803 | 0.1866 | 150 | 0.6633 | 1.2004 | 0.9074 | 0.6215 | 0.2930 | -402.7275 | -440.2932 | 0.2565 | 0.3917 |
| 0.6816 | 0.2488 | 200 | 0.6535 | -0.2285 | -0.4811 | 0.5938 | 0.2526 | -416.6123 | -454.5817 | 0.1335 | 0.2706 |
| 0.6719 | 0.3109 | 250 | 0.6768 | -0.0803 | -0.2830 | 0.6007 | 0.2027 | -414.6320 | -453.1003 | 0.1071 | 0.2455 |
| 0.642 | 0.3731 | 300 | 0.6402 | 0.3405 | 0.0226 | 0.6146 | 0.3179 | -411.5756 | -448.8922 | 0.0864 | 0.2271 |
| 0.6675 | 0.4353 | 350 | 0.6472 | 0.7586 | 0.4677 | 0.6007 | 0.2909 | -407.1244 | -444.7109 | 0.1382 | 0.2779 |
| 0.6581 | 0.4975 | 400 | 0.6502 | -0.0310 | -0.3059 | 0.6181 | 0.2749 | -414.8607 | -452.6067 | 0.0326 | 0.1770 |
| 0.6155 | 0.5597 | 450 | 0.6416 | 0.0254 | -0.2895 | 0.625 | 0.3149 | -414.6964 | -452.0428 | 0.1102 | 0.2490 |
| 0.6438 | 0.6219 | 500 | 0.6383 | -0.2805 | -0.6002 | 0.625 | 0.3197 | -417.8031 | -455.1015 | 0.0799 | 0.2196 |
| 0.6069 | 0.6841 | 550 | 0.6360 | -0.6526 | -0.9456 | 0.6007 | 0.2930 | -421.2573 | -458.8233 | 0.1079 | 0.2462 |
| 0.6227 | 0.7463 | 600 | 0.6349 | -0.0705 | -0.3659 | 0.6215 | 0.2954 | -415.4609 | -453.0020 | 0.0381 | 0.1807 |
| 0.6473 | 0.8085 | 650 | 0.6331 | -0.3187 | -0.6771 | 0.6528 | 0.3584 | -418.5728 | -455.4844 | 0.1406 | 0.2776 |
| 0.6259 | 0.8706 | 700 | 0.6295 | -0.4256 | -0.7399 | 0.6111 | 0.3143 | -419.2006 | -456.5528 | 0.0986 | 0.2391 |
| 0.6572 | 0.9328 | 750 | 0.6389 | -0.5969 | -0.8936 | 0.6007 | 0.2967 | -420.7374 | -458.2657 | 0.0726 | 0.2120 |
| 0.63 | 0.9950 | 800 | 0.6310 | -0.2243 | -0.5516 | 0.6285 | 0.3274 | -417.3179 | -454.5398 | 0.1026 | 0.2406 |
| 0.4431 | 1.0572 | 850 | 0.6238 | -0.3325 | -0.7169 | 0.6632 | 0.3844 | -418.9702 | -455.6217 | 0.0604 | 0.1992 |
| 0.47 | 1.1194 | 900 | 0.6286 | -0.6589 | -1.1143 | 0.6597 | 0.4554 | -422.9441 | -458.8861 | -0.0269 | 0.1154 |
| 0.4436 | 1.1816 | 950 | 0.6252 | -0.6243 | -1.0270 | 0.6354 | 0.4027 | -422.0717 | -458.5404 | 0.0062 | 0.1465 |
| 0.4483 | 1.2438 | 1000 | 0.6238 | -0.6325 | -1.0514 | 0.6319 | 0.4189 | -422.3156 | -458.6222 | 0.0434 | 0.1813 |
| 0.4568 | 1.3060 | 1050 | 0.6297 | -0.9557 | -1.3457 | 0.6285 | 0.3900 | -425.2583 | -461.8539 | 0.1563 | 0.2901 |
| 0.4555 | 1.3682 | 1100 | 0.6311 | -0.5825 | -1.0012 | 0.6319 | 0.4188 | -421.8140 | -458.1216 | 0.0905 | 0.2271 |
| 0.4744 | 1.4303 | 1150 | 0.6248 | -0.5365 | -0.9374 | 0.6424 | 0.4008 | -421.1751 | -457.6623 | 0.0472 | 0.1861 |
| 0.4245 | 1.4925 | 1200 | 0.6255 | -0.6457 | -1.0579 | 0.6424 | 0.4122 | -422.3806 | -458.7540 | -0.0423 | 0.0997 |
| 0.4767 | 1.5547 | 1250 | 0.6294 | -0.7333 | -1.1519 | 0.6319 | 0.4185 | -423.3202 | -459.6304 | 0.1300 | 0.2652 |
| 0.4714 | 1.6169 | 1300 | 0.6253 | -0.8128 | -1.2388 | 0.6493 | 0.4261 | -424.1896 | -460.4245 | 0.0397 | 0.1788 |
| 0.4336 | 1.6791 | 1350 | 0.6229 | -0.7654 | -1.2064 | 0.6424 | 0.4410 | -423.8654 | -459.9506 | 0.1234 | 0.2587 |
| 0.4791 | 1.7413 | 1400 | 0.6216 | -0.7578 | -1.2069 | 0.6389 | 0.4492 | -423.8710 | -459.8747 | 0.0547 | 0.1931 |
| 0.439 | 1.8035 | 1450 | 0.6204 | -0.7469 | -1.1972 | 0.6493 | 0.4502 | -423.7731 | -459.7664 | 0.0661 | 0.2040 |
| 0.4419 | 1.8657 | 1500 | 0.6194 | -0.6699 | -1.1107 | 0.6458 | 0.4407 | -422.9081 | -458.9963 | 0.0509 | 0.1892 |
| 0.4593 | 1.9279 | 1550 | 0.6214 | -0.6895 | -1.1228 | 0.6528 | 0.4333 | -423.0291 | -459.1917 | 0.0628 | 0.2005 |
| 0.4444 | 1.9900 | 1600 | 0.6229 | -0.6827 | -1.1246 | 0.6667 | 0.4419 | -423.0472 | -459.1237 | 0.0863 | 0.2226 |

Framework versions

  • PEFT 0.11.2.dev0
  • Transformers 4.41.2
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1
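
Since this repository ships a PEFT adapter rather than full model weights, inference requires loading the base model and attaching the adapter. A minimal sketch follows; the generation settings are illustrative.

```python
# Minimal sketch: load the base Phi-3 model and attach this DPO adapter.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "microsoft/Phi-3-mini-4k-instruct"
adapter_id = "ernestoBocini/Phi3-mini-DPO-Tuned"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)  # attach the adapter weights
model.eval()

messages = [{"role": "user", "content": "Summarize DPO in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```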