Hyponatremia_L3_500steps_1e8rate_01beta_DPO

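This model is a fine-tuned version of tsavage68/Hyponatremia_L3_450steps_1e7rate_SFT on an unknown dataset. At the final evaluation (step 500), the results table reports a validation loss of 0.6902, rewards/chosen of 0.0036, rewards/rejected of -0.0026, a reward accuracy of 0.5900, and a reward margin of 0.0061.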

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

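Training procedure

The hyperparameter list is missing from this card. The model name points to DPO training for 500 steps, which matches the results table, and suggests a learning rate of 1e-8 and a beta of 0.1; treat the last two as inferred from the name rather than confirmed.

Training results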

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.6962	0.2667	50	0.6922	0.0018	-0.0003	0.5600	0.0021	-39.4298	-22.6992	-1.0181	-0.9449
0.6954	0.5333	100	0.6954	0.0009	0.0052	0.4000	-0.0043	-39.3750	-22.7083	-1.0190	-0.9459
0.6919	0.8	150	0.6910	0.0046	0.0002	0.5600	0.0044	-39.4246	-22.6710	-1.0191	-0.9461
0.6898	1.0667	200	0.6922	0.0047	0.0027	0.5400	0.0020	-39.3995	-22.6693	-1.0194	-0.9462
0.6911	1.3333	250	0.6935	0.0025	0.0031	0.5200	-0.0006	-39.3958	-22.6923	-1.0189	-0.9458
0.6875	1.6	300	0.6921	0.0022	0.0000	0.5400	0.0022	-39.4264	-22.6947	-1.0188	-0.9457
0.6892	1.8667	350	0.6913	0.0037	-0.0001	0.5900	0.0038	-39.4283	-22.6799	-1.0196	-0.9464
0.6915	2.1333	400	0.6904	0.0033	-0.0024	0.5800	0.0057	-39.4505	-22.6834	-1.0193	-0.9460
0.6894	2.4	450	0.6902	0.0036	-0.0026	0.5900	0.0061	-39.4524	-22.6813	-1.0193	-0.9460
0.6903	2.6667	500	0.6902	0.0036	-0.0026	0.5900	0.0061	-39.4524	-22.6813	-1.0193	-0.9460
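
The validation losses in the table above sit very close to ln 2 ≈ 0.6931, which is what a sigmoid DPO loss gives when the policy has not moved away from the reference model. The sketch below is one way to check that reading. It assumes the standard sigmoid DPO loss and assumes the Rewards/margins column already includes the beta scaling; neither is confirmed by this card. The function name `dpo_loss_from_margin` is hypothetical, and applying the loss to the mean margin only approximates the mean per-example loss.

```python
import math


def dpo_loss_from_margin(margin: float) -> float:
    """Sigmoid DPO loss for one preference pair, given its reward margin.

    Assumption: the margin already includes the beta scaling, as the
    Rewards/margins column is assumed to.
    """
    # -log(sigmoid(m)) rewritten as log(1 + exp(-m))
    return math.log1p(math.exp(-margin))


# (step, validation loss, Rewards/margins) copied from the results table
rows = [
    (50, 0.6922, 0.0021),
    (100, 0.6954, -0.0043),
    (450, 0.6902, 0.0061),
    (500, 0.6902, 0.0061),
]

# Loss at zero margin: ln 2, i.e. a policy identical to the reference
baseline = dpo_loss_from_margin(0.0)

for step, val_loss, margin in rows:
    approx = dpo_loss_from_margin(margin)
    print(
        f"step {step}: reported {val_loss:.4f}, "
        f"from mean margin {approx:.4f}, baseline {baseline:.4f}"
    )
```

Under these assumptions the loss recomputed from each mean margin lands within a few ten-thousandths of the reported validation loss, and both stay within about 0.003 of the ln 2 baseline. That is consistent with a policy that has shifted only slightly from the SFT starting point over the 500 steps.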