Hyponatremia_L3_1000steps_1e5rate_01beta_DPO

This model is a version of tsavage68/Hyponatremia_L3_450steps_1e7rate_SFT fine-tuned with Direct Preference Optimization (DPO) on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0000
  • Rewards/chosen: 1.0768
  • Rewards/rejected: -15.3439
  • Rewards/accuracies: 1.0
  • Rewards/margins: 16.4206
  • Logps/rejected: -192.8655
  • Logps/chosen: -11.9493
  • Logits/rejected: -1.0760
  • Logits/chosen: -0.9787
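
For context, the reward columns above follow the standard DPO bookkeeping: each reward is beta times the log-probability ratio between the policy and the reference (SFT) model on a response, and the margin is the gap between the chosen and rejected rewards. The sketch below is illustrative rather than the actual training code; beta = 0.1 is assumed from the "01beta" tag in the model name.

```python
import torch
import torch.nn.functional as F

def dpo_rewards_and_loss(policy_chosen_logps, policy_rejected_logps,
                         ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Sketch of standard DPO reward/margin accounting.

    Inputs are per-sequence summed log-probs (tensors of shape [batch]);
    beta=0.1 is an assumption taken from the model name.
    """
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)      # Rewards/chosen
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)  # Rewards/rejected
    margins = chosen_rewards - rejected_rewards                           # Rewards/margins
    accuracies = (margins > 0).float().mean()                             # Rewards/accuracies
    loss = -F.logsigmoid(margins).mean()                                  # DPO objective
    return loss, chosen_rewards.mean(), rejected_rewards.mean(), accuracies
```

With margins around 16, the sigmoid saturates at 1, which is why the reported loss rounds to 0.0000 and the reward accuracy is 1.0.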

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 2
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
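
A minimal sketch of how this configuration might be expressed with TRL's DPOTrainer (the card does not name the training library, so TRL is an assumption, as is a mid-2024 release in which DPOTrainer takes a tokenizer argument; the dataset identifier is a placeholder, since the training data is not documented):

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "tsavage68/Hyponatremia_L3_450steps_1e7rate_SFT"  # SFT starting point named in the card
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.float16)

# Hyperparameters mirror the list above; "some/preference_dataset" is hypothetical.
args = DPOConfig(
    output_dir="Hyponatremia_L3_1000steps_1e5rate_01beta_DPO",
    beta=0.1,                        # "01beta" in the model name
    learning_rate=1e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,   # total train batch size 4
    max_steps=1000,
    warmup_steps=100,
    lr_scheduler_type="cosine",
    seed=42,
)

trainer = DPOTrainer(
    model=model,                     # reference model is cloned internally when not passed
    args=args,
    train_dataset=load_dataset("some/preference_dataset", split="train"),  # placeholder
    tokenizer=tokenizer,
)
trainer.train()
```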

Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---------------|--------|------|-----------------|----------------|------------------|--------------------|-----------------|----------------|--------------|-----------------|---------------|
| 0.0           | 0.2667 | 50   | 0.0000          | 0.9553         | -10.8575         | 1.0                | 11.8128         | -148.0022      | -13.1641     | -1.0364         | -0.9611       |
| 0.0           | 0.5333 | 100  | 0.0000          | 0.9519         | -12.4385         | 1.0                | 13.3903         | -163.8113      | -13.1981     | -1.0526         | -0.9728       |
| 0.0           | 0.8    | 150  | 0.0000          | 0.9832         | -13.0797         | 1.0                | 14.0629         | -170.2236      | -12.8844     | -1.0616         | -0.9786       |
| 0.0           | 1.0667 | 200  | 0.0000          | 0.9920         | -13.5014         | 1.0                | 14.4934         | -174.4411      | -12.7967     | -1.0686         | -0.9825       |
| 0.0           | 1.3333 | 250  | 0.0000          | 1.0027         | -13.8298         | 1.0                | 14.8325         | -177.7250      | -12.6903     | -1.0703         | -0.9822       |
| 0.0           | 1.6    | 300  | 0.0000          | 1.0142         | -14.0854         | 1.0                | 15.0996         | -180.2808      | -12.5749     | -1.0721         | -0.9818       |
| 0.0           | 1.8667 | 350  | 0.0000          | 1.0305         | -14.3255         | 1.0                | 15.3560         | -182.6821      | -12.4120     | -1.0734         | -0.9816       |
| 0.0           | 2.1333 | 400  | 0.0000          | 1.0373         | -14.5462         | 1.0                | 15.5835         | -184.8884      | -12.3434     | -1.0740         | -0.9810       |
| 0.0           | 2.4    | 450  | 0.0000          | 1.0509         | -14.7386         | 1.0                | 15.7895         | -186.8133      | -12.2083     | -1.0751         | -0.9810       |
| 0.0           | 2.6667 | 500  | 0.0000          | 1.0573         | -14.8986         | 1.0                | 15.9560         | -188.4131      | -12.1435     | -1.0767         | -0.9816       |
| 0.0           | 2.9333 | 550  | 0.0000          | 1.0640         | -15.0362         | 1.0                | 16.1002         | -189.7889      | -12.0765     | -1.0754         | -0.9801       |
| 0.0           | 3.2    | 600  | 0.0000          | 1.0681         | -15.1438         | 1.0                | 16.2119         | -190.8647      | -12.0355     | -1.0755         | -0.9793       |
| 0.0           | 3.4667 | 650  | 0.0000          | 1.0702         | -15.2094         | 1.0                | 16.2796         | -191.5211      | -12.0146     | -1.0752         | -0.9782       |
| 0.0           | 3.7333 | 700  | 0.0000          | 1.0749         | -15.2717         | 1.0                | 16.3466         | -192.1442      | -11.9678     | -1.0751         | -0.9777       |
| 0.0           | 4.0    | 750  | 0.0000          | 1.0742         | -15.3088         | 1.0                | 16.3831         | -192.5153      | -11.9746     | -1.0760         | -0.9782       |
| 0.0           | 4.2667 | 800  | 0.0000          | 1.0784         | -15.3235         | 1.0                | 16.4019         | -192.6623      | -11.9330     | -1.0748         | -0.9774       |
| 0.0           | 4.5333 | 850  | 0.0000          | 1.0743         | -15.3432         | 1.0                | 16.4175         | -192.8588      | -11.9742     | -1.0748         | -0.9772       |
| 0.0           | 4.8    | 900  | 0.0000          | 1.0767         | -15.3361         | 1.0                | 16.4128         | -192.7881      | -11.9501     | -1.0756         | -0.9780       |
| 0.0           | 5.0667 | 950  | 0.0000          | 1.0768         | -15.3439         | 1.0                | 16.4206         | -192.8655      | -11.9493     | -1.0760         | -0.9787       |
| 0.0           | 5.3333 | 1000 | 0.0000          | 1.0768         | -15.3439         | 1.0                | 16.4206         | -192.8655      | -11.9493     | -1.0760         | -0.9787       |

Framework versions

  • Transformers 4.42.3
  • Pytorch 2.0.0+cu117
  • Datasets 2.20.0
  • Tokenizers 0.19.1
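
With the framework versions above, the checkpoint loads like any other causal-LM checkpoint. A usage sketch (not taken from the card; the prompt is a placeholder, since intended uses are not documented):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/Hyponatremia_L3_1000steps_1e5rate_01beta_DPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "..."  # placeholder: supply a domain-appropriate prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```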