Hyponatremia_L3_1000steps_1e8rate_05beta_DPO

This model is a fine-tuned version of tsavage68/Hyponatremia_L3_450steps_1e7rate_SFT on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 0.6800
Rewards/chosen: 0.0111
Rewards/rejected: -0.0183
Rewards/accuracies: 0.6300
Rewards/margins: 0.0293
Logps/rejected: -39.4634
Logps/chosen: -22.6947
Logits/rejected: -1.0185
Logits/chosen: -0.9455
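
The reward figures above can be cross-checked against each other. The sketch below assumes the standard DPO sigmoid loss, -log(sigmoid(margin)), with rewards already scaled by beta; the card does not state the loss variant, so treat that as an assumption. The names dpo_sigmoid_loss, baseline_loss, and loss_at_mean_margin are illustrative only.

```python
import math


def dpo_sigmoid_loss(margin: float) -> float:
    """Return -log(sigmoid(margin)), written as log(1 + exp(-margin))."""
    return math.log1p(math.exp(-margin))


# Values copied from the evaluation results above.
rewards_chosen = 0.0111
rewards_rejected = -0.0183
reported_margin = 0.0293
reported_loss = 0.6800

# The margin is the chosen reward minus the rejected reward.
# The small gap to the reported value comes from rounding to 4 decimals.
margin = rewards_chosen - rewards_rejected

# Assumption: sigmoid DPO loss. A margin of zero gives ln(2), the loss of a
# policy that does not yet prefer either response.
baseline_loss = dpo_sigmoid_loss(0.0)

# Loss evaluated at the mean margin. The reported loss is a mean of
# per-example losses, so it can only be equal to or above this value.
loss_at_mean_margin = dpo_sigmoid_loss(margin)

print(f"margin from rewards:  {margin:.4f} (reported {reported_margin:.4f})")
print(f"zero-margin baseline: {baseline_loss:.4f}")
print(f"loss at mean margin:  {loss_at_mean_margin:.4f} (reported {reported_loss:.4f})")
```

Under that assumption, the reported loss of 0.6800 sits only slightly below the zero-margin baseline of about 0.6931, which suggests the policy has moved very little from its SFT starting point.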

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-08
train_batch_size: 2
eval_batch_size: 1
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 4
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 100
training_steps: 1000
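
To make these settings concrete, here is a minimal sketch of how the effective batch size and the learning-rate schedule follow from them. It assumes a single training device and the usual half-cosine decay to zero after linear warmup (the shape used by the Transformers cosine scheduler with its default settings); neither detail is stated in the card. The helper names are hypothetical.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-08
TRAIN_BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 2
WARMUP_STEPS = 100
TRAINING_STEPS = 1000

NUM_DEVICES = 1  # assumption: the card does not state the device count


def total_train_batch_size() -> int:
    """Examples consumed per optimizer step."""
    return TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * NUM_DEVICES


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then half-cosine decay."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TRAINING_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


print("total_train_batch_size:", total_train_batch_size())
for step in (0, 50, 100, 550, 900, 1000):
    print(f"step {step:4d}: lr = {lr_at(step):.3e}")
```

Under these assumptions the learning rate peaks at 1e-08 at step 100 and falls below 4e-10 by step 900, so the final optimizer steps change the weights very little.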

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.7087	0.2667	50	0.6904	0.0099	0.0022	0.5600	0.0077	-39.4225	-22.6970	-1.0181	-0.9449
0.7054	0.5333	100	0.6945	0.0150	0.0155	0.4700	-0.0005	-39.3959	-22.6868	-1.0188	-0.9457
0.6792	0.8	150	0.6916	0.0089	0.0036	0.5100	0.0052	-39.4196	-22.6991	-1.0191	-0.9458
0.6726	1.0667	200	0.6884	0.0071	-0.0042	0.5200	0.0114	-39.4353	-22.7026	-1.0195	-0.9464
0.6877	1.3333	250	0.6869	0.0113	-0.0039	0.5600	0.0152	-39.4347	-22.6943	-1.0183	-0.9452
0.6655	1.6	300	0.6882	0.0126	0.0002	0.5700	0.0124	-39.4264	-22.6915	-1.0193	-0.9460
0.6734	1.8667	350	0.6903	0.0156	0.0077	0.5400	0.0078	-39.4113	-22.6856	-1.0194	-0.9463
0.6759	2.1333	400	0.6839	0.0065	-0.0142	0.6000	0.0207	-39.4553	-22.7038	-1.0189	-0.9459
0.6775	2.4	450	0.6768	0.0146	-0.0209	0.6600	0.0355	-39.4687	-22.6875	-1.0184	-0.9453
0.692	2.6667	500	0.6800	0.0192	-0.0094	0.6000	0.0286	-39.4456	-22.6784	-1.0192	-0.9462
0.6805	2.9333	550	0.6807	0.0136	-0.0142	0.5700	0.0278	-39.4552	-22.6895	-1.0194	-0.9463
0.6711	3.2	600	0.6819	0.0058	-0.0191	0.6300	0.0248	-39.4650	-22.7053	-1.0191	-0.9460
0.6642	3.4667	650	0.6796	0.0124	-0.0172	0.5800	0.0296	-39.4612	-22.6920	-1.0190	-0.9458
0.6798	3.7333	700	0.6861	0.0179	0.0012	0.5500	0.0167	-39.4244	-22.6810	-1.0189	-0.9457
0.6845	4.0	750	0.6807	0.0102	-0.0177	0.6200	0.0278	-39.4621	-22.6965	-1.0185	-0.9454
0.6829	4.2667	800	0.6813	0.0097	-0.0170	0.6100	0.0267	-39.4609	-22.6974	-1.0185	-0.9454
0.6779	4.5333	850	0.6802	0.0106	-0.0182	0.6300	0.0288	-39.4632	-22.6955	-1.0185	-0.9455
0.6738	4.8	900	0.6800	0.0111	-0.0183	0.6300	0.0293	-39.4634	-22.6947	-1.0185	-0.9455
0.6731	5.0667	950	0.6800	0.0111	-0.0183	0.6300	0.0293	-39.4634	-22.6947	-1.0185	-0.9455
0.6674	5.3333	1000	0.6800	0.0111	-0.0183	0.6300	0.0293	-39.4634	-22.6947	-1.0185	-0.9455
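
As a reading aid for the table, the following sketch picks out the checkpoint with the lowest validation loss and estimates the training-set size implied by the step and epoch columns. The rows are copied from the table; the size estimate assumes 4 examples per optimizer step on a single device, which the card does not confirm.

```python
# (step, epoch, validation_loss, rewards_accuracy), copied from the table above.
ROWS = [
    (50, 0.2667, 0.6904, 0.56), (100, 0.5333, 0.6945, 0.47),
    (150, 0.8, 0.6916, 0.51), (200, 1.0667, 0.6884, 0.52),
    (250, 1.3333, 0.6869, 0.56), (300, 1.6, 0.6882, 0.57),
    (350, 1.8667, 0.6903, 0.54), (400, 2.1333, 0.6839, 0.60),
    (450, 2.4, 0.6768, 0.66), (500, 2.6667, 0.6800, 0.60),
    (550, 2.9333, 0.6807, 0.57), (600, 3.2, 0.6819, 0.63),
    (650, 3.4667, 0.6796, 0.58), (700, 3.7333, 0.6861, 0.55),
    (750, 4.0, 0.6807, 0.62), (800, 4.2667, 0.6813, 0.61),
    (850, 4.5333, 0.6802, 0.63), (900, 4.8, 0.6800, 0.63),
    (950, 5.0667, 0.6800, 0.63), (1000, 5.3333, 0.6800, 0.63),
]

EXAMPLES_PER_STEP = 4  # assumption: total_train_batch_size on one device

# Checkpoints with the lowest validation loss and the highest reward accuracy.
best_by_loss = min(ROWS, key=lambda row: row[2])
best_by_accuracy = max(ROWS, key=lambda row: row[3])

# Examples seen divided by epochs completed gives the implied dataset size.
final_step, final_epoch, _, _ = ROWS[-1]
implied_train_examples = round(final_step * EXAMPLES_PER_STEP / final_epoch)

print("lowest validation loss at step", best_by_loss[0])
print("highest reward accuracy at step", best_by_accuracy[0])
print("implied training examples:", implied_train_examples)
```

In this table, both the lowest validation loss (0.6768) and the highest reward accuracy (0.6600) occur at step 450, while the reported final metrics come from step 1000.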

Framework versions

Transformers 4.42.3
Pytorch 2.0.0+cu117
Datasets 2.20.0
Tokenizers 0.19.1
