Hyponatremia_L3_1000steps_1e5rate_05beta_DPO

This model is a fine-tuned version of tsavage68/Hyponatremia_L3_450steps_1e7rate_SFT. The training dataset is not recorded in this card (it is listed as unknown). The model achieves the following results on the evaluation set:

Loss: 0.0000
Rewards/chosen: 3.1005
Rewards/rejected: -14.4818
Rewards/accuracies: 1.0
Rewards/margins: 17.5823
Logps/rejected: -68.3904
Logps/chosen: -16.5158
Logits/rejected: -1.0107
Logits/chosen: -0.9178
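
The relationship between these numbers can be checked with a short sketch. It assumes the standard DPO implicit reward, r = beta * (logp_policy - logp_reference), and it assumes beta = 0.5, which is inferred only from "05beta" in the model name and is not stated elsewhere in this card. The reference log-probabilities it prints are therefore implied values, not reported ones.

```python
# Final evaluation metrics copied from the list above.
rewards_chosen = 3.1005
rewards_rejected = -14.4818
logps_chosen = -16.5158
logps_rejected = -68.3904

# Reward margin: chosen reward minus rejected reward.
margin = rewards_chosen - rewards_rejected

# ASSUMPTION: beta = 0.5, inferred only from "05beta" in the model name.
beta = 0.5

# Standard DPO implicit reward: r = beta * (logp_policy - logp_reference).
# Solving for the reference log-probability gives logp_policy - r / beta.
ref_logps_chosen = logps_chosen - rewards_chosen / beta
ref_logps_rejected = logps_rejected - rewards_rejected / beta

print(f"margin = {margin:.4f}")
print(f"implied reference logp (chosen) = {ref_logps_chosen:.4f}")
print(f"implied reference logp (rejected) = {ref_logps_rejected:.4f}")
```

The computed margin matches the reported Rewards/margins value of 17.5823. Under the beta = 0.5 assumption, the implied reference log-probabilities are about -22.72 (chosen) and -39.43 (rejected).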

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 2
eval_batch_size: 1
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 4
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 100
training_steps: 1000
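
As a rough illustration of how these settings interact, the sketch below derives the effective batch size and evaluates a cosine schedule with linear warmup at a few steps. It assumes a single training device (the device count is not recorded here) and a half-cosine decay to zero after warmup, which is the usual meaning of a "cosine" scheduler; the function name lr_at is hypothetical.

```python
import math

# Hyperparameters copied from the list above.
learning_rate = 1e-05
train_batch_size = 2
gradient_accumulation_steps = 2
warmup_steps = 100
training_steps = 1000

# ASSUMPTION: one device; the card does not record the device count.
num_devices = 1
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (hypothetical helper)."""
    if step < warmup_steps:
        # Linear warmup from 0 up to the peak learning rate.
        return learning_rate * step / warmup_steps
    # Half-cosine decay from the peak down to 0 over the remaining steps.
    progress = (step - warmup_steps) / (training_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


print("total_train_batch_size =", total_train_batch_size)
for step in (0, 50, 100, 550, 1000):
    print(step, f"{lr_at(step):.2e}")
```

Under these assumptions the rate peaks at 1e-05 at step 100, is about half of that at step 550, and reaches zero at step 1000.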

Training results

Epoch	Step	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.2667	50	2.6888	-10.9218	1.0	13.6106	-61.2704	-17.3392	-1.0060	-0.9176
0.5333	100	2.8123	-12.0070	1.0	14.8193	-63.4408	-17.0922	-1.0062	-0.9166
0.8	150	2.8718	-12.6309	1.0	15.5027	-64.6887	-16.9733	-1.0077	-0.9178
1.0667	200	2.9222	-12.9997	1.0	15.9220	-65.4263	-16.8724	-1.0083	-0.9180
1.3333	250	2.9587	-13.2786	1.0	16.2372	-65.9839	-16.7994	-1.0085	-0.9179
1.6	300	2.9748	-13.5117	1.0	16.4865	-66.4503	-16.7673	-1.0094	-0.9185
1.8667	350	3.0038	-13.7311	1.0	16.7350	-66.8891	-16.7092	-1.0097	-0.9181
2.1333	400	3.0287	-13.8707	1.0	16.8994	-67.1683	-16.6595	-1.0097	-0.9178
2.4	450	3.0555	-14.0219	1.0	17.0774	-67.4707	-16.6059	-1.0096	-0.9174
2.6667	500	3.0689	-14.1391	1.0	17.2081	-67.7051	-16.5790	-1.0110	-0.9186
2.9333	550	3.0728	-14.2357	1.0	17.3085	-67.8981	-16.5711	-1.0101	-0.9176
3.2	600	3.0755	-14.3397	1.0	17.4152	-68.1062	-16.5658	-1.0104	-0.9180
3.4667	650	3.0977	-14.3908	1.0	17.4884	-68.2083	-16.5214	-1.0106	-0.9180
3.7333	700	3.1035	-14.4417	1.0	17.5452	-68.3102	-16.5099	-1.0117	-0.9189
4.0	750	3.0881	-14.4574	1.0	17.5455	-68.3416	-16.5406	-1.0099	-0.9170
4.2667	800	3.1048	-14.4756	1.0	17.5804	-68.3780	-16.5072	-1.0102	-0.9176
4.5333	850	3.0963	-14.4856	1.0	17.5819	-68.3980	-16.5242	-1.0096	-0.9168
4.8	900	3.1097	-14.4788	1.0	17.5885	-68.3844	-16.4973	-1.0104	-0.9175
5.0667	950	3.1005	-14.4818	1.0	17.5823	-68.3904	-16.5158	-1.0107	-0.9178
5.3333	1000	3.1005	-14.4818	1.0	17.5823	-68.3904	-16.5158	-1.0107	-0.9178
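
The Epoch and Step columns also give a hint about the size of the training set, which is otherwise not recorded. The sketch below is only an estimate: it assumes every optimizer step consumes total_train_batch_size = 4 examples and that no partial batches are dropped.

```python
# (epoch, step) pairs copied from three rows of the table above.
rows = [(0.2667, 50), (1.0667, 200), (5.3333, 1000)]

# From the hyperparameters: train_batch_size 2 * gradient_accumulation_steps 2.
total_train_batch_size = 4

estimates = []
for epoch, step in rows:
    steps_per_epoch = step / epoch
    # ASSUMPTION: each optimizer step sees exactly total_train_batch_size examples.
    approx_examples = steps_per_epoch * total_train_batch_size
    estimates.append(approx_examples)
    print(f"step {step}: ~{steps_per_epoch:.1f} steps/epoch, ~{approx_examples:.0f} examples")
```

All three rows point to roughly 187.5 steps per epoch, or about 750 training examples under the stated assumptions.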

Framework versions

Transformers 4.42.3
PyTorch 2.0.0+cu117
Datasets 2.20.0
Tokenizers 0.19.1
