OpenELM-1_1B-DPO-full-2

This model is a fine-tuned version of data/OpenELM-1_1B-SFT-2 on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

Loss: 0.7945
Rewards/chosen: -8.3125
Rewards/rejected: -10.4375
Rewards/accuracies: 0.7324
Rewards/margins: 2.1406
Logps/rejected: -1336.0
Logps/chosen: -1144.0
Logits/rejected: 5.5
Logits/chosen: 3.5938
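
The metric names above match those commonly logged by DPO trainers such as TRL's DPOTrainer. The card does not state the DPO beta or the loss variant, so the following is only a minimal sketch of how such metrics are typically derived, assuming the standard sigmoid DPO loss and a hypothetical beta of 0.1. The function name `dpo_metrics` and the toy log-probabilities are illustrative and do not come from this training run.

```python
import math


def dpo_metrics(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Compute DPO loss and reward statistics from sequence log-probabilities.

    Each argument is a list with one summed log-probability per example.
    beta=0.1 is an assumed value; the model card does not report it.
    """
    # Implicit reward: beta * (policy log-prob - reference log-prob).
    rewards_chosen = [beta * (p - r) for p, r in zip(policy_chosen, ref_chosen)]
    rewards_rejected = [beta * (p - r) for p, r in zip(policy_rejected, ref_rejected)]
    margins = [c - r for c, r in zip(rewards_chosen, rewards_rejected)]

    # Sigmoid DPO loss: -log(sigmoid(margin)), written as a stable softplus(-margin).
    losses = [max(-m, 0.0) + math.log1p(math.exp(-abs(m))) for m in margins]

    n = len(margins)
    return {
        "loss": sum(losses) / n,
        "rewards/chosen": sum(rewards_chosen) / n,
        "rewards/rejected": sum(rewards_rejected) / n,
        # Fraction of pairs where the chosen response gets the higher reward.
        "rewards/accuracies": sum(m > 0 for m in margins) / n,
        "rewards/margins": sum(margins) / n,
    }


# Toy batch of two preference pairs (made-up numbers).
metrics = dpo_metrics(
    policy_chosen=[-10.0, -12.0],
    policy_rejected=[-14.0, -11.0],
    ref_chosen=[-11.0, -11.0],
    ref_rejected=[-12.0, -12.0],
)
print(metrics)
```

Under this definition, a negative Rewards/chosen means the fine-tuned policy assigns a lower log-probability to the chosen responses than the reference model does. The pair can still be ranked correctly as long as the rejected reward is lower still.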

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 16
seed: 42
distributed_type: multi-GPU
num_devices: 4
gradient_accumulation_steps: 2
total_train_batch_size: 64
total_eval_batch_size: 64
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 3
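
The total batch sizes listed above follow from the per-device values. The sketch below reproduces that arithmetic and illustrates the usual shape of a cosine schedule with linear warmup. The helper `cosine_lr` and its `total_steps` argument are hypothetical, since the card does not report the total number of optimizer steps. The helper approximates the common warmup-then-cosine rule and is not taken from the training code.

```python
import math

# Values copied from the hyperparameter list above.
train_batch_size = 8
eval_batch_size = 16
num_devices = 4
gradient_accumulation_steps = 2
learning_rate = 5e-05
warmup_ratio = 0.1

# Effective batch sizes across all devices.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices  # no accumulation at eval time


def cosine_lr(step, total_steps, peak_lr=learning_rate, warmup=warmup_ratio):
    """Learning rate at `step`: linear warmup to peak_lr, then cosine decay to 0."""
    warmup_steps = math.ceil(total_steps * warmup)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(total_train_batch_size, total_eval_batch_size)
# total_steps=1000 is an arbitrary illustrative value.
print([cosine_lr(s, 1000) for s in (0, 50, 100, 550, 1000)])
```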

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.6007	0.1047	100	0.6140	-1.2344	-1.5781	0.6562	0.3418	-444.0	-438.0	-8.5	-8.8125
0.591	0.2093	200	0.6025	-1.9297	-2.4688	0.6895	0.5312	-532.0	-508.0	-6.9375	-7.5312
0.6351	0.3140	300	0.5962	-2.2344	-2.6875	0.6875	0.4512	-556.0	-540.0	-4.9062	-5.7812
0.6031	0.4186	400	0.5900	-1.7109	-2.2812	0.6875	0.5625	-512.0	-486.0	-6.25	-7.2188
0.5813	0.5233	500	0.5824	-2.25	-2.8125	0.7051	0.5547	-568.0	-540.0	-3.6406	-4.8125
0.5376	0.6279	600	0.5624	-2.625	-3.3281	0.7012	0.7109	-620.0	-576.0	2.4219	0.9258
0.5582	0.7326	700	0.5655	-3.2812	-4.0938	0.7051	0.8008	-696.0	-644.0	-0.3281	-1.7891
0.5437	0.8373	800	0.5704	-2.8281	-3.4375	0.6992	0.6172	-632.0	-596.0	-1.6719	-3.1719
0.567	0.9419	900	0.5633	-3.1406	-3.9062	0.7227	0.7539	-676.0	-628.0	-1.0781	-2.4219
0.223	1.0466	1000	0.5835	-4.1562	-5.25	0.7461	1.0859	-812.0	-732.0	3.375	1.7734
0.1774	1.1512	1100	0.6000	-4.8438	-5.9688	0.7227	1.1328	-884.0	-800.0	2.8906	0.9844
0.1868	1.2559	1200	0.5954	-4.9062	-6.0625	0.7188	1.1484	-892.0	-804.0	3.5	1.9609
0.1871	1.3605	1300	0.6086	-5.3438	-6.5	0.7324	1.1562	-932.0	-848.0	3.1719	1.3281
0.1651	1.4652	1400	0.5995	-5.375	-6.4688	0.7090	1.0938	-932.0	-852.0	2.9375	1.0625
0.1557	1.5699	1500	0.6073	-5.3125	-6.5938	0.7012	1.2656	-944.0	-848.0	1.9219	-0.1582
0.2145	1.6745	1600	0.6256	-5.1875	-6.4688	0.7031	1.2656	-932.0	-832.0	3.0469	0.9570
0.1666	1.7792	1700	0.6223	-5.5312	-6.8438	0.7246	1.3047	-972.0	-868.0	3.8906	1.7969
0.164	1.8838	1800	0.6084	-4.6875	-5.9375	0.7383	1.2266	-880.0	-784.0	2.6562	0.5117
0.1552	1.9885	1900	0.6211	-5.4375	-6.7812	0.7363	1.3359	-964.0	-856.0	2.5469	0.4004
0.0204	2.0931	2000	0.6830	-6.4062	-8.0	0.7383	1.6328	-1088.0	-952.0	4.1562	2.1719
0.0205	2.1978	2100	0.8096	-9.0	-11.125	0.7168	2.1094	-1400.0	-1216.0	5.4375	3.5469
0.0228	2.3025	2200	0.8077	-8.625	-10.8125	0.7305	2.1562	-1368.0	-1176.0	5.25	3.3281
0.0148	2.4071	2300	0.7832	-8.1875	-10.1875	0.7227	2.0469	-1304.0	-1128.0	5.25	3.3906
0.0202	2.5118	2400	0.7835	-8.1875	-10.25	0.7344	2.0781	-1312.0	-1136.0	5.3125	3.375
0.01	2.6164	2500	0.7940	-8.1875	-10.3125	0.7363	2.1094	-1320.0	-1136.0	5.4688	3.5312
0.0153	2.7211	2600	0.8036	-8.5625	-10.75	0.7324	2.1719	-1360.0	-1168.0	5.625	3.75
0.0205	2.8257	2700	0.7961	-8.375	-10.5	0.7344	2.1562	-1336.0	-1152.0	5.5312	3.6406
0.0184	2.9304	2800	0.7947	-8.3125	-10.5	0.7324	2.1562	-1336.0	-1144.0	5.5	3.5938
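
In the table above, validation loss reaches its minimum early and then rises while training loss keeps falling, a pattern usually read as overfitting in later epochs. The sketch below finds the evaluation step with the lowest validation loss, using the (step, validation loss) pairs copied from the table. It only illustrates one way to read the log and is not the author's checkpoint selection method. Other columns, such as Rewards/accuracies, peak at different steps.

```python
# (step, validation_loss) pairs copied from the training results table.
eval_log = [
    (100, 0.6140), (200, 0.6025), (300, 0.5962), (400, 0.5900),
    (500, 0.5824), (600, 0.5624), (700, 0.5655), (800, 0.5704),
    (900, 0.5633), (1000, 0.5835), (1100, 0.6000), (1200, 0.5954),
    (1300, 0.6086), (1400, 0.5995), (1500, 0.6073), (1600, 0.6256),
    (1700, 0.6223), (1800, 0.6084), (1900, 0.6211), (2000, 0.6830),
    (2100, 0.8096), (2200, 0.8077), (2300, 0.7832), (2400, 0.7835),
    (2500, 0.7940), (2600, 0.8036), (2700, 0.7961), (2800, 0.7947),
]

# Step with the lowest validation loss, and the last logged evaluation.
best_step, best_loss = min(eval_log, key=lambda row: row[1])
final_step, final_loss = eval_log[-1]

print(f"lowest validation loss {best_loss:.4f} at step {best_step}")
print(f"final logged validation loss {final_loss:.4f} at step {final_step}")
```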

Framework versions

Transformers 4.44.2
PyTorch 2.3.0
Datasets 2.21.0
Tokenizers 0.19.1

Repository: CharlesLi/OpenELM-1_1B-DPO-full-2
