OpenELM-1_1B-DPO-full-llama-improve-openelm

This model was trained from scratch on an unknown dataset. At the last logged evaluation (step 2800, epoch 2.93) it achieves the following results on the evaluation set:

Loss: 1.1360
Rewards/chosen: -5.6875
Rewards/rejected: -6.1562
Rewards/accuracies: 0.5469
Rewards/margins: 0.4668
Logps/rejected: -904.0
Logps/chosen: -888.0
Logits/rejected: -9.5625
Logits/chosen: -10.1875
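
The reward metrics above are linked by simple arithmetic: the margin is the chosen reward minus the rejected reward. The sketch below checks that relation and shows a per-pair loss, assuming the standard sigmoid DPO objective with rewards that already include the beta scaling. The card does not state which loss variant was used, and the function names are illustrative only.

```python
import math

def reward_margin(chosen: float, rejected: float) -> float:
    """Margin between the implicit rewards of the chosen and rejected responses."""
    return chosen - rejected

def sigmoid_dpo_loss(margin: float) -> float:
    """Per-pair sigmoid DPO loss, -log(sigmoid(margin)), computed stably.

    Assumption: the margin is already scaled by beta, as in the logged rewards.
    """
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch to avoid overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# Values taken from the evaluation results above.
margin = reward_margin(-5.6875, -6.1562)
print(round(margin, 4))  # 0.4687
```

The result is close to the reported Rewards/margins of 0.4668. The small gap is plausibly due to averaging and rounding in the logged values, though the card does not confirm this. Note also that the reported Loss is a mean over pairs, so it need not equal `sigmoid_dpo_loss` evaluated at the mean margin.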

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 16
seed: 42
distributed_type: multi-GPU
num_devices: 4
gradient_accumulation_steps: 2
total_train_batch_size: 64
total_eval_batch_size: 64
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 3
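
The total batch sizes follow from the per-device sizes, the device count, and gradient accumulation. The sketch below reproduces that arithmetic and illustrates a linear-warmup, cosine-decay learning-rate curve that matches the listed scheduler settings. It is a plain-Python approximation, not the training code itself, and `total_steps=1000` in the usage lines is a hypothetical value chosen for illustration.

```python
import math

# Values from the hyperparameter list above.
TRAIN_BATCH_SIZE = 8
EVAL_BATCH_SIZE = 16
NUM_DEVICES = 4
GRAD_ACCUM_STEPS = 2
LEARNING_RATE = 5e-05
WARMUP_RATIO = 0.1

def total_train_batch_size() -> int:
    """Examples consumed per optimizer step across all devices."""
    return TRAIN_BATCH_SIZE * NUM_DEVICES * GRAD_ACCUM_STEPS

def total_eval_batch_size() -> int:
    """Examples per evaluation step (no gradient accumulation at eval time)."""
    return EVAL_BATCH_SIZE * NUM_DEVICES

def cosine_lr(step: int, total_steps: int) -> float:
    """Learning rate at `step`: linear warmup, then cosine decay to zero."""
    warmup_steps = math.ceil(WARMUP_RATIO * total_steps)
    if step < warmup_steps:
        return LEARNING_RATE * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))

print(total_train_batch_size())  # 64
print(total_eval_batch_size())   # 64

# Hypothetical 1000-step run: peak rate is reached when warmup ends at step 100.
for step in (0, 50, 100, 550, 1000):
    print(step, cosine_lr(step, total_steps=1000))
```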

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.0021	0.1047	100	0.9013	-3.25	-3.3906	0.4883	0.1416	-628.0	-644.0	-9.6875	-10.25
0.0015	0.2093	200	0.7819	-1.1094	-1.1953	0.5078	0.0850	-408.0	-430.0	-7.25	-8.0
0.0056	0.3140	300	0.8233	-3.9844	-4.3125	0.5391	0.3398	-720.0	-716.0	-3.7344	-4.6875
0.001	0.4186	400	1.2958	-5.375	-5.7812	0.5156	0.4141	-868.0	-856.0	-7.5	-8.125
0.0089	0.5233	500	1.5850	-8.4375	-8.875	0.5273	0.4688	-1176.0	-1160.0	-7.6562	-8.375
0.0037	0.6279	600	0.9525	-4.0312	-4.3438	0.5215	0.3027	-720.0	-720.0	-12.0	-12.3125
0.003	0.7326	700	2.1298	-9.4375	-10.5	0.5371	1.1016	-1344.0	-1256.0	-3.6875	-4.875
0.0001	0.8373	800	2.1121	-9.4375	-10.4375	0.5312	1.0547	-1336.0	-1264.0	-8.6875	-9.5
0.0037	0.9419	900	1.3021	-7.0	-7.0938	0.5156	0.0923	-996.0	-1016.0	-9.625	-9.9375
0.0003	1.0466	1000	1.0153	-5.7188	-5.9062	0.5430	0.2090	-880.0	-888.0	-10.375	-10.75
0.0001	1.1512	1100	1.1537	-6.5312	-6.8125	0.5273	0.2734	-968.0	-972.0	-9.9375	-10.5
0.0016	1.2559	1200	1.2422	-6.9688	-7.25	0.5312	0.2773	-1012.0	-1016.0	-11.125	-11.5
0.0001	1.3605	1300	1.2745	-7.4062	-7.6875	0.5215	0.2969	-1056.0	-1056.0	-11.25	-11.5625
0.0001	1.4652	1400	0.9129	-4.1562	-4.375	0.5332	0.2168	-724.0	-732.0	-10.5	-10.8125
0.0	1.5699	1500	1.1999	-6.1562	-6.5938	0.5449	0.4473	-948.0	-932.0	-7.0625	-7.8438
0.0	1.6745	1600	1.2007	-5.75	-6.1875	0.5371	0.4434	-908.0	-892.0	-8.75	-9.5
0.0001	1.7792	1700	1.3752	-7.3438	-7.9062	0.5371	0.5664	-1080.0	-1056.0	-8.0625	-8.8125
0.0	1.8838	1800	1.2737	-6.5625	-7.125	0.5469	0.5508	-1000.0	-976.0	-9.0625	-9.75
0.0001	1.9885	1900	1.0200	-4.625	-4.9375	0.5391	0.2969	-784.0	-780.0	-10.25	-10.8125
0.0	2.0931	2000	1.0691	-5.25	-5.6562	0.5449	0.3926	-852.0	-844.0	-9.75	-10.375
0.0	2.1978	2100	1.1145	-5.625	-6.0938	0.5469	0.4531	-896.0	-884.0	-9.375	-10.0
0.0	2.3025	2200	1.1357	-5.8125	-6.3125	0.5527	0.4766	-920.0	-900.0	-9.125	-9.8125
0.0001	2.4071	2300	1.1362	-5.8125	-6.2812	0.5469	0.4766	-916.0	-900.0	-9.1875	-9.8125
0.0	2.5118	2400	1.1353	-5.7188	-6.1875	0.5430	0.4688	-908.0	-892.0	-9.4375	-10.0625
0.0	2.6164	2500	1.1318	-5.6875	-6.1562	0.5391	0.4629	-904.0	-888.0	-9.5625	-10.1875
0.0	2.7211	2600	1.1339	-5.6875	-6.1562	0.5430	0.4688	-904.0	-888.0	-9.5625	-10.1875
0.0	2.8257	2700	1.1359	-5.6875	-6.1562	0.5469	0.4668	-904.0	-888.0	-9.5625	-10.1875
0.0	2.9304	2800	1.1360	-5.6875	-6.1562	0.5469	0.4668	-904.0	-888.0	-9.5625	-10.1875
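
To compare checkpoints, the tab-separated table above can be parsed directly. The sketch below is one way to do that with the standard library. It embeds only four rows, truncated to the first eight columns, to stay short; the helper name `load_rows` is illustrative.

```python
import csv
import io

# Header and four rows copied from the table above (first eight columns only).
TABLE = (
    "Training Loss\tEpoch\tStep\tValidation Loss\tRewards/chosen\t"
    "Rewards/rejected\tRewards/accuracies\tRewards/margins\n"
    "0.0021\t0.1047\t100\t0.9013\t-3.25\t-3.3906\t0.4883\t0.1416\n"
    "0.0015\t0.2093\t200\t0.7819\t-1.1094\t-1.1953\t0.5078\t0.0850\n"
    "0.0\t2.3025\t2200\t1.1357\t-5.8125\t-6.3125\t0.5527\t0.4766\n"
    "0.0\t2.9304\t2800\t1.1360\t-5.6875\t-6.1562\t0.5469\t0.4668\n"
)

def load_rows(text: str) -> list[dict[str, float]]:
    """Parse a tab-separated metrics table into one dict of floats per row."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [{name: float(value) for name, value in row.items()} for row in reader]

rows = load_rows(TABLE)
best_loss = min(rows, key=lambda r: r["Validation Loss"])
best_acc = max(rows, key=lambda r: r["Rewards/accuracies"])

print(int(best_loss["Step"]), best_loss["Validation Loss"])    # 200 0.7819
print(int(best_acc["Step"]), best_acc["Rewards/accuracies"])   # 2200 0.5527
```

The same two steps come out on top when the full table is used: validation loss is lowest at step 200 (0.7819) and reward accuracy is highest at step 2200 (0.5527).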

Framework versions

Transformers 4.44.2
Pytorch 2.3.0
Datasets 3.0.0
Tokenizers 0.19.1
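
When reproducing results, it can help to confirm that the local environment matches the versions listed above. The following is a minimal sketch using the standard library's `importlib.metadata`. The distribution names are the usual PyPI names (PyTorch is published as `torch`), and the helper name `version_mismatches` is illustrative.

```python
from importlib import metadata

# Versions listed above, keyed by PyPI distribution name.
EXPECTED = {
    "transformers": "4.44.2",
    "torch": "2.3.0",
    "datasets": "3.0.0",
    "tokenizers": "0.19.1",
}

def version_mismatches(expected, get_version=metadata.version):
    """Return {package: installed_version_or_None} for every mismatch."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            installed = None
        # Ignore local build tags such as "+cu121" when comparing.
        if installed is None or installed.split("+")[0] != wanted:
            mismatches[package] = installed
    return mismatches

if __name__ == "__main__":
    for package, installed in version_mismatches(EXPECTED).items():
        print(f"{package}: expected {EXPECTED[package]}, found {installed}")
```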

CharlesLi/OpenELM-1_1B-DPO-full-llama-improve-openelm