
OpenELM-1_1B-DPO-full-3-5

This model appears to be a DPO (Direct Preference Optimization) fine-tune of OpenELM-1_1B (per the model name); the training dataset is not documented. It achieves the following results on the evaluation set:

  • Loss: 1.1637
  • Rewards/chosen: -13.625
  • Rewards/rejected: -17.0
  • Rewards/accuracies: 0.7051
  • Rewards/margins: 3.375
  • Logps/rejected: -1984.0
  • Logps/chosen: -1680.0
  • Logits/rejected: 3.8594
  • Logits/chosen: 1.9453
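
The reward and margin figures above follow the usual DPO bookkeeping: a response's "reward" is the β-scaled log-probability ratio between the trained policy and the reference model, the margin is chosen minus rejected, and accuracy is the fraction of pairs with a positive margin. A minimal sketch of these definitions (TRL-style conventions; the β value is an assumption, as it is not reported in this card):

```python
import math

beta = 0.1  # assumed; the beta used in this training run is not documented

def dpo_pair_metrics(policy_chosen_logp, ref_chosen_logp,
                     policy_rejected_logp, ref_rejected_logp):
    """Per-pair rewards, margin, accuracy, and sigmoid DPO loss."""
    # Rewards are beta-scaled log-prob ratios against the reference model.
    reward_chosen = beta * (policy_chosen_logp - ref_chosen_logp)
    reward_rejected = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = reward_chosen - reward_rejected
    # Did the policy prefer the chosen response over the rejected one?
    accuracy = 1.0 if margin > 0 else 0.0
    # Sigmoid DPO loss for this pair: -log(sigmoid(margin))
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))
    return reward_chosen, reward_rejected, margin, accuracy, loss
```

Averaged over the evaluation set, these quantities give the Rewards/*, Logps/*, and Loss entries reported above.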

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 64
  • total_eval_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5
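
The two "total" batch sizes are derived values, not independent settings. A quick sanity check of the arithmetic (illustrative only, not training code):

```python
per_device_train_batch_size = 8
per_device_eval_batch_size = 16
num_devices = 4
gradient_accumulation_steps = 2

# Training: per-device batch * devices * accumulation steps
total_train_batch_size = (per_device_train_batch_size * num_devices
                          * gradient_accumulation_steps)  # 8 * 4 * 2 = 64

# Evaluation does no gradient accumulation, so only devices multiply in.
total_eval_batch_size = per_device_eval_batch_size * num_devices  # 16 * 4 = 64
```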

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6232 | 0.1047 | 100 | 0.6285 | -0.6055 | -0.8242 | 0.6660 | 0.2178 | -368.0 | -374.0 | -8.25 | -8.5 |
| 0.5729 | 0.2093 | 200 | 0.5957 | -1.6328 | -2.1094 | 0.6992 | 0.4766 | -498.0 | -478.0 | -7.9688 | -8.4375 |
| 0.6122 | 0.3140 | 300 | 0.5751 | -1.6016 | -2.1094 | 0.7129 | 0.5 | -496.0 | -474.0 | -5.5938 | -6.1562 |
| 0.5905 | 0.4186 | 400 | 0.5994 | -1.6328 | -2.1875 | 0.6680 | 0.5508 | -504.0 | -478.0 | -5.5625 | -6.3438 |
| 0.5781 | 0.5233 | 500 | 0.5764 | -1.7188 | -2.2656 | 0.6816 | 0.5586 | -512.0 | -486.0 | -6.0625 | -6.8438 |
| 0.5356 | 0.6279 | 600 | 0.5831 | -3.8906 | -4.5625 | 0.6699 | 0.6797 | -744.0 | -704.0 | -3.3281 | -4.25 |
| 0.5756 | 0.7326 | 700 | 0.5859 | -3.4219 | -4.0312 | 0.7012 | 0.6133 | -692.0 | -656.0 | -8.8125 | -9.375 |
| 0.5528 | 0.8373 | 800 | 0.5732 | -2.8906 | -3.5 | 0.6836 | 0.6016 | -636.0 | -604.0 | -7.4375 | -8.3125 |
| 0.5753 | 0.9419 | 900 | 0.5693 | -3.0469 | -3.7344 | 0.7168 | 0.6797 | -660.0 | -620.0 | -7.0 | -7.9062 |
| 0.2632 | 1.0466 | 1000 | 0.5881 | -4.1875 | -5.2188 | 0.7148 | 1.0312 | -808.0 | -732.0 | -2.875 | -4.25 |
| 0.2283 | 1.1512 | 1100 | 0.6142 | -4.5312 | -5.5312 | 0.7129 | 0.9961 | -840.0 | -768.0 | -5.375 | -7.0625 |
| 0.2202 | 1.2559 | 1200 | 0.5943 | -4.0938 | -5.1875 | 0.7090 | 1.0781 | -804.0 | -724.0 | -1.875 | -3.375 |
| 0.2472 | 1.3605 | 1300 | 0.5995 | -4.4375 | -5.4062 | 0.7168 | 0.9844 | -828.0 | -760.0 | -2.2188 | -3.6875 |
| 0.2406 | 1.4652 | 1400 | 0.5971 | -5.2188 | -6.2188 | 0.7188 | 1.0156 | -908.0 | -836.0 | -3.875 | -5.2812 |
| 0.2059 | 1.5699 | 1500 | 0.6052 | -5.3438 | -6.5312 | 0.7148 | 1.1953 | -940.0 | -848.0 | -4.2188 | -5.7812 |
| 0.2305 | 1.6745 | 1600 | 0.6068 | -4.875 | -5.9062 | 0.7188 | 1.0391 | -876.0 | -800.0 | -5.1562 | -6.6875 |
| 0.2327 | 1.7792 | 1700 | 0.6141 | -5.9375 | -7.1562 | 0.7168 | 1.2188 | -1000.0 | -908.0 | -4.5 | -6.0625 |
| 0.2221 | 1.8838 | 1800 | 0.6072 | -6.4688 | -7.6562 | 0.7266 | 1.1875 | -1048.0 | -960.0 | -1.9844 | -3.625 |
| 0.2153 | 1.9885 | 1900 | 0.5949 | -6.5 | -7.6875 | 0.7266 | 1.1953 | -1056.0 | -964.0 | -3.3125 | -4.875 |
| 0.0215 | 2.0931 | 2000 | 0.7470 | -8.6875 | -10.5 | 0.7246 | 1.8125 | -1336.0 | -1184.0 | -0.1074 | -1.9609 |
| 0.0303 | 2.1978 | 2100 | 0.7469 | -8.3125 | -10.25 | 0.7031 | 1.9453 | -1312.0 | -1144.0 | -0.1299 | -2.0781 |
| 0.0322 | 2.3025 | 2200 | 0.7584 | -8.5625 | -10.4375 | 0.7109 | 1.8828 | -1328.0 | -1168.0 | -0.5156 | -2.6094 |
| 0.0253 | 2.4071 | 2300 | 0.8087 | -9.8125 | -11.9375 | 0.7129 | 2.125 | -1480.0 | -1296.0 | 1.2656 | -0.7539 |
| 0.0302 | 2.5118 | 2400 | 0.8033 | -9.0 | -11.0625 | 0.7246 | 2.0312 | -1392.0 | -1216.0 | 2.2812 | 0.4395 |
| 0.0218 | 2.6164 | 2500 | 0.8603 | -11.0 | -13.3125 | 0.7188 | 2.3125 | -1616.0 | -1408.0 | 2.2969 | 0.5195 |
| 0.027 | 2.7211 | 2600 | 0.8162 | -9.75 | -12.0 | 0.7402 | 2.2188 | -1488.0 | -1288.0 | 1.0703 | -0.9609 |
| 0.0274 | 2.8257 | 2700 | 0.8296 | -9.75 | -12.0 | 0.7188 | 2.2188 | -1480.0 | -1288.0 | 1.125 | -0.9102 |
| 0.0369 | 2.9304 | 2800 | 0.8085 | -9.5625 | -11.875 | 0.7227 | 2.3125 | -1472.0 | -1272.0 | 0.6289 | -1.4531 |
| 0.0154 | 3.0351 | 2900 | 0.8779 | -9.875 | -12.375 | 0.7266 | 2.5 | -1520.0 | -1296.0 | 0.9609 | -1.3125 |
| 0.007 | 3.1397 | 3000 | 0.9780 | -11.5 | -14.375 | 0.7207 | 2.875 | -1728.0 | -1464.0 | 2.7969 | 0.6836 |
| 0.0059 | 3.2444 | 3100 | 0.9793 | -11.125 | -14.0 | 0.7090 | 2.875 | -1688.0 | -1424.0 | 2.2188 | 0.0258 |
| 0.0102 | 3.3490 | 3200 | 0.9823 | -11.0625 | -13.875 | 0.7148 | 2.8281 | -1672.0 | -1424.0 | 2.7656 | 0.7539 |
| 0.0082 | 3.4537 | 3300 | 1.0423 | -12.1875 | -15.1875 | 0.7051 | 3.0 | -1800.0 | -1528.0 | 3.3281 | 1.4453 |
| 0.0109 | 3.5583 | 3400 | 1.0225 | -11.375 | -14.375 | 0.7168 | 2.9688 | -1720.0 | -1456.0 | 2.875 | 0.8672 |
| 0.0098 | 3.6630 | 3500 | 1.0070 | -11.4375 | -14.25 | 0.7109 | 2.8438 | -1712.0 | -1456.0 | 3.1875 | 1.1719 |
| 0.007 | 3.7677 | 3600 | 1.0390 | -11.9375 | -14.9375 | 0.7148 | 3.0 | -1776.0 | -1512.0 | 2.8594 | 0.8086 |
| 0.0057 | 3.8723 | 3700 | 1.0702 | -12.75 | -15.8125 | 0.7031 | 3.0625 | -1864.0 | -1584.0 | 3.4531 | 1.5 |
| 0.0054 | 3.9770 | 3800 | 1.0485 | -12.4375 | -15.4375 | 0.7031 | 3.0 | -1832.0 | -1560.0 | 3.4062 | 1.4688 |
| 0.0037 | 4.0816 | 3900 | 1.0905 | -12.8125 | -15.9375 | 0.7031 | 3.1406 | -1880.0 | -1600.0 | 3.5469 | 1.6172 |
| 0.0031 | 4.1863 | 4000 | 1.1163 | -13.0625 | -16.25 | 0.7012 | 3.2188 | -1912.0 | -1616.0 | 3.6094 | 1.6562 |
| 0.0037 | 4.2909 | 4100 | 1.1256 | -13.125 | -16.375 | 0.7090 | 3.2656 | -1920.0 | -1624.0 | 3.6094 | 1.6562 |
| 0.0089 | 4.3956 | 4200 | 1.1395 | -13.3125 | -16.625 | 0.7070 | 3.3125 | -1952.0 | -1648.0 | 3.75 | 1.8125 |
| 0.0042 | 4.5003 | 4300 | 1.1512 | -13.4375 | -16.75 | 0.7051 | 3.3438 | -1968.0 | -1664.0 | 3.7969 | 1.8672 |
| 0.0094 | 4.6049 | 4400 | 1.1580 | -13.5 | -16.875 | 0.7070 | 3.3594 | -1976.0 | -1664.0 | 3.8125 | 1.8828 |
| 0.006 | 4.7096 | 4500 | 1.1593 | -13.5625 | -17.0 | 0.7051 | 3.375 | -1984.0 | -1672.0 | 3.8438 | 1.9219 |
| 0.0029 | 4.8142 | 4600 | 1.1617 | -13.625 | -17.0 | 0.7051 | 3.375 | -1984.0 | -1680.0 | 3.8594 | 1.9375 |
| 0.0059 | 4.9189 | 4700 | 1.1637 | -13.625 | -17.0 | 0.7051 | 3.375 | -1984.0 | -1680.0 | 3.8594 | 1.9453 |
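
Validation loss bottoms out near the end of the first epoch (0.5693 at step 900) and climbs steadily afterwards while training loss collapses toward zero, a typical overfitting pattern in multi-epoch DPO runs. Picking the best checkpoint from such a log is a one-liner (the pairs below are a subset copied from the table above):

```python
# (step, validation loss) pairs, copied from the eval log above
eval_log = [
    (500, 0.5764), (800, 0.5732), (900, 0.5693), (1000, 0.5881),
    (2000, 0.7470), (3000, 0.9780), (4000, 1.1163), (4700, 1.1637),
]
# The checkpoint with the lowest validation loss is the natural pick.
best_step, best_loss = min(eval_log, key=lambda pair: pair[1])
print(best_step, best_loss)  # -> 900 0.5693
```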

Framework versions

  • Transformers 4.44.2
  • Pytorch 2.1.2
  • Datasets 2.18.0
  • Tokenizers 0.19.1
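
To recreate a matching environment, a pinned install along these lines should work (note the PyPI package for "Pytorch" is `torch`; CUDA-specific builds may need a different index URL):

```shell
pip install "transformers==4.44.2" "torch==2.1.2" "datasets==2.18.0" "tokenizers==0.19.1"
```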
Model details

  • Format: Safetensors
  • Model size: 1.08B params
  • Tensor type: BF16