tinyllama-1.1b-sum-dpo-full_LR2e-7_3epochs

This model is a version of martimfasantos/tinyllama-1.1b-sum-sft-full fine-tuned with Direct Preference Optimization (DPO, per the model name) on the openai/summarize_from_feedback dataset. It achieves the following results on the evaluation set:

Loss: 0.6411
Rewards/chosen: -1.5955
Rewards/rejected: -1.9066
Rewards/accuracies: 0.6273
Rewards/margins: 0.3112
Logps/rejected: -253.4108
Logps/chosen: -218.5612
Logits/rejected: -2.1502
Logits/chosen: -2.1697
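
The reward margin is the gap between the chosen and rejected rewards, so the figures above can be sanity-checked with a little arithmetic. The sketch below uses only the numbers listed in this card; the variable names are illustrative and not taken from the training code.

```python
# Evaluation metrics copied from the list above.
rewards_chosen = -1.5955
rewards_rejected = -1.9066
reported_margin = 0.3112

# The margin is the chosen reward minus the rejected reward.
margin = rewards_chosen - rewards_rejected

# Reported values are rounded to four decimals, so allow a small tolerance.
assert abs(margin - reported_margin) < 1e-3
print(f"recomputed margin: {margin:.4f}")
```

A positive margin means the model, on average, scores the preferred summary above the rejected one; Rewards/accuracies (0.6273) is the fraction of evaluation pairs where that ordering holds.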

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-07
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
gradient_accumulation_steps: 2
total_train_batch_size: 16
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 3
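
To make the schedule settings concrete, here is a minimal sketch of a linear warmup followed by cosine decay, which is what `lr_scheduler_type: cosine` with `lr_scheduler_warmup_ratio: 0.1` commonly means in Transformers. The total step count is not stated in this card; it is an assumption estimated from the results table (step 17200 is reported at epoch 2.9635). The function name `cosine_lr` is hypothetical.

```python
import math

# Hyperparameters copied from the list above.
LEARNING_RATE = 2e-07
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 2
WARMUP_RATIO = 0.1
NUM_EPOCHS = 3

# ASSUMPTION: total optimizer steps estimated from the results table
# (step 17200 at epoch 2.9635); the card does not state this number.
TOTAL_STEPS = round(17200 / 2.9635 * NUM_EPOCHS)
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def cosine_lr(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay to zero."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


# 8 * 2 = 16, which matches the reported total_train_batch_size.
effective_batch = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS

for step in (0, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, f"{cosine_lr(step):.3e}")
```

Under these assumptions the learning rate peaks at 2e-07 after roughly the first tenth of training and decays to zero by the final step.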

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.6924	0.0689	400	0.6930	0.0011	0.0007	0.5390	0.0003	-62.6755	-58.9094	-2.9687	-2.9723
0.6891	0.1378	800	0.6909	-0.0061	-0.0108	0.5748	0.0047	-63.8305	-59.6239	-2.9588	-2.9622
0.6874	0.2068	1200	0.6876	-0.0302	-0.0427	0.5871	0.0124	-67.0173	-62.0385	-2.9361	-2.9395
0.676	0.2757	1600	0.6820	-0.1057	-0.1316	0.5850	0.0259	-75.9065	-69.5813	-2.8942	-2.8976
0.6751	0.3446	2000	0.6770	-0.1715	-0.2098	0.5890	0.0384	-83.7308	-76.1611	-2.8434	-2.8468
0.6518	0.4135	2400	0.6676	-0.3727	-0.4381	0.6069	0.0654	-106.5637	-96.2904	-2.7893	-2.7926
0.6695	0.4824	2800	0.6631	-0.4734	-0.5560	0.6141	0.0826	-118.3500	-106.3523	-2.7415	-2.7450
0.6467	0.5513	3200	0.6583	-0.6700	-0.7814	0.625	0.1113	-140.8851	-126.0199	-2.6864	-2.6902
0.6264	0.6203	3600	0.6586	-0.6359	-0.7384	0.6106	0.1024	-136.5857	-122.6100	-2.6176	-2.6225
0.6203	0.6892	4000	0.6523	-0.7851	-0.9183	0.6166	0.1332	-154.5775	-137.5248	-2.5583	-2.5642
0.6341	0.7581	4400	0.6487	-0.8786	-1.0259	0.6129	0.1473	-165.3377	-146.8752	-2.4643	-2.4723
0.6184	0.8270	4800	0.6454	-1.0766	-1.2481	0.6129	0.1716	-187.5630	-166.6730	-2.4141	-2.4242
0.609	0.8959	5200	0.6414	-0.9919	-1.1678	0.6164	0.1759	-179.5278	-158.2066	-2.3970	-2.4080
0.5977	0.9649	5600	0.6432	-0.9166	-1.0804	0.6273	0.1638	-170.7888	-150.6710	-2.3933	-2.4042
0.5845	1.0338	6000	0.6438	-1.3686	-1.6032	0.6245	0.2346	-223.0724	-195.8758	-2.2640	-2.2816
0.5789	1.1027	6400	0.6455	-1.3882	-1.6212	0.6164	0.2331	-224.8725	-197.8306	-2.2428	-2.2595
0.5681	1.1716	6800	0.6434	-1.3348	-1.5500	0.6129	0.2153	-217.7540	-192.4917	-2.2435	-2.2593
0.5602	1.2405	7200	0.6448	-1.3673	-1.5959	0.6234	0.2286	-222.3391	-195.7428	-2.2210	-2.2378
0.6357	1.3094	7600	0.6413	-1.3975	-1.6344	0.6125	0.2368	-226.1876	-198.7702	-2.2034	-2.2208
0.5491	1.3784	8000	0.6438	-1.4655	-1.7121	0.6055	0.2466	-233.9599	-205.5657	-2.1906	-2.2085
0.5537	1.4473	8400	0.6445	-1.4375	-1.6793	0.6259	0.2418	-230.6812	-202.7634	-2.1797	-2.1984
0.61	1.5162	8800	0.6405	-1.0941	-1.2946	0.6164	0.2005	-192.2120	-168.4266	-2.2428	-2.2579
0.523	1.5851	9200	0.6431	-1.4596	-1.7029	0.6289	0.2433	-233.0398	-204.9723	-2.1570	-2.1756
0.5412	1.6540	9600	0.6393	-1.4228	-1.6896	0.6315	0.2668	-231.7097	-201.2986	-2.1513	-2.1708
0.5368	1.7229	10000	0.6408	-1.3358	-1.5858	0.6236	0.2500	-221.3330	-192.5947	-2.1730	-2.1915
0.5064	1.7919	10400	0.6423	-1.0625	-1.2620	0.6215	0.1995	-188.9488	-165.2631	-2.2150	-2.2307
0.5268	1.8608	10800	0.6406	-1.4254	-1.6829	0.6341	0.2575	-231.0404	-201.5558	-2.1644	-2.1831
0.5384	1.9297	11200	0.6418	-1.6486	-1.9439	0.6364	0.2954	-257.1440	-223.8720	-2.1299	-2.1503
0.5734	1.9986	11600	0.6378	-1.4356	-1.7101	0.6362	0.2744	-233.7563	-202.5782	-2.1624	-2.1813
0.5302	2.0675	12000	0.6413	-1.7064	-2.0285	0.6292	0.3221	-265.5970	-229.6515	-2.1257	-2.1466
0.4961	2.1365	12400	0.6474	-2.0075	-2.3712	0.6387	0.3637	-299.8690	-259.7696	-2.0958	-2.1178
0.55	2.2054	12800	0.6415	-1.5035	-1.7868	0.6315	0.2833	-241.4328	-209.3660	-2.1574	-2.1761
0.5546	2.2743	13200	0.6425	-1.6715	-1.9874	0.6303	0.3159	-261.4859	-226.1615	-2.1413	-2.1612
0.5639	2.3432	13600	0.6409	-1.5908	-1.8980	0.6289	0.3072	-252.5519	-218.1001	-2.1481	-2.1675
0.5055	2.4121	14000	0.6384	-1.4618	-1.7629	0.6257	0.3010	-239.0347	-205.1979	-2.1665	-2.1857
0.5404	2.4810	14400	0.6405	-1.6514	-1.9790	0.6285	0.3276	-260.6489	-224.1589	-2.1411	-2.1613
0.5348	2.5500	14800	0.6418	-1.6812	-2.0090	0.6276	0.3278	-263.6481	-227.1385	-2.1375	-2.1578
0.5114	2.6189	15200	0.6408	-1.5587	-1.8632	0.6310	0.3046	-249.0734	-214.8810	-2.1538	-2.1732
0.5356	2.6878	15600	0.6405	-1.5493	-1.8534	0.6266	0.3041	-248.0918	-213.9473	-2.1550	-2.1743
0.4885	2.7567	16000	0.6406	-1.5822	-1.8916	0.6269	0.3094	-251.9056	-217.2328	-2.1512	-2.1707
0.5057	2.8256	16400	0.6410	-1.5799	-1.8883	0.6306	0.3084	-251.5751	-217.0051	-2.1527	-2.1720
0.5731	2.8946	16800	0.6412	-1.5917	-1.9021	0.6271	0.3104	-252.9564	-218.1854	-2.1507	-2.1702
0.4958	2.9635	17200	0.6412	-1.5933	-1.9040	0.6296	0.3107	-253.1478	-218.3473	-2.1506	-2.1702
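
The table can be read against the DPO objective. The sketch below assumes the standard sigmoid DPO loss, -log(sigmoid(margin)), which this card does not state explicitly, and evaluates it at the mean margin of three rows copied from the table. The helper name `dpo_sigmoid_loss` is hypothetical.

```python
import math


def dpo_sigmoid_loss(margin: float) -> float:
    """-log(sigmoid(margin)), written in a numerically stable form."""
    # log(1 + exp(-m)) = max(-m, 0) + log1p(exp(-|m|))
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# (step, validation loss, rewards/margins) for three rows of the table above.
rows = [
    (400, 0.6930, 0.0003),
    (11600, 0.6378, 0.2744),
    (17200, 0.6412, 0.3107),
]

for step, val_loss, margin in rows:
    at_mean = dpo_sigmoid_loss(margin)
    print(f"step {step}: reported {val_loss:.4f}, loss at mean margin {at_mean:.4f}")

# Step with the lowest validation loss among the sampled rows.
best_step = min(rows, key=lambda row: row[1])[0]
```

At step 400 the margin is near zero, so the loss sits at about ln 2 ≈ 0.693, the value for a model with no preference. Later, the reported loss stays above the loss evaluated at the mean margin; under the stated assumption this is expected, because the loss is convex in the margin and per-example margins vary. Across the full table the lowest validation loss is 0.6378 at step 11600, after which it plateaus near 0.641 while the training loss keeps falling.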

Framework versions

Transformers 4.41.2
Pytorch 2.1.2
Datasets 2.19.2
Tokenizers 0.19.1
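
When reproducing the run, it can help to compare the local environment with the versions listed above. The following is a small sketch using the standard library's `importlib.metadata`; the PyPI package names (in particular `torch` for Pytorch) and the helper `find_mismatches` are assumptions, not part of this card.

```python
from importlib import metadata

# Versions copied from the list above; "Pytorch" is assumed to be the PyPI package "torch".
PINNED = {
    "transformers": "4.41.2",
    "torch": "2.1.2",
    "datasets": "2.19.2",
    "tokenizers": "0.19.1",
}


def find_mismatches(get_version=metadata.version) -> dict:
    """Return {package: installed version or None} for packages that differ from PINNED."""
    mismatches = {}
    for package, wanted in PINNED.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            installed = None
        # Local build tags such as "2.1.2+cu121" still count as a match.
        if installed is None or installed.split("+")[0] != wanted:
            mismatches[package] = installed
    return mismatches


if __name__ == "__main__":
    for package, installed in find_mismatches().items():
        print(f"{package}: installed {installed}, card lists {PINNED[package]}")
```

An empty result means the installed packages match the card; newer versions may still work, but were not the ones used for this run.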
