---
license: apache-2.0
base_model: martimfasantos/tinyllama-1.1b-sum-sft-full_old
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
datasets:
- openai/summarize_from_feedback
model-index:
- name: tinyllama-1.1b-sum-dpo-full_LR5e-8_BS32_3epochs_old
  results: []
---

# tinyllama-1.1b-sum-dpo-full_LR5e-8_BS32_3epochs_old

This model is a fine-tuned version of [martimfasantos/tinyllama-1.1b-sum-sft-full_old](https://huggingface.co/martimfasantos/tinyllama-1.1b-sum-sft-full_old) on the openai/summarize_from_feedback dataset.
It achieves the following results on the evaluation set:
- Loss: 0.6785
- Rewards/chosen: -0.1508
- Rewards/rejected: -0.1845
- Rewards/accuracies: 0.6085
- Rewards/margins: 0.0338
- Logps/rejected: -81.6350
- Logps/chosen: -73.7914
- Logits/rejected: -2.9190
- Logits/chosen: -2.9249

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-08
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6931 | 0.0345 | 100 | 0.6932 | -0.0000 | 0.0001 | 0.4828 | -0.0001 | -63.1721 | -58.7140 | -3.1575 | -3.1632 |
| 0.6932 | 0.0689 | 200 | 0.6932 | 0.0000 | 0.0001 | 0.4693 | -0.0001 | -63.1709 | -58.7113 | -3.1577 | -3.1633 |
| 0.693 | 0.1034 | 300 | 0.6932 | 0.0000 | 0.0001 | 0.4761 | -0.0001 | -63.1730 | -58.7112 | -3.1574 | -3.1630 |
| 0.693 | 0.1378 | 400 | 0.6932 | 0.0001 | 0.0002 | 0.4842 | -0.0001 | -63.1583 | -58.6973 | -3.1575 | -3.1631 |
| 0.6931 | 0.1723 | 500 | 0.6931 | 0.0002 | 0.0002 | 0.4933 | 0.0000 | -63.1594 | -58.6877 | -3.1575 | -3.1631 |
| 0.6929 | 0.2068 | 600 | 0.6931 | 0.0004 | 0.0003 | 0.4988 | 0.0001 | -63.1463 | -58.6680 | -3.1569 | -3.1625 |
| 0.6926 | 0.2412 | 700 | 0.6931 | 0.0005 | 0.0004 | 0.5274 | 0.0002 | -63.1449 | -58.6601 | -3.1561 | -3.1617 |
| 0.6926 | 0.2757 | 800 | 0.6930 | 0.0008 | 0.0005 | 0.5286 | 0.0003 | -63.1311 | -58.6330 | -3.1552 | -3.1608 |
| 0.692 | 0.3101 | 900 | 0.6929 | 0.0010 | 0.0005 | 0.5437 | 0.0005 | -63.1284 | -58.6099 | -3.1536 | -3.1592 |
| 0.6915 | 0.3446 | 1000 | 0.6928 | 0.0015 | 0.0007 | 0.5497 | 0.0008 | -63.1097 | -58.5609 | -3.1515 | -3.1572 |
| 0.6914 | 0.3790 | 1100 | 0.6926 | 0.0018 | 0.0008 | 0.5602 | 0.0011 | -63.1051 | -58.5277 | -3.1497 | -3.1554 |
| 0.6905 | 0.4135 | 1200 | 0.6924 | 0.0018 | 0.0003 | 0.5702 | 0.0016 | -63.1514 | -58.5270 | -3.1471 | -3.1528 |
| 0.6889 | 0.4480 | 1300 | 0.6922 | 0.0020 | -0.0001 | 0.5720 | 0.0020 | -63.1881 | -58.5158 | -3.1441 | -3.1497 |
| 0.6896 | 0.4824 | 1400 | 0.6920 | 0.0017 | -0.0008 | 0.5685 | 0.0024 | -63.2555 | -58.5464 | -3.1410 | -3.1466 |
| 0.6894 | 0.5169 | 1500 | 0.6918 | 0.0012 | -0.0016 | 0.5723 | 0.0028 | -63.3410 | -58.5945 | -3.1375 | -3.1432 |
| 0.6893 | 0.5513 | 1600 | 0.6915 | 0.0008 | -0.0025 | 0.5741 | 0.0033 | -63.4302 | -58.6284 | -3.1343 | -3.1400 |
| 0.6871 | 0.5858 | 1700 | 0.6913 | -0.0003 | -0.0041 | 0.5725 | 0.0038 | -63.5920 | -58.7397 | -3.1296 | -3.1353 |
| 0.6879 | 0.6203 | 1800 | 0.6910 | -0.0016 | -0.0061 | 0.5764 | 0.0045 | -63.7921 | -58.8730 | -3.1255 | -3.1312 |
| 0.6869 | 0.6547 | 1900 | 0.6908 | -0.0033 | -0.0083 | 0.5804 | 0.0050 | -64.0115 | -59.0426 | -3.1210 | -3.1266 |
| 0.6863 | 0.6892 | 2000 | 0.6905 | -0.0059 | -0.0116 | 0.5799 | 0.0057 | -64.3388 | -59.3014 | -3.1155 | -3.1212 |
| 0.685 | 0.7236 | 2100 | 0.6901 | -0.0086 | -0.0150 | 0.5915 | 0.0064 | -64.6834 | -59.5751 | -3.1097 | -3.1154 |
| 0.6865 | 0.7581 | 2200 | 0.6899 | -0.0116 | -0.0186 | 0.5829 | 0.0070 | -65.0448 | -59.8767 | -3.1053 | -3.1110 |
| 0.6841 | 0.7926 | 2300 | 0.6896 | -0.0155 | -0.0232 | 0.5867 | 0.0077 | -65.5006 | -60.2607 | -3.1009 | -3.1066 |
| 0.6847 | 0.8270 | 2400 | 0.6892 | -0.0205 | -0.0291 | 0.5829 | 0.0085 | -66.0859 | -60.7633 | -3.0966 | -3.1023 |
| 0.6838 | 0.8615 | 2500 | 0.6888 | -0.0258 | -0.0352 | 0.5969 | 0.0095 | -66.7026 | -61.2875 | -3.0907 | -3.0964 |
| 0.6839 | 0.8959 | 2600 | 0.6884 | -0.0304 | -0.0408 | 0.5925 | 0.0103 | -67.2565 | -61.7539 | -3.0868 | -3.0925 |
| 0.6822 | 0.9304 | 2700 | 0.6880 | -0.0353 | -0.0466 | 0.5932 | 0.0113 | -67.8404 | -62.2428 | -3.0819 | -3.0877 |
| 0.6821 | 0.9649 | 2800 | 0.6877 | -0.0370 | -0.0490 | 0.5962 | 0.0119 | -68.0766 | -62.4140 | -3.0775 | -3.0832 |
| 0.6805 | 0.9993 | 2900 | 0.6874 | -0.0412 | -0.0537 | 0.5897 | 0.0126 | -68.5544 | -62.8283 | -3.0727 | -3.0784 |
| 0.6809 | 1.0338 | 3000 | 0.6872 | -0.0422 | -0.0553 | 0.5946 | 0.0132 | -68.7141 | -62.9285 | -3.0668 | -3.0725 |
| 0.6785 | 1.0682 | 3100 | 0.6869 | -0.0451 | -0.0589 | 0.5969 | 0.0139 | -69.0748 | -63.2200 | -3.0610 | -3.0668 |
| 0.6763 | 1.1027 | 3200 | 0.6866 | -0.0484 | -0.0628 | 0.5925 | 0.0144 | -69.4644 | -63.5534 | -3.0568 | -3.0626 |
| 0.681 | 1.1371 | 3300 | 0.6862 | -0.0526 | -0.0679 | 0.5922 | 0.0154 | -69.9711 | -63.9670 | -3.0518 | -3.0576 |
| 0.6767 | 1.1716 | 3400 | 0.6859 | -0.0571 | -0.0732 | 0.5939 | 0.0161 | -70.5048 | -64.4254 | -3.0464 | -3.0522 |
| 0.6781 | 1.2061 | 3500 | 0.6856 | -0.0613 | -0.0780 | 0.5964 | 0.0168 | -70.9828 | -64.8380 | -3.0413 | -3.0471 |
| 0.6774 | 1.2405 | 3600 | 0.6854 | -0.0643 | -0.0817 | 0.5983 | 0.0174 | -71.3500 | -65.1396 | -3.0358 | -3.0417 |
| 0.676 | 1.2750 | 3700 | 0.6851 | -0.0670 | -0.0851 | 0.5990 | 0.0181 | -71.6879 | -65.4141 | -3.0314 | -3.0372 |
| 0.675 | 1.3094 | 3800 | 0.6849 | -0.0691 | -0.0876 | 0.5969 | 0.0184 | -71.9376 | -65.6260 | -3.0263 | -3.0321 |
| 0.6748 | 1.3439 | 3900 | 0.6845 | -0.0733 | -0.0928 | 0.6036 | 0.0195 | -72.4597 | -66.0422 | -3.0216 | -3.0274 |
| 0.6769 | 1.3784 | 4000 | 0.6842 | -0.0778 | -0.0979 | 0.6050 | 0.0201 | -72.9665 | -66.4884 | -3.0174 | -3.0232 |
| 0.6739 | 1.4128 | 4100 | 0.6839 | -0.0823 | -0.1031 | 0.6057 | 0.0208 | -73.4893 | -66.9392 | -3.0129 | -3.0187 |
| 0.6668 | 1.4473 | 4200 | 0.6836 | -0.0863 | -0.1079 | 0.6034 | 0.0216 | -73.9684 | -67.3375 | -3.0092 | -3.0150 |
| 0.6729 | 1.4817 | 4300 | 0.6834 | -0.0878 | -0.1098 | 0.6039 | 0.0220 | -74.1602 | -67.4919 | -3.0039 | -3.0097 |
| 0.6748 | 1.5162 | 4400 | 0.6833 | -0.0890 | -0.1113 | 0.6046 | 0.0223 | -74.3079 | -67.6111 | -3.0007 | -3.0065 |
| 0.6678 | 1.5507 | 4500 | 0.6828 | -0.0942 | -0.1176 | 0.6020 | 0.0234 | -74.9388 | -68.1347 | -2.9958 | -3.0016 |
| 0.6735 | 1.5851 | 4600 | 0.6827 | -0.0978 | -0.1215 | 0.6015 | 0.0238 | -75.3329 | -68.4876 | -2.9917 | -2.9975 |
| 0.6742 | 1.6196 | 4700 | 0.6825 | -0.0986 | -0.1228 | 0.6050 | 0.0242 | -75.4630 | -68.5761 | -2.9866 | -2.9924 |
| 0.6741 | 1.6540 | 4800 | 0.6823 | -0.1018 | -0.1265 | 0.6018 | 0.0247 | -75.8309 | -68.8950 | -2.9819 | -2.9877 |
| 0.6637 | 1.6885 | 4900 | 0.6819 | -0.1054 | -0.1308 | 0.6039 | 0.0255 | -76.2624 | -69.2486 | -2.9782 | -2.9839 |
| 0.6702 | 1.7229 | 5000 | 0.6818 | -0.1074 | -0.1332 | 0.6046 | 0.0258 | -76.5000 | -69.4502 | -2.9748 | -2.9806 |
| 0.6694 | 1.7574 | 5100 | 0.6815 | -0.1107 | -0.1371 | 0.6032 | 0.0264 | -76.8899 | -69.7811 | -2.9703 | -2.9761 |
| 0.6654 | 1.7919 | 5200 | 0.6813 | -0.1132 | -0.1401 | 0.6048 | 0.0269 | -77.1926 | -70.0320 | -2.9661 | -2.9719 |
| 0.6698 | 1.8263 | 5300 | 0.6811 | -0.1166 | -0.1441 | 0.6066 | 0.0275 | -77.5853 | -70.3683 | -2.9626 | -2.9684 |
| 0.6644 | 1.8608 | 5400 | 0.6808 | -0.1197 | -0.1478 | 0.6036 | 0.0281 | -77.9603 | -70.6842 | -2.9592 | -2.9650 |
| 0.6735 | 1.8952 | 5500 | 0.6807 | -0.1219 | -0.1503 | 0.6018 | 0.0285 | -78.2133 | -70.8988 | -2.9561 | -2.9619 |
| 0.662 | 1.9297 | 5600 | 0.6805 | -0.1258 | -0.1548 | 0.6032 | 0.0290 | -78.6641 | -71.2920 | -2.9526 | -2.9585 |
| 0.6634 | 1.9642 | 5700 | 0.6803 | -0.1274 | -0.1568 | 0.6050 | 0.0294 | -78.8583 | -71.4504 | -2.9495 | -2.9554 |
| 0.6685 | 1.9986 | 5800 | 0.6802 | -0.1293 | -0.1591 | 0.6032 | 0.0298 | -79.0912 | -71.6448 | -2.9473 | -2.9532 |
| 0.6698 | 2.0331 | 5900 | 0.6800 | -0.1323 | -0.1626 | 0.6039 | 0.0303 | -79.4426 | -71.9459 | -2.9444 | -2.9503 |
| 0.6627 | 2.0675 | 6000 | 0.6798 | -0.1342 | -0.1649 | 0.6064 | 0.0307 | -79.6712 | -72.1328 | -2.9419 | -2.9477 |
| 0.6631 | 2.1020 | 6100 | 0.6796 | -0.1352 | -0.1662 | 0.6069 | 0.0310 | -79.7986 | -72.2308 | -2.9397 | -2.9456 |
| 0.6629 | 2.1365 | 6200 | 0.6796 | -0.1373 | -0.1685 | 0.6085 | 0.0312 | -80.0281 | -72.4374 | -2.9374 | -2.9433 |
| 0.6672 | 2.1709 | 6300 | 0.6794 | -0.1393 | -0.1709 | 0.6076 | 0.0316 | -80.2661 | -72.6388 | -2.9347 | -2.9405 |
| 0.6687 | 2.2054 | 6400 | 0.6794 | -0.1401 | -0.1719 | 0.6085 | 0.0317 | -80.3653 | -72.7241 | -2.9322 | -2.9380 |
| 0.6662 | 2.2398 | 6500 | 0.6793 | -0.1415 | -0.1735 | 0.6087 | 0.0320 | -80.5257 | -72.8570 | -2.9306 | -2.9364 |
| 0.6701 | 2.2743 | 6600 | 0.6792 | -0.1423 | -0.1744 | 0.6097 | 0.0321 | -80.6223 | -72.9458 | -2.9287 | -2.9345 |
| 0.6592 | 2.3088 | 6700 | 0.6791 | -0.1429 | -0.1753 | 0.6076 | 0.0323 | -80.7084 | -73.0069 | -2.9274 | -2.9333 |
| 0.668 | 2.3432 | 6800 | 0.6790 | -0.1440 | -0.1765 | 0.6080 | 0.0325 | -80.8346 | -73.1154 | -2.9267 | -2.9326 |
| 0.6637 | 2.3777 | 6900 | 0.6790 | -0.1452 | -0.1778 | 0.6064 | 0.0327 | -80.9639 | -73.2289 | -2.9251 | -2.9310 |
| 0.6645 | 2.4121 | 7000 | 0.6789 | -0.1459 | -0.1788 | 0.6090 | 0.0329 | -81.0581 | -73.3020 | -2.9243 | -2.9301 |
| 0.6589 | 2.4466 | 7100 | 0.6788 | -0.1464 | -0.1795 | 0.6099 | 0.0331 | -81.1271 | -73.3526 | -2.9234 | -2.9293 |
| 0.6636 | 2.4810 | 7200 | 0.6787 | -0.1477 | -0.1809 | 0.6087 | 0.0333 | -81.2743 | -73.4802 | -2.9223 | -2.9282 |
| 0.6679 | 2.5155 | 7300 | 0.6787 | -0.1484 | -0.1817 | 0.6101 | 0.0332 | -81.3471 | -73.5563 | -2.9220 | -2.9279 |
| 0.6679 | 2.5500 | 7400 | 0.6787 | -0.1491 | -0.1825 | 0.6094 | 0.0334 | -81.4263 | -73.6218 | -2.9215 | -2.9273 |
| 0.6657 | 2.5844 | 7500 | 0.6786 | -0.1496 | -0.1831 | 0.6080 | 0.0335 | -81.4883 | -73.6727 | -2.9211 | -2.9270 |
| 0.6638 | 2.6189 | 7600 | 0.6787 | -0.1501 | -0.1835 | 0.6078 | 0.0334 | -81.5289 | -73.7227 | -2.9205 | -2.9263 |
| 0.6638 | 2.6533 | 7700 | 0.6787 | -0.1500 | -0.1834 | 0.6106 | 0.0334 | -81.5211 | -73.7089 | -2.9202 | -2.9261 |
| 0.6664 | 2.6878 | 7800 | 0.6786 | -0.1503 | -0.1839 | 0.6090 | 0.0336 | -81.5662 | -73.7409 | -2.9198 | -2.9256 |
| 0.6631 | 2.7223 | 7900 | 0.6785 | -0.1503 | -0.1840 | 0.6080 | 0.0337 | -81.5786 | -73.7370 | -2.9195 | -2.9254 |
| 0.666 | 2.7567 | 8000 | 0.6786 | -0.1506 | -0.1843 | 0.6069 | 0.0337 | -81.6062 | -73.7714 | -2.9191 | -2.9250 |
| 0.6577 | 2.7912 | 8100 | 0.6786 | -0.1507 | -0.1843 | 0.6076 | 0.0336 | -81.6118 | -73.7826 | -2.9193 | -2.9252 |
| 0.6608 | 2.8256 | 8200 | 0.6786 | -0.1507 | -0.1844 | 0.6073 | 0.0337 | -81.6240 | -73.7849 | -2.9191 | -2.9250 |
| 0.6736 | 2.8601 | 8300 | 0.6785 | -0.1505 | -0.1844 | 0.6080 | 0.0338 | -81.6154 | -73.7657 | -2.9191 | -2.9250 |
| 0.6687 | 2.8946 | 8400 | 0.6785 | -0.1507 | -0.1844 | 0.6094 | 0.0337 | -81.6251 | -73.7842 | -2.9192 | -2.9251 |
| 0.6637 | 2.9290 | 8500 | 0.6785 | -0.1505 | -0.1843 | 0.6090 | 0.0338 | -81.6091 | -73.7641 | -2.9192 | -2.9251 |
| 0.6689 | 2.9635 | 8600 | 0.6786 | -0.1508 | -0.1844 | 0.6078 | 0.0336 | -81.6197 | -73.7927 | -2.9189 | -2.9248 |
| 0.6585 | 2.9979 | 8700 | 0.6785 | -0.1508 | -0.1845 | 0.6085 | 0.0338 | -81.6350 | -73.7914 | -2.9190 | -2.9249 |

### Framework versions

- Transformers 4.41.2
- Pytorch 2.1.2
- Datasets 2.19.2
- Tokenizers 0.19.1
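
### Reading the reward metrics

The reported metrics can be sanity-checked with a few lines of arithmetic. The sketch below assumes the run used the standard sigmoid DPO loss, `-log(sigmoid(margin))`, where the margin is `Rewards/chosen - Rewards/rejected`; this card does not state the loss type or the `beta` value, so treat that as an assumption. The helper name `dpo_sigmoid_loss` is hypothetical and only for illustration.

```python
import math


def dpo_sigmoid_loss(margin: float) -> float:
    """Per-example sigmoid DPO loss, -log(sigmoid(margin)).

    Assumption: the run used the standard sigmoid DPO objective; the card
    itself does not say which loss variant was used.
    Computed as softplus(-margin) in a numerically stable way.
    """
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Values copied from the final evaluation row (step 8700).
rewards_chosen = -0.1508
rewards_rejected = -0.1845
reported_margin = 0.0338
reported_loss = 0.6785

# Rewards/margins is the difference of the two reward columns (up to rounding).
margin = rewards_chosen - rewards_rejected
print(f"recomputed margin: {margin:.4f} (reported {reported_margin})")

# With a zero margin the loss equals ln(2), about 0.6931, which matches the
# validation loss during the first few hundred steps.
print(f"loss at zero margin: {dpo_sigmoid_loss(0.0):.4f}")

# The loss evaluated at the mean margin is only a lower bound on the mean
# per-example loss (the loss is convex), so it should sit at or below the
# reported validation loss rather than match it exactly.
print(f"loss at mean margin: {dpo_sigmoid_loss(margin):.4f} (reported {reported_loss})")
```

Under that assumption, the near-`ln(2)` losses early in training correspond to a policy that has barely moved from the reference model, and the slow decline to 0.6785 is consistent with the small final margin.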