tinyllama-1.1b-sum-dpo-full
This model is a fine-tuned version of martimfasantos/tinyllama-1.1b-sum-sft-full on the openai/summarize_from_feedback dataset. It achieves the following results on the evaluation set:
- Loss: 0.6549
- Rewards/chosen: -0.4976
- Rewards/rejected: -0.6010
- Rewards/accuracies: 0.6194
- Rewards/margins: 0.1035
- Logps/rejected: -123.2810
- Logps/chosen: -108.4673
- Logits/rejected: -2.5516
- Logits/chosen: -2.5584
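As a sanity check on how these metrics relate, the sketch below recomputes the reward margin from the two reward values and compares the reported loss with two reference points. It assumes the standard sigmoid DPO objective, in which the per-pair loss is `-log(sigmoid(margin))` and the logged rewards are already scaled by beta. The card does not state the loss type or beta, so treat this as an illustration rather than a description of the actual training code. The helper name `dpo_pair_loss` is hypothetical.

```python
import math

# Final evaluation metrics copied from the list above.
rewards_chosen = -0.4976
rewards_rejected = -0.6010
reported_margin = 0.1035
reported_loss = 0.6549

# Rewards/margins is the chosen reward minus the rejected reward.
# The recomputed value can differ from the reported one in the last
# digit because the inputs are already rounded to four decimals.
margin = rewards_chosen - rewards_rejected


def dpo_pair_loss(pair_margin: float) -> float:
    """Sigmoid DPO loss for one preference pair (assumed loss type)."""
    # -log(sigmoid(m)) == log(1 + exp(-m))
    return math.log1p(math.exp(-pair_margin))


# Loss when the policy is indistinguishable from the reference model: ln(2).
loss_at_zero_margin = dpo_pair_loss(0.0)

# Loss evaluated at the mean margin. Because the loss is convex in the
# margin, the mean per-pair loss cannot be lower than this value.
loss_at_mean_margin = dpo_pair_loss(margin)

print(f"recomputed margin:      {margin:.4f}")
print(f"loss at zero margin:    {loss_at_zero_margin:.4f}")
print(f"loss at the mean margin: {loss_at_mean_margin:.4f}")
print(f"reported loss:          {reported_loss:.4f}")
```

Under these assumptions the reported loss of 0.6549 sits between the loss at the mean margin (about 0.643) and ln(2) (about 0.693), which is what one would expect from a modest average margin with per-pair spread.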
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 2
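To make the scheduler settings concrete, here is a minimal sketch of a cosine schedule with linear warmup using the values listed above. The total step count is not given in the card; it is estimated here from the last logged row of the results table (step 11600 at epoch 1.9986), and the warmup length is assumed to be the warmup ratio times that total, rounded up. The function `lr_at` is a hypothetical helper written for illustration, not the scheduler implementation used in training.

```python
import math

# Values taken from the hyperparameter list above.
LEARNING_RATE = 1e-07
WARMUP_RATIO = 0.1
NUM_EPOCHS = 2
PER_DEVICE_BATCH_SIZE = 8
GRADIENT_ACCUMULATION_STEPS = 2

# Examples consumed per optimizer step on one device: 8 * 2 = 16,
# which matches the listed total_train_batch_size.
batch_per_step = PER_DEVICE_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS

# ASSUMPTION: total steps estimated from the last logged row
# (step 11600 at epoch 1.9986); the card does not state it directly.
TOTAL_STEPS = round(11600 / 1.9986 * NUM_EPOCHS)
# ASSUMPTION: warmup length is the ratio of total steps, rounded up.
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


for step in (0, 500, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(f"step {step:>5}: lr = {lr_at(step):.3e}")
```

With these estimates the learning rate only reaches its 1e-07 peak after roughly 1,160 steps, so the earliest rows of the results table are logged while the rate is still a fraction of its peak.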
Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|
0.6932 | 0.0172 | 100 | 0.6932 | 0.0000 | 0.0001 | 0.4819 | -0.0001 | -63.1720 | -58.7099 | -3.1572 | -3.1629 |
0.6931 | 0.0345 | 200 | 0.6932 | 0.0000 | 0.0001 | 0.4893 | -0.0001 | -63.1716 | -58.7118 | -3.1576 | -3.1632 |
0.6932 | 0.0517 | 300 | 0.6932 | 0.0000 | 0.0001 | 0.4696 | -0.0001 | -63.1677 | -58.7096 | -3.1575 | -3.1631 |
0.6933 | 0.0689 | 400 | 0.6932 | 0.0002 | 0.0002 | 0.4844 | -0.0000 | -63.1572 | -58.6929 | -3.1574 | -3.1631 |
0.6931 | 0.0861 | 500 | 0.6931 | 0.0002 | 0.0002 | 0.5016 | 0.0000 | -63.1582 | -58.6892 | -3.1571 | -3.1628 |
0.6925 | 0.1034 | 600 | 0.6931 | 0.0004 | 0.0003 | 0.5158 | 0.0002 | -63.1507 | -58.6671 | -3.1566 | -3.1623 |
0.6927 | 0.1206 | 700 | 0.6931 | 0.0006 | 0.0004 | 0.5276 | 0.0002 | -63.1420 | -58.6550 | -3.1556 | -3.1612 |
0.6924 | 0.1378 | 800 | 0.6929 | 0.0010 | 0.0006 | 0.5509 | 0.0005 | -63.1244 | -58.6089 | -3.1546 | -3.1601 |
0.692 | 0.1551 | 900 | 0.6928 | 0.0014 | 0.0007 | 0.5534 | 0.0007 | -63.1085 | -58.5690 | -3.1524 | -3.1580 |
0.6924 | 0.1723 | 1000 | 0.6926 | 0.0018 | 0.0007 | 0.5660 | 0.0011 | -63.1097 | -58.5334 | -3.1494 | -3.1550 |
0.6913 | 0.1895 | 1100 | 0.6924 | 0.0021 | 0.0005 | 0.5737 | 0.0016 | -63.1303 | -58.5028 | -3.1458 | -3.1514 |
0.6912 | 0.2068 | 1200 | 0.6921 | 0.0022 | 0.0001 | 0.5795 | 0.0021 | -63.1677 | -58.4881 | -3.1407 | -3.1464 |
0.6911 | 0.2240 | 1300 | 0.6918 | 0.0017 | -0.0011 | 0.5901 | 0.0028 | -63.2892 | -58.5372 | -3.1358 | -3.1414 |
0.6871 | 0.2412 | 1400 | 0.6914 | 0.0006 | -0.0031 | 0.5785 | 0.0037 | -63.4895 | -58.6491 | -3.1300 | -3.1356 |
0.6866 | 0.2584 | 1500 | 0.6910 | -0.0015 | -0.0061 | 0.5750 | 0.0045 | -63.7853 | -58.8661 | -3.1246 | -3.1303 |
0.6876 | 0.2757 | 1600 | 0.6907 | -0.0038 | -0.0091 | 0.5874 | 0.0053 | -64.0863 | -59.0928 | -3.1185 | -3.1241 |
0.6882 | 0.2929 | 1700 | 0.6903 | -0.0067 | -0.0126 | 0.5850 | 0.0060 | -64.4449 | -59.3800 | -3.1117 | -3.1173 |
0.6838 | 0.3101 | 1800 | 0.6900 | -0.0121 | -0.0190 | 0.5825 | 0.0069 | -65.0772 | -59.9201 | -3.1038 | -3.1095 |
0.6836 | 0.3274 | 1900 | 0.6895 | -0.0157 | -0.0235 | 0.5883 | 0.0078 | -65.5277 | -60.2801 | -3.0980 | -3.1037 |
0.685 | 0.3446 | 2000 | 0.6889 | -0.0227 | -0.0319 | 0.5897 | 0.0092 | -66.3702 | -60.9847 | -3.0905 | -3.0962 |
0.6828 | 0.3618 | 2100 | 0.6883 | -0.0311 | -0.0418 | 0.5806 | 0.0107 | -67.3595 | -61.8209 | -3.0840 | -3.0897 |
0.6745 | 0.3790 | 2200 | 0.6876 | -0.0382 | -0.0504 | 0.5883 | 0.0123 | -68.2227 | -62.5273 | -3.0753 | -3.0811 |
0.6781 | 0.3963 | 2300 | 0.6872 | -0.0405 | -0.0537 | 0.5908 | 0.0131 | -68.5468 | -62.7638 | -3.0689 | -3.0745 |
0.6809 | 0.4135 | 2400 | 0.6866 | -0.0471 | -0.0615 | 0.5906 | 0.0144 | -69.3305 | -63.4208 | -3.0592 | -3.0649 |
0.6828 | 0.4307 | 2500 | 0.6862 | -0.0557 | -0.0713 | 0.5913 | 0.0156 | -70.3087 | -64.2813 | -3.0501 | -3.0558 |
0.6754 | 0.4480 | 2600 | 0.6856 | -0.0615 | -0.0783 | 0.5918 | 0.0168 | -71.0083 | -64.8584 | -3.0433 | -3.0490 |
0.6768 | 0.4652 | 2700 | 0.6851 | -0.0674 | -0.0853 | 0.5957 | 0.0180 | -71.7136 | -65.4475 | -3.0370 | -3.0427 |
0.6766 | 0.4824 | 2800 | 0.6846 | -0.0727 | -0.0919 | 0.5967 | 0.0192 | -72.3669 | -65.9771 | -3.0308 | -3.0365 |
0.6769 | 0.4997 | 2900 | 0.6843 | -0.0755 | -0.0954 | 0.6004 | 0.0199 | -72.7197 | -66.2619 | -3.0232 | -3.0289 |
0.6781 | 0.5169 | 3000 | 0.6839 | -0.0812 | -0.1022 | 0.6027 | 0.0210 | -73.3995 | -66.8329 | -3.0144 | -3.0201 |
0.67 | 0.5341 | 3100 | 0.6835 | -0.0822 | -0.1040 | 0.6004 | 0.0218 | -73.5753 | -66.9287 | -3.0095 | -3.0153 |
0.6718 | 0.5513 | 3200 | 0.6828 | -0.0939 | -0.1173 | 0.6015 | 0.0235 | -74.9148 | -68.1005 | -2.9982 | -3.0040 |
0.6724 | 0.5686 | 3300 | 0.6822 | -0.0999 | -0.1249 | 0.6050 | 0.0250 | -75.6694 | -68.7027 | -2.9851 | -2.9908 |
0.6625 | 0.5858 | 3400 | 0.6818 | -0.1009 | -0.1266 | 0.6090 | 0.0257 | -75.8440 | -68.8060 | -2.9762 | -2.9820 |
0.6742 | 0.6030 | 3500 | 0.6814 | -0.1071 | -0.1338 | 0.6083 | 0.0267 | -76.5617 | -69.4202 | -2.9687 | -2.9745 |
0.6722 | 0.6203 | 3600 | 0.6810 | -0.1126 | -0.1404 | 0.6099 | 0.0277 | -77.2155 | -69.9734 | -2.9597 | -2.9655 |
0.664 | 0.6375 | 3700 | 0.6803 | -0.1209 | -0.1502 | 0.6090 | 0.0293 | -78.2040 | -70.8018 | -2.9485 | -2.9543 |
0.6644 | 0.6547 | 3800 | 0.6795 | -0.1327 | -0.1641 | 0.6111 | 0.0314 | -79.5918 | -71.9851 | -2.9386 | -2.9444 |
0.6664 | 0.6720 | 3900 | 0.6786 | -0.1449 | -0.1784 | 0.6080 | 0.0335 | -81.0222 | -73.2044 | -2.9300 | -2.9358 |
0.6653 | 0.6892 | 4000 | 0.6781 | -0.1559 | -0.1909 | 0.6057 | 0.0350 | -82.2692 | -74.3040 | -2.9178 | -2.9236 |
0.6532 | 0.7064 | 4100 | 0.6776 | -0.1612 | -0.1975 | 0.6125 | 0.0363 | -82.9296 | -74.8363 | -2.9005 | -2.9064 |
0.6733 | 0.7236 | 4200 | 0.6769 | -0.1720 | -0.2098 | 0.6087 | 0.0378 | -84.1639 | -75.9119 | -2.8890 | -2.8949 |
0.6618 | 0.7409 | 4300 | 0.6764 | -0.1798 | -0.2189 | 0.6057 | 0.0391 | -85.0723 | -76.6940 | -2.8794 | -2.8853 |
0.6625 | 0.7581 | 4400 | 0.6757 | -0.1936 | -0.2347 | 0.6053 | 0.0411 | -86.6464 | -78.0713 | -2.8686 | -2.8745 |
0.6605 | 0.7753 | 4500 | 0.6746 | -0.2097 | -0.2535 | 0.6066 | 0.0439 | -88.5342 | -79.6776 | -2.8590 | -2.8649 |
0.6437 | 0.7926 | 4600 | 0.6737 | -0.2242 | -0.2703 | 0.6071 | 0.0461 | -90.2150 | -81.1344 | -2.8513 | -2.8573 |
0.6526 | 0.8098 | 4700 | 0.6727 | -0.2385 | -0.2872 | 0.6069 | 0.0487 | -91.9046 | -82.5646 | -2.8429 | -2.8489 |
0.6604 | 0.8270 | 4800 | 0.6721 | -0.2495 | -0.2999 | 0.6090 | 0.0504 | -93.1696 | -83.6594 | -2.8351 | -2.8410 |
0.6664 | 0.8442 | 4900 | 0.6712 | -0.2621 | -0.3148 | 0.6048 | 0.0526 | -94.6595 | -84.9266 | -2.8264 | -2.8324 |
0.6499 | 0.8615 | 5000 | 0.6707 | -0.2706 | -0.3247 | 0.5955 | 0.0541 | -95.6483 | -85.7703 | -2.8111 | -2.8172 |
0.6628 | 0.8787 | 5100 | 0.6697 | -0.2843 | -0.3411 | 0.5969 | 0.0568 | -97.2923 | -87.1431 | -2.8035 | -2.8094 |
0.6513 | 0.8959 | 5200 | 0.6693 | -0.2867 | -0.3444 | 0.5953 | 0.0577 | -97.6222 | -87.3824 | -2.7972 | -2.8031 |
0.6475 | 0.9132 | 5300 | 0.6692 | -0.2901 | -0.3484 | 0.5987 | 0.0583 | -98.0213 | -87.7248 | -2.7882 | -2.7943 |
0.6494 | 0.9304 | 5400 | 0.6687 | -0.2940 | -0.3536 | 0.6015 | 0.0596 | -98.5368 | -88.1090 | -2.7827 | -2.7887 |
0.6412 | 0.9476 | 5500 | 0.6682 | -0.3024 | -0.3635 | 0.5997 | 0.0610 | -99.5251 | -88.9533 | -2.7734 | -2.7794 |
0.6531 | 0.9649 | 5600 | 0.6680 | -0.2995 | -0.3610 | 0.6046 | 0.0615 | -99.2758 | -88.6585 | -2.7683 | -2.7743 |
0.652 | 0.9821 | 5700 | 0.6671 | -0.3121 | -0.3760 | 0.6041 | 0.0639 | -100.7801 | -89.9234 | -2.7604 | -2.7664 |
0.6355 | 0.9993 | 5800 | 0.6663 | -0.3272 | -0.3936 | 0.6057 | 0.0664 | -102.5409 | -91.4366 | -2.7489 | -2.7549 |
0.6362 | 1.0165 | 5900 | 0.6654 | -0.3504 | -0.4199 | 0.6043 | 0.0695 | -105.1658 | -93.7475 | -2.7329 | -2.7390 |
0.6587 | 1.0338 | 6000 | 0.6654 | -0.3453 | -0.4145 | 0.6076 | 0.0692 | -104.6326 | -93.2431 | -2.7260 | -2.7321 |
0.6337 | 1.0510 | 6100 | 0.6649 | -0.3492 | -0.4197 | 0.6078 | 0.0705 | -105.1470 | -93.6331 | -2.7177 | -2.7237 |
0.6372 | 1.0682 | 6200 | 0.6640 | -0.3675 | -0.4408 | 0.6090 | 0.0734 | -107.2651 | -95.4612 | -2.7083 | -2.7144 |
0.6555 | 1.0855 | 6300 | 0.6633 | -0.3808 | -0.4563 | 0.6111 | 0.0755 | -108.8140 | -96.7948 | -2.7009 | -2.7071 |
0.6406 | 1.1027 | 6400 | 0.6629 | -0.3843 | -0.4611 | 0.6108 | 0.0768 | -109.2905 | -97.1394 | -2.6941 | -2.7003 |
0.6445 | 1.1199 | 6500 | 0.6626 | -0.3894 | -0.4670 | 0.6097 | 0.0776 | -109.8768 | -97.6507 | -2.6860 | -2.6923 |
0.6438 | 1.1371 | 6600 | 0.6627 | -0.3907 | -0.4683 | 0.6073 | 0.0776 | -110.0129 | -97.7839 | -2.6814 | -2.6877 |
0.6411 | 1.1544 | 6700 | 0.6622 | -0.3996 | -0.4791 | 0.6122 | 0.0795 | -111.0866 | -98.6695 | -2.6729 | -2.6791 |
0.6224 | 1.1716 | 6800 | 0.6614 | -0.4163 | -0.4982 | 0.6115 | 0.0819 | -112.9988 | -100.3370 | -2.6625 | -2.6688 |
0.6437 | 1.1888 | 6900 | 0.6610 | -0.4232 | -0.5064 | 0.6106 | 0.0832 | -113.8220 | -101.0292 | -2.6554 | -2.6618 |
0.6268 | 1.2061 | 7000 | 0.6604 | -0.4419 | -0.5278 | 0.6090 | 0.0859 | -115.9616 | -102.9045 | -2.6490 | -2.6553 |
0.6303 | 1.2233 | 7100 | 0.6604 | -0.4379 | -0.5238 | 0.6129 | 0.0859 | -115.5604 | -102.5041 | -2.6443 | -2.6506 |
0.6251 | 1.2405 | 7200 | 0.6600 | -0.4437 | -0.5309 | 0.6101 | 0.0872 | -116.2726 | -103.0814 | -2.6383 | -2.6448 |
0.6531 | 1.2578 | 7300 | 0.6602 | -0.4339 | -0.5202 | 0.6125 | 0.0863 | -115.1998 | -102.0999 | -2.6366 | -2.6430 |
0.6456 | 1.2750 | 7400 | 0.6600 | -0.4313 | -0.5180 | 0.6125 | 0.0867 | -114.9813 | -101.8414 | -2.6345 | -2.6409 |
0.6455 | 1.2922 | 7500 | 0.6597 | -0.4307 | -0.5180 | 0.6148 | 0.0873 | -114.9807 | -101.7862 | -2.6292 | -2.6357 |
0.6762 | 1.3094 | 7600 | 0.6593 | -0.4392 | -0.5278 | 0.6118 | 0.0887 | -115.9649 | -102.6288 | -2.6216 | -2.6281 |
0.6365 | 1.3267 | 7700 | 0.6592 | -0.4402 | -0.5295 | 0.6157 | 0.0893 | -116.1288 | -102.7343 | -2.6172 | -2.6237 |
0.6211 | 1.3439 | 7800 | 0.6588 | -0.4484 | -0.5389 | 0.6194 | 0.0906 | -117.0741 | -103.5481 | -2.6115 | -2.6180 |
0.641 | 1.3611 | 7900 | 0.6581 | -0.4553 | -0.5479 | 0.6217 | 0.0926 | -117.9735 | -104.2409 | -2.6077 | -2.6143 |
0.6228 | 1.3784 | 8000 | 0.6578 | -0.4583 | -0.5520 | 0.6215 | 0.0937 | -118.3795 | -104.5455 | -2.6043 | -2.6109 |
0.641 | 1.3956 | 8100 | 0.6579 | -0.4658 | -0.5596 | 0.6178 | 0.0939 | -119.1444 | -105.2910 | -2.5997 | -2.6063 |
0.6504 | 1.4128 | 8200 | 0.6571 | -0.4707 | -0.5666 | 0.6213 | 0.0959 | -119.8413 | -105.7863 | -2.5974 | -2.6040 |
0.6472 | 1.4300 | 8300 | 0.6573 | -0.4661 | -0.5612 | 0.6217 | 0.0951 | -119.3045 | -105.3220 | -2.5953 | -2.6018 |
0.6298 | 1.4473 | 8400 | 0.6573 | -0.4609 | -0.5560 | 0.6206 | 0.0950 | -118.7768 | -104.8056 | -2.5928 | -2.5994 |
0.6207 | 1.4645 | 8500 | 0.6573 | -0.4579 | -0.5531 | 0.6180 | 0.0952 | -118.4887 | -104.5014 | -2.5885 | -2.5952 |
0.6661 | 1.4817 | 8600 | 0.6571 | -0.4639 | -0.5598 | 0.6204 | 0.0959 | -119.1632 | -105.1053 | -2.5846 | -2.5913 |
0.6475 | 1.4990 | 8700 | 0.6572 | -0.4570 | -0.5525 | 0.6190 | 0.0954 | -118.4251 | -104.4133 | -2.5846 | -2.5912 |
0.6476 | 1.5162 | 8800 | 0.6569 | -0.4604 | -0.5566 | 0.6194 | 0.0962 | -118.8439 | -104.7545 | -2.5816 | -2.5883 |
0.6336 | 1.5334 | 8900 | 0.6568 | -0.4692 | -0.5663 | 0.6190 | 0.0971 | -119.8081 | -105.6329 | -2.5772 | -2.5839 |
0.6282 | 1.5507 | 9000 | 0.6564 | -0.4708 | -0.5690 | 0.6187 | 0.0981 | -120.0761 | -105.7962 | -2.5754 | -2.5821 |
0.646 | 1.5679 | 9100 | 0.6565 | -0.4724 | -0.5704 | 0.6187 | 0.0980 | -120.2213 | -105.9529 | -2.5732 | -2.5799 |
0.6225 | 1.5851 | 9200 | 0.6563 | -0.4762 | -0.5749 | 0.6190 | 0.0987 | -120.6733 | -106.3303 | -2.5714 | -2.5781 |
0.6223 | 1.6023 | 9300 | 0.6562 | -0.4763 | -0.5753 | 0.6180 | 0.0990 | -120.7107 | -106.3383 | -2.5692 | -2.5759 |
0.6288 | 1.6196 | 9400 | 0.6559 | -0.4818 | -0.5819 | 0.6201 | 0.1001 | -121.3710 | -106.8921 | -2.5664 | -2.5731 |
0.6223 | 1.6368 | 9500 | 0.6557 | -0.4823 | -0.5828 | 0.6176 | 0.1005 | -121.4601 | -106.9374 | -2.5650 | -2.5717 |
0.6363 | 1.6540 | 9600 | 0.6556 | -0.4891 | -0.5902 | 0.6197 | 0.1011 | -122.2042 | -107.6243 | -2.5615 | -2.5683 |
0.6355 | 1.6713 | 9700 | 0.6556 | -0.4880 | -0.5892 | 0.6211 | 0.1012 | -122.1034 | -107.5130 | -2.5609 | -2.5677 |
0.6247 | 1.6885 | 9800 | 0.6555 | -0.4894 | -0.5910 | 0.6201 | 0.1015 | -122.2755 | -107.6543 | -2.5603 | -2.5670 |
0.5826 | 1.7057 | 9900 | 0.6554 | -0.4911 | -0.5929 | 0.6206 | 0.1019 | -122.4715 | -107.8182 | -2.5591 | -2.5659 |
0.6181 | 1.7229 | 10000 | 0.6553 | -0.4923 | -0.5945 | 0.6204 | 0.1022 | -122.6296 | -107.9373 | -2.5579 | -2.5647 |
0.6365 | 1.7402 | 10100 | 0.6553 | -0.4917 | -0.5938 | 0.6201 | 0.1022 | -122.5635 | -107.8778 | -2.5567 | -2.5635 |
0.6269 | 1.7574 | 10200 | 0.6552 | -0.4952 | -0.5977 | 0.6208 | 0.1025 | -122.9497 | -108.2321 | -2.5556 | -2.5624 |
0.6573 | 1.7746 | 10300 | 0.6553 | -0.4962 | -0.5988 | 0.6201 | 0.1026 | -123.0645 | -108.3347 | -2.5542 | -2.5610 |
0.6036 | 1.7919 | 10400 | 0.6552 | -0.4953 | -0.5980 | 0.6197 | 0.1027 | -122.9784 | -108.2400 | -2.5542 | -2.5610 |
0.6178 | 1.8091 | 10500 | 0.6549 | -0.4956 | -0.5990 | 0.6213 | 0.1034 | -123.0831 | -108.2757 | -2.5531 | -2.5598 |
0.6403 | 1.8263 | 10600 | 0.6551 | -0.4967 | -0.5996 | 0.6204 | 0.1030 | -123.1450 | -108.3809 | -2.5527 | -2.5594 |
0.6341 | 1.8436 | 10700 | 0.6550 | -0.4965 | -0.5997 | 0.6206 | 0.1032 | -123.1496 | -108.3595 | -2.5523 | -2.5590 |
0.627 | 1.8608 | 10800 | 0.6549 | -0.4971 | -0.6006 | 0.6211 | 0.1035 | -123.2409 | -108.4216 | -2.5521 | -2.5589 |
0.6335 | 1.8780 | 10900 | 0.6550 | -0.4974 | -0.6009 | 0.6201 | 0.1035 | -123.2728 | -108.4564 | -2.5523 | -2.5590 |
0.6262 | 1.8952 | 11000 | 0.6550 | -0.4971 | -0.6003 | 0.6201 | 0.1033 | -123.2126 | -108.4185 | -2.5520 | -2.5588 |
0.6311 | 1.9125 | 11100 | 0.6548 | -0.4971 | -0.6009 | 0.6211 | 0.1038 | -123.2688 | -108.4253 | -2.5521 | -2.5589 |
0.6239 | 1.9297 | 11200 | 0.6551 | -0.4971 | -0.6003 | 0.6201 | 0.1031 | -123.2061 | -108.4263 | -2.5516 | -2.5583 |
0.6629 | 1.9469 | 11300 | 0.6550 | -0.4970 | -0.6003 | 0.6206 | 0.1033 | -123.2066 | -108.4107 | -2.5518 | -2.5587 |
0.6308 | 1.9642 | 11400 | 0.6550 | -0.4972 | -0.6005 | 0.6197 | 0.1033 | -123.2305 | -108.4360 | -2.5518 | -2.5586 |
0.6532 | 1.9814 | 11500 | 0.6550 | -0.4972 | -0.6005 | 0.6197 | 0.1033 | -123.2317 | -108.4313 | -2.5517 | -2.5585 |
0.6257 | 1.9986 | 11600 | 0.6549 | -0.4976 | -0.6010 | 0.6194 | 0.1035 | -123.2810 | -108.4673 | -2.5516 | -2.5584 |
Framework versions
- Transformers 4.41.2
- Pytorch 2.1.2
- Datasets 2.19.2
- Tokenizers 0.19.1