tinyllama-1.1b-sum-dpo-full_LR2e-8_2epochs_old
This model is a version of martimfasantos/tinyllama-1.1b-sum-sft-full_old fine-tuned with Direct Preference Optimization (DPO) on the openai/summarize_from_feedback dataset. It achieves the following results on the evaluation set:
- Loss: 0.6901
- Rewards/chosen: -0.0088
- Rewards/rejected: -0.0152
- Rewards/accuracies: 0.5892
- Rewards/margins: 0.0064
- Logps/rejected: -64.7009
- Logps/chosen: -59.5896
- Logits/rejected: -3.1105
- Logits/chosen: -3.1162
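The Rewards/* values are the implicit DPO rewards logged during training (scaled log-probability ratios of the policy against the reference model on chosen vs. rejected summaries); Rewards/accuracies is the fraction of pairs for which the chosen summary receives the higher implicit reward. A minimal inference sketch follows; the TL;DR-style prompt format and the generation settings are assumptions, not taken from this card:

```python
# Minimal inference sketch (the prompt format and generation settings are
# illustrative assumptions, not taken from this model card).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "martimfasantos/tinyllama-1.1b-sum-dpo-full_LR2e-8_2epochs_old"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

post = "Your Reddit-style post to summarize goes here."
prompt = f"{post}\nTL;DR:"  # assumed prompt format for TL;DR summarization
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
summary = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(summary)
```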
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-08
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 2
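The exact training script is not included in this card. As a hedged sketch, the hyperparameters above map onto a TRL `DPOConfig`/`DPOTrainer` run roughly as follows; the `beta` value and the comparison-pair preprocessing are assumptions:

```python
# Hedged sketch of a DPO run matching the listed hyperparameters, using TRL.
# The actual training script is not part of this card; `beta` and the
# preprocessing of comparison pairs below are assumptions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "martimfasantos/tinyllama-1.1b-sum-sft-full_old"
model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

raw = load_dataset("openai/summarize_from_feedback", "comparisons", split="train")

def to_preference(example):
    # Assumed mapping from the comparisons schema to DPO's
    # prompt/chosen/rejected columns.
    post = example["info"]["post"] or ""
    chosen = example["summaries"][example["choice"]]["text"]
    rejected = example["summaries"][1 - example["choice"]]["text"]
    return {
        "prompt": f"{post}\nTL;DR:",
        "chosen": " " + chosen,
        "rejected": " " + rejected,
    }

train_dataset = raw.map(to_preference, remove_columns=raw.column_names)

config = DPOConfig(
    output_dir="tinyllama-1.1b-sum-dpo-full",
    learning_rate=2e-8,
    per_device_train_batch_size=8,  # 8 per device x 2 accumulation steps = 16 total
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,
    num_train_epochs=2,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    beta=0.1,  # assumed; the card does not state the DPO beta
)

# Newer TRL versions take processing_class= instead of tokenizer=.
trainer = DPOTrainer(
    model=model, args=config, train_dataset=train_dataset, tokenizer=tokenizer
)
trainer.train()
```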
Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|
0.6932 | 0.0172 | 100 | 0.6932 | 0.0001 | 0.0002 | 0.4947 | -0.0000 | -63.1650 | -58.7014 | -3.1574 | -3.1631 |
0.6932 | 0.0345 | 200 | 0.6932 | 0.0000 | 0.0002 | 0.4840 | -0.0001 | -63.1618 | -58.7076 | -3.1573 | -3.1630 |
0.6932 | 0.0517 | 300 | 0.6932 | -0.0000 | 0.0001 | 0.4842 | -0.0001 | -63.1720 | -58.7121 | -3.1574 | -3.1631 |
0.6933 | 0.0689 | 400 | 0.6932 | -0.0000 | 0.0000 | 0.4856 | -0.0000 | -63.1788 | -58.7153 | -3.1577 | -3.1633 |
0.693 | 0.0861 | 500 | 0.6932 | 0.0001 | 0.0002 | 0.4847 | -0.0001 | -63.1648 | -58.7040 | -3.1576 | -3.1633 |
0.6931 | 0.1034 | 600 | 0.6931 | 0.0000 | 0.0000 | 0.4961 | 0.0000 | -63.1795 | -58.7070 | -3.1572 | -3.1629 |
0.6932 | 0.1206 | 700 | 0.6932 | 0.0000 | 0.0001 | 0.4912 | -0.0001 | -63.1701 | -58.7077 | -3.1574 | -3.1631 |
0.693 | 0.1378 | 800 | 0.6932 | 0.0002 | 0.0002 | 0.4909 | -0.0000 | -63.1604 | -58.6950 | -3.1576 | -3.1633 |
0.6934 | 0.1551 | 900 | 0.6932 | 0.0001 | 0.0001 | 0.5060 | -0.0000 | -63.1695 | -58.7025 | -3.1576 | -3.1633 |
0.6932 | 0.1723 | 1000 | 0.6931 | 0.0001 | 0.0001 | 0.4949 | 0.0000 | -63.1684 | -58.6973 | -3.1574 | -3.1631 |
0.6931 | 0.1895 | 1100 | 0.6931 | 0.0003 | 0.0002 | 0.5156 | 0.0001 | -63.1597 | -58.6832 | -3.1571 | -3.1627 |
0.693 | 0.2068 | 1200 | 0.6931 | 0.0003 | 0.0003 | 0.5153 | 0.0001 | -63.1531 | -58.6773 | -3.1570 | -3.1627 |
0.693 | 0.2240 | 1300 | 0.6931 | 0.0004 | 0.0003 | 0.5174 | 0.0001 | -63.1508 | -58.6695 | -3.1567 | -3.1623 |
0.6928 | 0.2412 | 1400 | 0.6931 | 0.0005 | 0.0003 | 0.5130 | 0.0001 | -63.1467 | -58.6638 | -3.1566 | -3.1622 |
0.6927 | 0.2584 | 1500 | 0.6931 | 0.0005 | 0.0004 | 0.5207 | 0.0002 | -63.1443 | -58.6600 | -3.1564 | -3.1621 |
0.6928 | 0.2757 | 1600 | 0.6931 | 0.0006 | 0.0004 | 0.5156 | 0.0002 | -63.1352 | -58.6497 | -3.1559 | -3.1616 |
0.6928 | 0.2929 | 1700 | 0.6930 | 0.0008 | 0.0005 | 0.5290 | 0.0002 | -63.1288 | -58.6357 | -3.1553 | -3.1610 |
0.6923 | 0.3101 | 1800 | 0.6930 | 0.0008 | 0.0005 | 0.5395 | 0.0003 | -63.1303 | -58.6354 | -3.1550 | -3.1607 |
0.6924 | 0.3274 | 1900 | 0.6930 | 0.0008 | 0.0006 | 0.5223 | 0.0003 | -63.1249 | -58.6291 | -3.1546 | -3.1602 |
0.6925 | 0.3446 | 2000 | 0.6929 | 0.0009 | 0.0005 | 0.5423 | 0.0004 | -63.1319 | -58.6215 | -3.1544 | -3.1601 |
0.6922 | 0.3618 | 2100 | 0.6929 | 0.0011 | 0.0006 | 0.5511 | 0.0004 | -63.1153 | -58.6039 | -3.1539 | -3.1595 |
0.6917 | 0.3790 | 2200 | 0.6929 | 0.0012 | 0.0006 | 0.5379 | 0.0005 | -63.1153 | -58.5967 | -3.1533 | -3.1590 |
0.6914 | 0.3963 | 2300 | 0.6928 | 0.0013 | 0.0007 | 0.5481 | 0.0006 | -63.1094 | -58.5806 | -3.1531 | -3.1587 |
0.6921 | 0.4135 | 2400 | 0.6928 | 0.0013 | 0.0007 | 0.5500 | 0.0007 | -63.1136 | -58.5781 | -3.1524 | -3.1579 |
0.6922 | 0.4307 | 2500 | 0.6928 | 0.0015 | 0.0007 | 0.5602 | 0.0008 | -63.1131 | -58.5648 | -3.1518 | -3.1575 |
0.6909 | 0.4480 | 2600 | 0.6927 | 0.0016 | 0.0007 | 0.5581 | 0.0009 | -63.1079 | -58.5517 | -3.1512 | -3.1568 |
0.6911 | 0.4652 | 2700 | 0.6927 | 0.0016 | 0.0007 | 0.5627 | 0.0009 | -63.1136 | -58.5521 | -3.1505 | -3.1562 |
0.6917 | 0.4824 | 2800 | 0.6927 | 0.0017 | 0.0008 | 0.5507 | 0.0010 | -63.1044 | -58.5383 | -3.1503 | -3.1559 |
0.6919 | 0.4997 | 2900 | 0.6926 | 0.0017 | 0.0006 | 0.5609 | 0.0011 | -63.1181 | -58.5391 | -3.1496 | -3.1552 |
0.6918 | 0.5169 | 3000 | 0.6925 | 0.0019 | 0.0006 | 0.5606 | 0.0013 | -63.1217 | -58.5262 | -3.1488 | -3.1544 |
0.691 | 0.5341 | 3100 | 0.6925 | 0.0019 | 0.0005 | 0.5669 | 0.0014 | -63.1269 | -58.5219 | -3.1485 | -3.1542 |
0.692 | 0.5513 | 3200 | 0.6925 | 0.0019 | 0.0005 | 0.5606 | 0.0014 | -63.1309 | -58.5268 | -3.1477 | -3.1533 |
0.6902 | 0.5686 | 3300 | 0.6924 | 0.0018 | 0.0003 | 0.5604 | 0.0016 | -63.1528 | -58.5277 | -3.1470 | -3.1526 |
0.6898 | 0.5858 | 3400 | 0.6923 | 0.0020 | 0.0003 | 0.5602 | 0.0017 | -63.1520 | -58.5135 | -3.1462 | -3.1518 |
0.6902 | 0.6030 | 3500 | 0.6923 | 0.0019 | 0.0001 | 0.5532 | 0.0018 | -63.1674 | -58.5221 | -3.1455 | -3.1511 |
0.6905 | 0.6203 | 3600 | 0.6923 | 0.0018 | -0.0000 | 0.5697 | 0.0018 | -63.1817 | -58.5294 | -3.1446 | -3.1502 |
0.6877 | 0.6375 | 3700 | 0.6922 | 0.0019 | -0.0000 | 0.5741 | 0.0020 | -63.1849 | -58.5181 | -3.1438 | -3.1494 |
0.691 | 0.6547 | 3800 | 0.6921 | 0.0019 | -0.0001 | 0.5676 | 0.0021 | -63.1942 | -58.5193 | -3.1430 | -3.1486 |
0.6881 | 0.6720 | 3900 | 0.6921 | 0.0018 | -0.0004 | 0.5639 | 0.0022 | -63.2183 | -58.5326 | -3.1420 | -3.1476 |
0.6891 | 0.6892 | 4000 | 0.6920 | 0.0018 | -0.0006 | 0.5727 | 0.0023 | -63.2358 | -58.5348 | -3.1408 | -3.1465 |
0.688 | 0.7064 | 4100 | 0.6920 | 0.0018 | -0.0007 | 0.5695 | 0.0025 | -63.2489 | -58.5334 | -3.1397 | -3.1453 |
0.6893 | 0.7236 | 4200 | 0.6920 | 0.0015 | -0.0009 | 0.5685 | 0.0025 | -63.2735 | -58.5574 | -3.1390 | -3.1446 |
0.6897 | 0.7409 | 4300 | 0.6919 | 0.0015 | -0.0012 | 0.5748 | 0.0027 | -63.2966 | -58.5608 | -3.1383 | -3.1439 |
0.6904 | 0.7581 | 4400 | 0.6918 | 0.0012 | -0.0016 | 0.5711 | 0.0028 | -63.3356 | -58.5872 | -3.1374 | -3.1430 |
0.6905 | 0.7753 | 4500 | 0.6918 | 0.0013 | -0.0016 | 0.5850 | 0.0029 | -63.3426 | -58.5858 | -3.1369 | -3.1425 |
0.6883 | 0.7926 | 4600 | 0.6917 | 0.0011 | -0.0019 | 0.5788 | 0.0029 | -63.3659 | -58.6051 | -3.1357 | -3.1413 |
0.6897 | 0.8098 | 4700 | 0.6916 | 0.0010 | -0.0021 | 0.5741 | 0.0031 | -63.3948 | -58.6130 | -3.1353 | -3.1409 |
0.6905 | 0.8270 | 4800 | 0.6916 | 0.0008 | -0.0024 | 0.5748 | 0.0032 | -63.4159 | -58.6317 | -3.1348 | -3.1404 |
0.6875 | 0.8442 | 4900 | 0.6916 | 0.0005 | -0.0028 | 0.5774 | 0.0033 | -63.4563 | -58.6580 | -3.1340 | -3.1396 |
0.6899 | 0.8615 | 5000 | 0.6915 | 0.0005 | -0.0029 | 0.5769 | 0.0033 | -63.4652 | -58.6640 | -3.1327 | -3.1384 |
0.6864 | 0.8787 | 5100 | 0.6915 | 0.0003 | -0.0031 | 0.5683 | 0.0034 | -63.4888 | -58.6839 | -3.1319 | -3.1375 |
0.6865 | 0.8959 | 5200 | 0.6914 | 0.0001 | -0.0035 | 0.5734 | 0.0036 | -63.5340 | -58.7065 | -3.1314 | -3.1371 |
0.6877 | 0.9132 | 5300 | 0.6913 | -0.0001 | -0.0039 | 0.5737 | 0.0038 | -63.5667 | -58.7197 | -3.1309 | -3.1365 |
0.6889 | 0.9304 | 5400 | 0.6913 | -0.0003 | -0.0042 | 0.5760 | 0.0039 | -63.5960 | -58.7374 | -3.1301 | -3.1358 |
0.688 | 0.9476 | 5500 | 0.6913 | -0.0004 | -0.0043 | 0.5660 | 0.0039 | -63.6131 | -58.7516 | -3.1294 | -3.1351 |
0.6899 | 0.9649 | 5600 | 0.6913 | -0.0006 | -0.0045 | 0.5746 | 0.0039 | -63.6304 | -58.7708 | -3.1287 | -3.1343 |
0.687 | 0.9821 | 5700 | 0.6911 | -0.0006 | -0.0048 | 0.5788 | 0.0042 | -63.6628 | -58.7723 | -3.1281 | -3.1337 |
0.6857 | 0.9993 | 5800 | 0.6911 | -0.0009 | -0.0051 | 0.5713 | 0.0042 | -63.6879 | -58.7999 | -3.1278 | -3.1334 |
0.6864 | 1.0165 | 5900 | 0.6911 | -0.0012 | -0.0055 | 0.5788 | 0.0044 | -63.7349 | -58.8299 | -3.1271 | -3.1327 |
0.6888 | 1.0338 | 6000 | 0.6910 | -0.0014 | -0.0059 | 0.5790 | 0.0044 | -63.7658 | -58.8540 | -3.1259 | -3.1316 |
0.6857 | 1.0510 | 6100 | 0.6909 | -0.0016 | -0.0062 | 0.5795 | 0.0046 | -63.8031 | -58.8730 | -3.1255 | -3.1312 |
0.6889 | 1.0682 | 6200 | 0.6909 | -0.0019 | -0.0066 | 0.5764 | 0.0047 | -63.8376 | -58.9032 | -3.1248 | -3.1305 |
0.6865 | 1.0855 | 6300 | 0.6908 | -0.0022 | -0.0070 | 0.5788 | 0.0048 | -63.8796 | -58.9275 | -3.1245 | -3.1303 |
0.6884 | 1.1027 | 6400 | 0.6909 | -0.0024 | -0.0071 | 0.5748 | 0.0047 | -63.8941 | -58.9523 | -3.1230 | -3.1287 |
0.6893 | 1.1199 | 6500 | 0.6908 | -0.0026 | -0.0075 | 0.5813 | 0.0049 | -63.9268 | -58.9676 | -3.1230 | -3.1287 |
0.6886 | 1.1371 | 6600 | 0.6908 | -0.0030 | -0.0079 | 0.5748 | 0.0050 | -63.9723 | -59.0090 | -3.1216 | -3.1273 |
0.6865 | 1.1544 | 6700 | 0.6908 | -0.0032 | -0.0082 | 0.5804 | 0.0050 | -64.0010 | -59.0346 | -3.1218 | -3.1275 |
0.6868 | 1.1716 | 6800 | 0.6907 | -0.0033 | -0.0084 | 0.5836 | 0.0051 | -64.0239 | -59.0461 | -3.1204 | -3.1261 |
0.6882 | 1.1888 | 6900 | 0.6907 | -0.0037 | -0.0089 | 0.5811 | 0.0051 | -64.0668 | -59.0845 | -3.1198 | -3.1255 |
0.6859 | 1.2061 | 7000 | 0.6907 | -0.0041 | -0.0093 | 0.5797 | 0.0052 | -64.1093 | -59.1233 | -3.1204 | -3.1261 |
0.685 | 1.2233 | 7100 | 0.6906 | -0.0045 | -0.0098 | 0.5797 | 0.0053 | -64.1565 | -59.1598 | -3.1180 | -3.1237 |
0.6858 | 1.2405 | 7200 | 0.6905 | -0.0046 | -0.0101 | 0.5820 | 0.0055 | -64.1910 | -59.1702 | -3.1184 | -3.1241 |
0.6905 | 1.2578 | 7300 | 0.6905 | -0.0049 | -0.0104 | 0.5804 | 0.0055 | -64.2204 | -59.2016 | -3.1182 | -3.1239 |
0.6852 | 1.2750 | 7400 | 0.6906 | -0.0051 | -0.0106 | 0.5790 | 0.0055 | -64.2432 | -59.2260 | -3.1180 | -3.1237 |
0.6873 | 1.2922 | 7500 | 0.6906 | -0.0055 | -0.0109 | 0.5860 | 0.0055 | -64.2745 | -59.2600 | -3.1174 | -3.1231 |
0.6871 | 1.3094 | 7600 | 0.6905 | -0.0055 | -0.0112 | 0.5829 | 0.0057 | -64.3001 | -59.2643 | -3.1166 | -3.1223 |
0.6865 | 1.3267 | 7700 | 0.6904 | -0.0057 | -0.0115 | 0.5846 | 0.0058 | -64.3291 | -59.2850 | -3.1161 | -3.1218 |
0.6888 | 1.3439 | 7800 | 0.6905 | -0.0061 | -0.0118 | 0.5820 | 0.0057 | -64.3590 | -59.3192 | -3.1161 | -3.1218 |
0.6868 | 1.3611 | 7900 | 0.6904 | -0.0062 | -0.0121 | 0.5846 | 0.0058 | -64.3857 | -59.3334 | -3.1164 | -3.1220 |
0.6876 | 1.3784 | 8000 | 0.6903 | -0.0063 | -0.0123 | 0.5839 | 0.0060 | -64.4065 | -59.3406 | -3.1148 | -3.1204 |
0.688 | 1.3956 | 8100 | 0.6904 | -0.0066 | -0.0125 | 0.5832 | 0.0059 | -64.4252 | -59.3670 | -3.1144 | -3.1201 |
0.6858 | 1.4128 | 8200 | 0.6903 | -0.0068 | -0.0127 | 0.5781 | 0.0059 | -64.4505 | -59.3885 | -3.1140 | -3.1197 |
0.6836 | 1.4300 | 8300 | 0.6904 | -0.0069 | -0.0129 | 0.5822 | 0.0059 | -64.4660 | -59.4050 | -3.1139 | -3.1195 |
0.6863 | 1.4473 | 8400 | 0.6903 | -0.0071 | -0.0132 | 0.5829 | 0.0061 | -64.4968 | -59.4218 | -3.1146 | -3.1203 |
0.6847 | 1.4645 | 8500 | 0.6903 | -0.0073 | -0.0133 | 0.5871 | 0.0060 | -64.5110 | -59.4395 | -3.1132 | -3.1189 |
0.6861 | 1.4817 | 8600 | 0.6903 | -0.0075 | -0.0136 | 0.5864 | 0.0061 | -64.5362 | -59.4577 | -3.1135 | -3.1192 |
0.6847 | 1.4990 | 8700 | 0.6903 | -0.0077 | -0.0138 | 0.5843 | 0.0061 | -64.5599 | -59.4786 | -3.1127 | -3.1184 |
0.6866 | 1.5162 | 8800 | 0.6902 | -0.0077 | -0.0139 | 0.5878 | 0.0062 | -64.5684 | -59.4835 | -3.1125 | -3.1182 |
0.6841 | 1.5334 | 8900 | 0.6902 | -0.0079 | -0.0141 | 0.5874 | 0.0062 | -64.5873 | -59.4978 | -3.1129 | -3.1186 |
0.6799 | 1.5507 | 9000 | 0.6902 | -0.0080 | -0.0142 | 0.5857 | 0.0062 | -64.6045 | -59.5160 | -3.1124 | -3.1181 |
0.6832 | 1.5679 | 9100 | 0.6902 | -0.0080 | -0.0143 | 0.5862 | 0.0062 | -64.6061 | -59.5157 | -3.1117 | -3.1174 |
0.6846 | 1.5851 | 9200 | 0.6903 | -0.0083 | -0.0144 | 0.5811 | 0.0062 | -64.6246 | -59.5410 | -3.1116 | -3.1173 |
0.6853 | 1.6023 | 9300 | 0.6902 | -0.0083 | -0.0146 | 0.5827 | 0.0062 | -64.6375 | -59.5467 | -3.1120 | -3.1177 |
0.6882 | 1.6196 | 9400 | 0.6902 | -0.0084 | -0.0147 | 0.5885 | 0.0063 | -64.6528 | -59.5515 | -3.1111 | -3.1169 |
0.6867 | 1.6368 | 9500 | 0.6902 | -0.0084 | -0.0147 | 0.5816 | 0.0063 | -64.6481 | -59.5528 | -3.1110 | -3.1166 |
0.6845 | 1.6540 | 9600 | 0.6902 | -0.0085 | -0.0148 | 0.5862 | 0.0064 | -64.6648 | -59.5611 | -3.1109 | -3.1166 |
0.6855 | 1.6713 | 9700 | 0.6902 | -0.0085 | -0.0149 | 0.5876 | 0.0063 | -64.6676 | -59.5646 | -3.1111 | -3.1167 |
0.682 | 1.6885 | 9800 | 0.6902 | -0.0087 | -0.0150 | 0.5867 | 0.0063 | -64.6765 | -59.5814 | -3.1108 | -3.1164 |
0.6814 | 1.7057 | 9900 | 0.6902 | -0.0087 | -0.0150 | 0.5913 | 0.0063 | -64.6813 | -59.5806 | -3.1108 | -3.1165 |
0.6837 | 1.7229 | 10000 | 0.6901 | -0.0087 | -0.0151 | 0.5927 | 0.0064 | -64.6926 | -59.5854 | -3.1107 | -3.1163 |
0.6821 | 1.7402 | 10100 | 0.6901 | -0.0087 | -0.0151 | 0.5841 | 0.0064 | -64.6931 | -59.5801 | -3.1105 | -3.1162 |
0.6867 | 1.7574 | 10200 | 0.6902 | -0.0089 | -0.0152 | 0.5816 | 0.0064 | -64.7032 | -59.5971 | -3.1105 | -3.1162 |
0.6867 | 1.7746 | 10300 | 0.6901 | -0.0088 | -0.0152 | 0.5871 | 0.0064 | -64.6972 | -59.5881 | -3.1104 | -3.1161 |
0.6847 | 1.7919 | 10400 | 0.6902 | -0.0089 | -0.0151 | 0.5869 | 0.0062 | -64.6896 | -59.5992 | -3.1102 | -3.1159 |
0.6861 | 1.8091 | 10500 | 0.6901 | -0.0088 | -0.0152 | 0.5862 | 0.0064 | -64.7046 | -59.5936 | -3.1104 | -3.1161 |
0.6877 | 1.8263 | 10600 | 0.6901 | -0.0088 | -0.0153 | 0.5920 | 0.0064 | -64.7073 | -59.5967 | -3.1104 | -3.1161 |
0.6824 | 1.8436 | 10700 | 0.6901 | -0.0089 | -0.0153 | 0.5867 | 0.0064 | -64.7092 | -59.5998 | -3.1103 | -3.1160 |
0.6839 | 1.8608 | 10800 | 0.6901 | -0.0089 | -0.0153 | 0.5878 | 0.0064 | -64.7113 | -59.5983 | -3.1102 | -3.1158 |
0.6831 | 1.8780 | 10900 | 0.6901 | -0.0089 | -0.0153 | 0.5846 | 0.0064 | -64.7147 | -59.6028 | -3.1104 | -3.1160 |
0.6886 | 1.8952 | 11000 | 0.6901 | -0.0089 | -0.0154 | 0.5908 | 0.0064 | -64.7155 | -59.6032 | -3.1103 | -3.1160 |
0.6859 | 1.9125 | 11100 | 0.6901 | -0.0088 | -0.0152 | 0.5846 | 0.0064 | -64.7015 | -59.5919 | -3.1102 | -3.1159 |
0.685 | 1.9297 | 11200 | 0.6902 | -0.0088 | -0.0152 | 0.5846 | 0.0064 | -64.6997 | -59.5930 | -3.1103 | -3.1160 |
0.6869 | 1.9469 | 11300 | 0.6901 | -0.0089 | -0.0153 | 0.5876 | 0.0064 | -64.7081 | -59.5984 | -3.1104 | -3.1161 |
0.6864 | 1.9642 | 11400 | 0.6901 | -0.0088 | -0.0152 | 0.5908 | 0.0064 | -64.6952 | -59.5885 | -3.1105 | -3.1161 |
0.689 | 1.9814 | 11500 | 0.6902 | -0.0089 | -0.0153 | 0.5820 | 0.0064 | -64.7117 | -59.6064 | -3.1105 | -3.1161 |
0.6865 | 1.9986 | 11600 | 0.6901 | -0.0088 | -0.0152 | 0.5892 | 0.0064 | -64.7009 | -59.5896 | -3.1105 | -3.1162 |
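Across the two epochs, validation loss decreases from 0.6932 to 0.6901 while the reward margin grows from roughly zero to 0.0064 and preference accuracy rises from near chance (0.4947) to 0.5892, consistent with the very small learning rate producing slow but steady preference learning.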
Framework versions
- Transformers 4.41.2
- Pytorch 2.1.2
- Datasets 2.19.2
- Tokenizers 0.19.1