tinyllama-1.1b-sum-dpo-full_LR2e-8_3epochs_old
This model is a fine-tuned version of martimfasantos/tinyllama-1.1b-sum-sft-full_old on the openai/summarize_from_feedback dataset. It achieves the following results on the evaluation set (these match the last logged evaluation, at step 17400, in the training results table):
- Loss: 0.6872
- Rewards/chosen: -0.0423
- Rewards/rejected: -0.0554
- Rewards/accuracies: 0.5943
- Rewards/margins: 0.0131
- Logps/rejected: -68.7246
- Logps/chosen: -62.9434
- Logits/rejected: -3.0648
- Logits/chosen: -3.0704
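The reward metrics above are related in a simple way, which a short check can illustrate. This is a minimal sketch, assuming the usual DPO convention in which the margin is the chosen reward minus the rejected reward and the per-example loss is the negative log-sigmoid of that margin. The function name `dpo_loss_from_margin` is hypothetical and not part of any library. Because the reported loss is an average over examples, it need not equal the loss of the average margin.

```python
import math


def dpo_loss_from_margin(margin: float) -> float:
    """Return -log(sigmoid(margin)), computed as a numerically stable softplus(-margin)."""
    return math.log1p(math.exp(-abs(margin))) + max(-margin, 0.0)


# Values copied from the evaluation metrics listed above.
rewards_chosen = -0.0423
rewards_rejected = -0.0554

# Margin is chosen minus rejected (assumed convention).
margin = rewards_chosen - rewards_rejected
print(f"margin: {margin:.4f}")  # 0.0131, matching Rewards/margins

# With a zero margin the loss is ln(2), the value the run starts from.
print(f"loss at zero margin: {dpo_loss_from_margin(0.0):.4f}")  # 0.6931

# Loss of the mean margin; the reported 0.6872 is a mean of per-example losses,
# so the two are close but not identical.
print(f"loss at mean margin: {dpo_loss_from_margin(margin):.4f}")  # 0.6866
```

Read this way, a loss of 0.6872 and an accuracy of 0.5943 indicate only a small shift away from the reference model, which is consistent with the very low learning rate listed in the hyperparameters.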
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-08
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
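To make the effect of `lr_scheduler_type: cosine` and `lr_scheduler_warmup_ratio: 0.1` concrete, the sketch below evaluates a linear-warmup, cosine-decay schedule at a few steps. It is an illustration under stated assumptions, not the training code: the total step count of 17,412 is an estimate derived from the logged epoch fractions (roughly 5,804 steps per epoch over 3 epochs), and `lr_at` is a hypothetical helper that follows the standard formula for this schedule.

```python
import math

BASE_LR = 2e-8        # learning_rate from the list above
WARMUP_RATIO = 0.1    # lr_scheduler_warmup_ratio from the list above
TOTAL_STEPS = 17_412  # assumption: estimated, not stated in the card


def lr_at(step: int, base_lr: float = BASE_LR,
          total_steps: int = TOTAL_STEPS,
          warmup_ratio: float = WARMUP_RATIO) -> float:
    """Learning rate at `step` for linear warmup followed by cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 500, 1_742, 8_700, 17_412):
    print(f"step {step:>6}: lr = {lr_at(step):.3e}")
```

Under these assumptions the learning rate is still ramping up for roughly the first 1,700 steps and decays toward zero near the end of the run.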
Training results
Training Loss | Epoch | Step | Logits/chosen | Logits/rejected | Logps/chosen | Logps/rejected | Validation Loss | Rewards/accuracies | Rewards/chosen | Rewards/margins | Rewards/rejected |
---|---|---|---|---|---|---|---|---|---|---|---|
0.6931 | 0.0172 | 100 | -3.1635 | -3.1579 | -58.7134 | -63.1764 | 0.6932 | 0.4868 | -0.0000 | -0.0001 | 0.0000 |
0.6931 | 0.0345 | 200 | -3.1635 | -3.1578 | -58.7115 | -63.1768 | 0.6932 | 0.4823 | 0.0000 | -0.0000 | 0.0000 |
0.6932 | 0.0517 | 300 | -3.1633 | -3.1576 | -58.7154 | -63.1759 | 0.6932 | 0.4777 | -0.0000 | -0.0001 | 0.0000 |
0.6931 | 0.0689 | 400 | -3.1632 | -3.1575 | -58.7141 | -63.1718 | 0.6932 | 0.4954 | -0.0000 | -0.0001 | 0.0001 |
0.6932 | 0.0861 | 500 | -3.1630 | -3.1574 | -58.7011 | -63.1630 | 0.6932 | 0.4735 | 0.0001 | -0.0001 | 0.0002 |
0.6932 | 0.1034 | 600 | -3.1630 | -3.1572 | -58.7157 | -63.1771 | 0.6932 | 0.4893 | -0.0000 | -0.0001 | 0.0000 |
0.6931 | 0.1206 | 700 | -3.1632 | -3.1575 | -58.7038 | -63.1662 | 0.6932 | 0.4937 | 0.0001 | -0.0001 | 0.0001 |
0.693 | 0.1378 | 800 | -3.1633 | -3.1576 | -58.7029 | -63.1637 | 0.6932 | 0.4826 | 0.0001 | -0.0001 | 0.0002 |
0.6933 | 0.1551 | 900 | -3.1632 | -3.1575 | -58.7115 | -63.1671 | 0.6932 | 0.4921 | 0.0000 | -0.0001 | 0.0001 |
0.6932 | 0.1723 | 1000 | -3.1633 | -3.1577 | -58.7041 | -63.1666 | 0.6932 | 0.4823 | 0.0001 | -0.0001 | 0.0001 |
0.6933 | 0.1895 | 1100 | -3.1628 | -3.1572 | -58.7041 | -63.1678 | 0.6932 | 0.4879 | 0.0001 | -0.0000 | 0.0001 |
0.6931 | 0.2068 | 1200 | -3.1629 | -3.1573 | -58.6888 | -63.1590 | 0.6931 | 0.4984 | 0.0002 | 0.0000 | 0.0002 |
0.6931 | 0.2240 | 1300 | -3.1629 | -3.1572 | -58.6903 | -63.1651 | 0.6931 | 0.5123 | 0.0002 | 0.0001 | 0.0002 |
0.693 | 0.2412 | 1400 | -3.1628 | -3.1571 | -58.6913 | -63.1595 | 0.6931 | 0.5112 | 0.0002 | -0.0000 | 0.0002 |
0.693 | 0.2584 | 1500 | -3.1625 | -3.1569 | -58.6790 | -63.1508 | 0.6931 | 0.5116 | 0.0003 | 0.0000 | 0.0003 |
0.6928 | 0.2757 | 1600 | -3.1625 | -3.1569 | -58.6717 | -63.1467 | 0.6931 | 0.4888 | 0.0004 | 0.0001 | 0.0003 |
0.6928 | 0.2929 | 1700 | -3.1622 | -3.1565 | -58.6647 | -63.1465 | 0.6931 | 0.5074 | 0.0005 | 0.0001 | 0.0003 |
0.6927 | 0.3101 | 1800 | -3.1617 | -3.1561 | -58.6570 | -63.1432 | 0.6931 | 0.5197 | 0.0005 | 0.0002 | 0.0004 |
0.6927 | 0.3274 | 1900 | -3.1615 | -3.1559 | -58.6514 | -63.1370 | 0.6931 | 0.5260 | 0.0006 | 0.0002 | 0.0004 |
0.6927 | 0.3446 | 2000 | -3.1611 | -3.1555 | -58.6395 | -63.1397 | 0.6930 | 0.5448 | 0.0007 | 0.0003 | 0.0004 |
0.6926 | 0.3618 | 2100 | -3.1606 | -3.1550 | -58.6333 | -63.1319 | 0.6930 | 0.5344 | 0.0008 | 0.0003 | 0.0005 |
0.692 | 0.3790 | 2200 | -3.1602 | -3.1545 | -58.6257 | -63.1267 | 0.6930 | 0.5295 | 0.0009 | 0.0003 | 0.0005 |
0.692 | 0.3963 | 2300 | -3.1600 | -3.1543 | -58.6131 | -63.1255 | 0.6929 | 0.5437 | 0.0010 | 0.0004 | 0.0005 |
0.6924 | 0.4135 | 2400 | -3.1597 | -3.1540 | -58.6085 | -63.1220 | 0.6929 | 0.5407 | 0.0010 | 0.0005 | 0.0006 |
0.6923 | 0.4307 | 2500 | -3.1590 | -3.1533 | -58.5965 | -63.1095 | 0.6929 | 0.5344 | 0.0012 | 0.0004 | 0.0007 |
0.6914 | 0.4480 | 2600 | -3.1586 | -3.1529 | -58.5858 | -63.1145 | 0.6928 | 0.5520 | 0.0013 | 0.0006 | 0.0007 |
0.6916 | 0.4652 | 2700 | -3.1581 | -3.1524 | -58.5732 | -63.1112 | 0.6928 | 0.5500 | 0.0014 | 0.0007 | 0.0007 |
0.692 | 0.4824 | 2800 | -3.1575 | -3.1518 | -58.5713 | -63.1121 | 0.6928 | 0.5504 | 0.0014 | 0.0007 | 0.0007 |
0.6921 | 0.4997 | 2900 | -3.1570 | -3.1514 | -58.5498 | -63.1157 | 0.6927 | 0.5548 | 0.0016 | 0.0010 | 0.0006 |
0.692 | 0.5169 | 3000 | -3.1564 | -3.1508 | -58.5515 | -63.1176 | 0.6927 | 0.5609 | 0.0016 | 0.0010 | 0.0006 |
0.6912 | 0.5341 | 3100 | -3.1558 | -3.1502 | -58.5477 | -63.1245 | 0.6926 | 0.5500 | 0.0016 | 0.0011 | 0.0006 |
0.6922 | 0.5513 | 3200 | -3.1553 | -3.1497 | -58.5271 | -63.1111 | 0.6926 | 0.5616 | 0.0018 | 0.0012 | 0.0007 |
0.6907 | 0.5686 | 3300 | -3.1541 | -3.1485 | -58.5238 | -63.1269 | 0.6925 | 0.5627 | 0.0019 | 0.0013 | 0.0005 |
0.6902 | 0.5858 | 3400 | -3.1535 | -3.1479 | -58.5243 | -63.1298 | 0.6925 | 0.5527 | 0.0019 | 0.0014 | 0.0005 |
0.6907 | 0.6030 | 3500 | -3.1527 | -3.1471 | -58.5080 | -63.1429 | 0.6923 | 0.5620 | 0.0020 | 0.0017 | 0.0004 |
0.6908 | 0.6203 | 3600 | -3.1519 | -3.1462 | -58.5280 | -63.1556 | 0.6924 | 0.5676 | 0.0018 | 0.0016 | 0.0002 |
0.6882 | 0.6375 | 3700 | -3.1510 | -3.1454 | -58.5192 | -63.1647 | 0.6923 | 0.5737 | 0.0019 | 0.0018 | 0.0002 |
0.6912 | 0.6547 | 3800 | -3.1502 | -3.1446 | -58.5257 | -63.1704 | 0.6923 | 0.5583 | 0.0019 | 0.0018 | 0.0001 |
0.6884 | 0.6720 | 3900 | -3.1495 | -3.1439 | -58.5325 | -63.1916 | 0.6922 | 0.5653 | 0.0018 | 0.0019 | -0.0001 |
0.6893 | 0.6892 | 4000 | -3.1483 | -3.1427 | -58.5317 | -63.2120 | 0.6921 | 0.5748 | 0.0018 | 0.0021 | -0.0003 |
0.6882 | 0.7064 | 4100 | -3.1470 | -3.1414 | -58.5366 | -63.2363 | 0.6920 | 0.5767 | 0.0018 | 0.0023 | -0.0006 |
0.6897 | 0.7236 | 4200 | -3.1459 | -3.1403 | -58.5300 | -63.2312 | 0.6920 | 0.5683 | 0.0018 | 0.0023 | -0.0005 |
0.6898 | 0.7409 | 4300 | -3.1451 | -3.1394 | -58.5430 | -63.2659 | 0.6919 | 0.5743 | 0.0017 | 0.0025 | -0.0009 |
0.6907 | 0.7581 | 4400 | -3.1442 | -3.1386 | -58.5652 | -63.2958 | 0.6919 | 0.5718 | 0.0015 | 0.0026 | -0.0012 |
0.6907 | 0.7753 | 4500 | -3.1433 | -3.1376 | -58.5910 | -63.3274 | 0.6919 | 0.5685 | 0.0012 | 0.0027 | -0.0015 |
0.6884 | 0.7926 | 4600 | -3.1425 | -3.1369 | -58.5812 | -63.3476 | 0.6917 | 0.5829 | 0.0013 | 0.0030 | -0.0017 |
0.6899 | 0.8098 | 4700 | -3.1417 | -3.1360 | -58.6116 | -63.3823 | 0.6917 | 0.5723 | 0.0010 | 0.0030 | -0.0020 |
0.6905 | 0.8270 | 4800 | -3.1404 | -3.1348 | -58.6264 | -63.4076 | 0.6916 | 0.5755 | 0.0009 | 0.0031 | -0.0023 |
0.6874 | 0.8442 | 4900 | -3.1397 | -3.1340 | -58.6553 | -63.4466 | 0.6916 | 0.5750 | 0.0006 | 0.0032 | -0.0027 |
0.69 | 0.8615 | 5000 | -3.1387 | -3.1330 | -58.6609 | -63.4651 | 0.6915 | 0.5727 | 0.0005 | 0.0034 | -0.0028 |
0.6862 | 0.8787 | 5100 | -3.1378 | -3.1321 | -58.6905 | -63.5114 | 0.6915 | 0.5720 | 0.0002 | 0.0035 | -0.0033 |
0.6863 | 0.8959 | 5200 | -3.1369 | -3.1312 | -58.7184 | -63.5646 | 0.6913 | 0.5771 | -0.0001 | 0.0038 | -0.0038 |
0.6876 | 0.9132 | 5300 | -3.1359 | -3.1303 | -58.7430 | -63.6003 | 0.6913 | 0.5704 | -0.0003 | 0.0039 | -0.0042 |
0.6888 | 0.9304 | 5400 | -3.1349 | -3.1292 | -58.7667 | -63.6265 | 0.6913 | 0.5804 | -0.0005 | 0.0039 | -0.0045 |
0.6877 | 0.9476 | 5500 | -3.1341 | -3.1284 | -58.7894 | -63.6709 | 0.6912 | 0.5732 | -0.0008 | 0.0041 | -0.0049 |
0.6899 | 0.9649 | 5600 | -3.1337 | -3.1280 | -58.8098 | -63.7023 | 0.6911 | 0.5755 | -0.0010 | 0.0042 | -0.0052 |
0.6868 | 0.9821 | 5700 | -3.1323 | -3.1267 | -58.8416 | -63.7534 | 0.6910 | 0.5762 | -0.0013 | 0.0044 | -0.0057 |
0.6854 | 0.9993 | 5800 | -3.1315 | -3.1258 | -58.8782 | -63.7916 | 0.6910 | 0.5748 | -0.0017 | 0.0045 | -0.0061 |
0.6859 | 1.0165 | 5900 | -3.1300 | -3.1244 | -58.9206 | -63.8560 | 0.6909 | 0.5781 | -0.0021 | 0.0047 | -0.0068 |
0.6885 | 1.0338 | 6000 | -3.1292 | -3.1235 | -58.9602 | -63.9056 | 0.6909 | 0.5799 | -0.0025 | 0.0048 | -0.0073 |
0.6853 | 1.0510 | 6100 | -3.1272 | -3.1215 | -58.9989 | -63.9666 | 0.6908 | 0.5795 | -0.0029 | 0.0050 | -0.0079 |
0.6886 | 1.0682 | 6200 | -3.1262 | -3.1205 | -59.0433 | -64.0141 | 0.6908 | 0.5820 | -0.0033 | 0.0050 | -0.0083 |
0.6859 | 1.0855 | 6300 | -3.1255 | -3.1199 | -59.0883 | -64.0800 | 0.6907 | 0.5846 | -0.0038 | 0.0052 | -0.0090 |
0.6879 | 1.1027 | 6400 | -3.1255 | -3.1198 | -59.1255 | -64.1257 | 0.6906 | 0.5809 | -0.0041 | 0.0053 | -0.0095 |
0.6889 | 1.1199 | 6500 | -3.1235 | -3.1178 | -59.1877 | -64.2032 | 0.6906 | 0.5804 | -0.0048 | 0.0055 | -0.0102 |
0.6881 | 1.1371 | 6600 | -3.1233 | -3.1176 | -59.2408 | -64.2806 | 0.6904 | 0.5871 | -0.0053 | 0.0057 | -0.0110 |
0.6856 | 1.1544 | 6700 | -3.1209 | -3.1152 | -59.2896 | -64.3343 | 0.6904 | 0.5792 | -0.0058 | 0.0058 | -0.0115 |
0.6856 | 1.1716 | 6800 | -3.1199 | -3.1142 | -59.3530 | -64.4170 | 0.6903 | 0.5857 | -0.0064 | 0.0060 | -0.0124 |
0.6876 | 1.1888 | 6900 | -3.1196 | -3.1140 | -59.4390 | -64.4999 | 0.6904 | 0.5846 | -0.0073 | 0.0059 | -0.0132 |
0.6846 | 1.2061 | 7000 | -3.1178 | -3.1121 | -59.5252 | -64.6186 | 0.6902 | 0.5897 | -0.0081 | 0.0063 | -0.0144 |
0.6837 | 1.2233 | 7100 | -3.1168 | -3.1111 | -59.5873 | -64.6919 | 0.6902 | 0.5864 | -0.0088 | 0.0064 | -0.0151 |
0.6846 | 1.2405 | 7200 | -3.1156 | -3.1099 | -59.6617 | -64.7766 | 0.6901 | 0.5825 | -0.0095 | 0.0065 | -0.0160 |
0.6901 | 1.2578 | 7300 | -3.1146 | -3.1089 | -59.7055 | -64.8355 | 0.6901 | 0.5811 | -0.0099 | 0.0066 | -0.0166 |
0.6842 | 1.2750 | 7400 | -3.1143 | -3.1086 | -59.7743 | -64.9195 | 0.6900 | 0.5785 | -0.0106 | 0.0068 | -0.0174 |
0.6862 | 1.2922 | 7500 | -3.1130 | -3.1073 | -59.8530 | -65.0033 | 0.6900 | 0.5778 | -0.0114 | 0.0068 | -0.0182 |
0.6862 | 1.3094 | 7600 | -3.1114 | -3.1057 | -59.9138 | -65.0788 | 0.6899 | 0.5809 | -0.0120 | 0.0070 | -0.0190 |
0.6848 | 1.3267 | 7700 | -3.1101 | -3.1044 | -59.9958 | -65.1799 | 0.6898 | 0.5829 | -0.0128 | 0.0072 | -0.0200 |
0.6874 | 1.3439 | 7800 | -3.1095 | -3.1038 | -60.0529 | -65.2523 | 0.6897 | 0.5818 | -0.0134 | 0.0073 | -0.0207 |
0.6863 | 1.3611 | 7900 | -3.1087 | -3.1030 | -60.1189 | -65.3403 | 0.6896 | 0.5825 | -0.0141 | 0.0075 | -0.0216 |
0.6861 | 1.3784 | 8000 | -3.1078 | -3.1021 | -60.2116 | -65.4348 | 0.6896 | 0.5836 | -0.0150 | 0.0075 | -0.0225 |
0.6868 | 1.3956 | 8100 | -3.1069 | -3.1012 | -60.2774 | -65.5086 | 0.6896 | 0.5843 | -0.0157 | 0.0076 | -0.0233 |
0.6843 | 1.4128 | 8200 | -3.1059 | -3.1002 | -60.3531 | -65.6035 | 0.6895 | 0.5834 | -0.0164 | 0.0078 | -0.0242 |
0.6821 | 1.4300 | 8300 | -3.1054 | -3.0997 | -60.3991 | -65.6808 | 0.6894 | 0.5846 | -0.0169 | 0.0081 | -0.0250 |
0.6844 | 1.4473 | 8400 | -3.1046 | -3.0989 | -60.4732 | -65.7510 | 0.6894 | 0.5890 | -0.0176 | 0.0081 | -0.0257 |
0.6817 | 1.4645 | 8500 | -3.1035 | -3.0979 | -60.5483 | -65.8390 | 0.6894 | 0.5820 | -0.0184 | 0.0082 | -0.0266 |
0.6845 | 1.4817 | 8600 | -3.1020 | -3.0962 | -60.6134 | -65.9166 | 0.6893 | 0.5911 | -0.0190 | 0.0084 | -0.0274 |
0.6817 | 1.4990 | 8700 | -3.1018 | -3.0962 | -60.6714 | -65.9962 | 0.6892 | 0.5876 | -0.0196 | 0.0086 | -0.0282 |
0.6847 | 1.5162 | 8800 | -3.1011 | -3.0955 | -60.7533 | -66.0791 | 0.6892 | 0.5895 | -0.0204 | 0.0086 | -0.0290 |
0.6823 | 1.5334 | 8900 | -3.1000 | -3.0943 | -60.8401 | -66.1949 | 0.6891 | 0.5899 | -0.0213 | 0.0089 | -0.0301 |
0.6758 | 1.5507 | 9000 | -3.0991 | -3.0934 | -60.8754 | -66.2333 | 0.6891 | 0.5911 | -0.0216 | 0.0089 | -0.0305 |
0.6809 | 1.5679 | 9100 | -3.0981 | -3.0923 | -60.9570 | -66.3274 | 0.6890 | 0.5881 | -0.0225 | 0.0090 | -0.0315 |
0.6818 | 1.5851 | 9200 | -3.0977 | -3.0919 | -61.0578 | -66.4371 | 0.6890 | 0.5874 | -0.0235 | 0.0091 | -0.0326 |
0.6832 | 1.6023 | 9300 | -3.0969 | -3.0911 | -61.0783 | -66.4729 | 0.6889 | 0.5860 | -0.0237 | 0.0093 | -0.0329 |
0.6855 | 1.6196 | 9400 | -3.0955 | -3.0898 | -61.1367 | -66.5481 | 0.6888 | 0.5948 | -0.0242 | 0.0094 | -0.0337 |
0.6839 | 1.6368 | 9500 | -3.0946 | -3.0889 | -61.1814 | -66.5979 | 0.6888 | 0.5925 | -0.0247 | 0.0095 | -0.0342 |
0.682 | 1.6540 | 9600 | -3.0937 | -3.0880 | -61.2442 | -66.6790 | 0.6887 | 0.5918 | -0.0253 | 0.0097 | -0.0350 |
0.6826 | 1.6713 | 9700 | -3.0929 | -3.0872 | -61.2947 | -66.7452 | 0.6886 | 0.5964 | -0.0258 | 0.0098 | -0.0357 |
0.678 | 1.6885 | 9800 | -3.0924 | -3.0867 | -61.3594 | -66.8017 | 0.6887 | 0.5911 | -0.0265 | 0.0097 | -0.0362 |
0.6778 | 1.7057 | 9900 | -3.0915 | -3.0858 | -61.4006 | -66.8464 | 0.6887 | 0.5980 | -0.0269 | 0.0098 | -0.0367 |
0.6802 | 1.7229 | 10000 | -3.0913 | -3.0856 | -61.4128 | -66.8861 | 0.6885 | 0.5950 | -0.0270 | 0.0101 | -0.0371 |
0.6794 | 1.7402 | 10100 | -3.0902 | -3.0845 | -61.4624 | -66.9483 | 0.6885 | 0.5904 | -0.0275 | 0.0102 | -0.0377 |
0.6826 | 1.7574 | 10200 | -3.0891 | -3.0834 | -61.5260 | -67.0267 | 0.6884 | 0.5943 | -0.0281 | 0.0103 | -0.0385 |
0.6855 | 1.7746 | 10300 | -3.0885 | -3.0828 | -61.5950 | -67.1042 | 0.6884 | 0.5943 | -0.0288 | 0.0104 | -0.0392 |
0.6811 | 1.7919 | 10400 | -3.0877 | -3.0820 | -61.6404 | -67.1605 | 0.6883 | 0.5960 | -0.0293 | 0.0105 | -0.0398 |
0.6821 | 1.8091 | 10500 | -3.0870 | -3.0813 | -61.7017 | -67.2424 | 0.6882 | 0.5950 | -0.0299 | 0.0107 | -0.0406 |
0.6848 | 1.8263 | 10600 | -3.0865 | -3.0808 | -61.7690 | -67.3148 | 0.6882 | 0.5953 | -0.0306 | 0.0108 | -0.0413 |
0.6784 | 1.8436 | 10700 | -3.0858 | -3.0801 | -61.8288 | -67.3826 | 0.6882 | 0.5948 | -0.0312 | 0.0109 | -0.0420 |
0.6802 | 1.8608 | 10800 | -3.0852 | -3.0795 | -61.8876 | -67.4650 | 0.6881 | 0.5964 | -0.0318 | 0.0111 | -0.0428 |
0.6788 | 1.8780 | 10900 | -3.0848 | -3.0790 | -61.9505 | -67.5242 | 0.6881 | 0.5929 | -0.0324 | 0.0111 | -0.0434 |
0.6883 | 1.8952 | 11000 | -3.0842 | -3.0785 | -61.9829 | -67.5779 | 0.6880 | 0.5892 | -0.0327 | 0.0113 | -0.0440 |
0.6814 | 1.9125 | 11100 | -3.0834 | -3.0777 | -62.0617 | -67.6717 | 0.6879 | 0.5941 | -0.0335 | 0.0114 | -0.0449 |
0.6796 | 1.9297 | 11200 | -3.0830 | -3.0773 | -62.1302 | -67.7374 | 0.6880 | 0.5950 | -0.0342 | 0.0114 | -0.0456 |
0.6839 | 1.9469 | 11300 | -3.0824 | -3.0767 | -62.1736 | -67.8008 | 0.6879 | 0.5899 | -0.0346 | 0.0116 | -0.0462 |
0.6825 | 1.9642 | 11400 | -3.0815 | -3.0758 | -62.2222 | -67.8551 | 0.6878 | 0.5892 | -0.0351 | 0.0116 | -0.0467 |
0.6855 | 1.9814 | 11500 | -3.0812 | -3.0755 | -62.2490 | -67.8931 | 0.6878 | 0.5920 | -0.0354 | 0.0118 | -0.0471 |
0.6827 | 1.9986 | 11600 | -3.0808 | -3.0750 | -62.3037 | -67.9551 | 0.6878 | 0.5932 | -0.0359 | 0.0118 | -0.0477 |
0.6771 | 2.0159 | 11700 | -3.0803 | -3.0746 | -62.3479 | -68.0038 | 0.6877 | 0.6004 | -0.0364 | 0.0119 | -0.0482 |
0.6814 | 2.0331 | 11800 | -3.0795 | -3.0737 | -62.3687 | -68.0385 | 0.6877 | 0.5929 | -0.0366 | 0.0120 | -0.0486 |
0.679 | 2.0503 | 11900 | -3.0788 | -3.0731 | -62.4235 | -68.0889 | 0.6877 | 0.5964 | -0.0371 | 0.0120 | -0.0491 |
0.6823 | 2.0675 | 12000 | -3.0788 | -3.0731 | -62.4228 | -68.1068 | 0.6876 | 0.5980 | -0.0371 | 0.0122 | -0.0493 |
0.6811 | 2.0848 | 12100 | -3.0785 | -3.0728 | -62.4527 | -68.1324 | 0.6876 | 0.5922 | -0.0374 | 0.0121 | -0.0495 |
0.6774 | 2.1020 | 12200 | -3.0782 | -3.0725 | -62.4816 | -68.1725 | 0.6876 | 0.5915 | -0.0377 | 0.0122 | -0.0499 |
0.6787 | 2.1192 | 12300 | -3.0778 | -3.0721 | -62.5033 | -68.1982 | 0.6876 | 0.5848 | -0.0379 | 0.0123 | -0.0502 |
0.6816 | 2.1365 | 12400 | -3.0775 | -3.0718 | -62.5353 | -68.2405 | 0.6875 | 0.5953 | -0.0382 | 0.0124 | -0.0506 |
0.6767 | 2.1537 | 12500 | -3.0770 | -3.0712 | -62.5716 | -68.2769 | 0.6875 | 0.5922 | -0.0386 | 0.0124 | -0.0510 |
0.6797 | 2.1709 | 12600 | -3.0766 | -3.0709 | -62.6046 | -68.3137 | 0.6875 | 0.5974 | -0.0389 | 0.0124 | -0.0513 |
0.6777 | 2.1881 | 12700 | -3.0762 | -3.0705 | -62.6237 | -68.3343 | 0.6875 | 0.5936 | -0.0391 | 0.0124 | -0.0515 |
0.6832 | 2.2054 | 12800 | -3.0758 | -3.0701 | -62.6200 | -68.3353 | 0.6875 | 0.5939 | -0.0391 | 0.0125 | -0.0516 |
0.6782 | 2.2226 | 12900 | -3.0755 | -3.0698 | -62.6173 | -68.3392 | 0.6875 | 0.5922 | -0.0391 | 0.0125 | -0.0516 |
0.6834 | 2.2398 | 13000 | -3.0752 | -3.0695 | -62.6487 | -68.3802 | 0.6874 | 0.5946 | -0.0394 | 0.0126 | -0.0520 |
0.6793 | 2.2571 | 13100 | -3.0749 | -3.0692 | -62.6767 | -68.4021 | 0.6874 | 0.5925 | -0.0396 | 0.0126 | -0.0522 |
0.679 | 2.2743 | 13200 | -3.0748 | -3.0691 | -62.6739 | -68.3968 | 0.6875 | 0.5913 | -0.0396 | 0.0125 | -0.0522 |
0.685 | 2.2915 | 13300 | -3.0744 | -3.0687 | -62.6910 | -68.4248 | 0.6874 | 0.5915 | -0.0398 | 0.0127 | -0.0524 |
0.6766 | 2.3088 | 13400 | -3.0740 | -3.0684 | -62.7184 | -68.4492 | 0.6874 | 0.5922 | -0.0401 | 0.0126 | -0.0527 |
0.6747 | 2.3260 | 13500 | -3.0737 | -3.0680 | -62.7332 | -68.4680 | 0.6874 | 0.5913 | -0.0402 | 0.0127 | -0.0529 |
0.6812 | 2.3432 | 13600 | -3.0732 | -3.0675 | -62.7365 | -68.4875 | 0.6873 | 0.5927 | -0.0402 | 0.0128 | -0.0531 |
0.6801 | 2.3604 | 13700 | -3.0732 | -3.0675 | -62.7692 | -68.5067 | 0.6874 | 0.5911 | -0.0406 | 0.0127 | -0.0533 |
0.6809 | 2.3777 | 13800 | -3.0728 | -3.0671 | -62.7696 | -68.5208 | 0.6873 | 0.5946 | -0.0406 | 0.0128 | -0.0534 |
0.6816 | 2.3949 | 13900 | -3.0730 | -3.0674 | -62.7933 | -68.5354 | 0.6874 | 0.5953 | -0.0408 | 0.0127 | -0.0536 |
0.6821 | 2.4121 | 14000 | -3.0728 | -3.0670 | -62.8191 | -68.5660 | 0.6873 | 0.5897 | -0.0411 | 0.0128 | -0.0539 |
0.6775 | 2.4294 | 14100 | -3.0726 | -3.0669 | -62.8322 | -68.5844 | 0.6873 | 0.5946 | -0.0412 | 0.0128 | -0.0540 |
0.6787 | 2.4466 | 14200 | -3.0723 | -3.0666 | -62.8444 | -68.6121 | 0.6873 | 0.5976 | -0.0413 | 0.0130 | -0.0543 |
0.6777 | 2.4638 | 14300 | -3.0725 | -3.0667 | -62.8538 | -68.6067 | 0.6873 | 0.5927 | -0.0414 | 0.0128 | -0.0543 |
0.6785 | 2.4810 | 14400 | -3.0720 | -3.0663 | -62.8758 | -68.6288 | 0.6873 | 0.5922 | -0.0416 | 0.0128 | -0.0545 |
0.6723 | 2.4983 | 14500 | -3.0719 | -3.0662 | -62.8855 | -68.6587 | 0.6872 | 0.5927 | -0.0417 | 0.0130 | -0.0548 |
0.6776 | 2.5155 | 14600 | -3.0718 | -3.0661 | -62.8962 | -68.6640 | 0.6873 | 0.5936 | -0.0418 | 0.0130 | -0.0548 |
0.677 | 2.5327 | 14700 | -3.0717 | -3.0660 | -62.8845 | -68.6537 | 0.6872 | 0.5915 | -0.0417 | 0.0130 | -0.0547 |
0.6804 | 2.5500 | 14800 | -3.0713 | -3.0656 | -62.9039 | -68.6914 | 0.6872 | 0.5925 | -0.0419 | 0.0132 | -0.0551 |
0.6793 | 2.5672 | 14900 | -3.0715 | -3.0658 | -62.9045 | -68.6895 | 0.6872 | 0.5978 | -0.0419 | 0.0132 | -0.0551 |
0.6836 | 2.5844 | 15000 | -3.0714 | -3.0657 | -62.9167 | -68.6898 | 0.6872 | 0.5883 | -0.0420 | 0.0130 | -0.0551 |
0.6829 | 2.6017 | 15100 | -3.0713 | -3.0656 | -62.9200 | -68.7007 | 0.6872 | 0.5929 | -0.0421 | 0.0131 | -0.0552 |
0.6745 | 2.6189 | 15200 | -3.0708 | -3.0651 | -62.9332 | -68.7246 | 0.6871 | 0.5939 | -0.0422 | 0.0132 | -0.0554 |
0.6792 | 2.6361 | 15300 | -3.0714 | -3.0657 | -62.9299 | -68.7169 | 0.6872 | 0.5925 | -0.0422 | 0.0132 | -0.0554 |
0.6756 | 2.6533 | 15400 | -3.0711 | -3.0654 | -62.9311 | -68.7170 | 0.6872 | 0.5899 | -0.0422 | 0.0132 | -0.0554 |
0.6876 | 2.6706 | 15500 | -3.0710 | -3.0653 | -62.9352 | -68.7054 | 0.6872 | 0.5939 | -0.0422 | 0.0130 | -0.0553 |
0.6793 | 2.6878 | 15600 | -3.0708 | -3.0651 | -62.9398 | -68.7274 | 0.6872 | 0.5939 | -0.0423 | 0.0132 | -0.0555 |
0.6814 | 2.7050 | 15700 | -3.0709 | -3.0652 | -62.9370 | -68.7273 | 0.6871 | 0.5901 | -0.0423 | 0.0132 | -0.0555 |
0.6771 | 2.7223 | 15800 | -3.0711 | -3.0654 | -62.9376 | -68.7270 | 0.6872 | 0.5950 | -0.0423 | 0.0132 | -0.0555 |
0.6732 | 2.7395 | 15900 | -3.0708 | -3.0651 | -62.9390 | -68.7115 | 0.6872 | 0.5929 | -0.0423 | 0.0130 | -0.0553 |
0.6784 | 2.7567 | 16000 | -3.0706 | -3.0649 | -62.9350 | -68.7192 | 0.6872 | 0.5925 | -0.0422 | 0.0132 | -0.0554 |
0.6783 | 2.7739 | 16100 | -3.0708 | -3.0651 | -62.9475 | -68.7257 | 0.6872 | 0.5876 | -0.0424 | 0.0131 | -0.0555 |
0.6733 | 2.7912 | 16200 | -3.0709 | -3.0652 | -62.9490 | -68.7286 | 0.6872 | 0.5899 | -0.0424 | 0.0131 | -0.0555 |
0.6816 | 2.8084 | 16300 | -3.0708 | -3.0651 | -62.9483 | -68.7357 | 0.6872 | 0.5997 | -0.0424 | 0.0132 | -0.0556 |
0.6797 | 2.8256 | 16400 | -3.0709 | -3.0652 | -62.9484 | -68.7255 | 0.6872 | 0.5911 | -0.0424 | 0.0131 | -0.0555 |
0.6758 | 2.8429 | 16500 | -3.0711 | -3.0654 | -62.9445 | -68.7217 | 0.6872 | 0.5915 | -0.0423 | 0.0131 | -0.0554 |
0.6868 | 2.8601 | 16600 | -3.0709 | -3.0652 | -62.9350 | -68.7219 | 0.6872 | 0.5920 | -0.0422 | 0.0132 | -0.0554 |
0.6821 | 2.8773 | 16700 | -3.0711 | -3.0654 | -62.9312 | -68.7231 | 0.6871 | 0.5915 | -0.0422 | 0.0132 | -0.0554 |
0.6862 | 2.8946 | 16800 | -3.0705 | -3.0648 | -62.9449 | -68.7250 | 0.6872 | 0.5929 | -0.0423 | 0.0131 | -0.0554 |
0.6855 | 2.9118 | 16900 | -3.0710 | -3.0653 | -62.9327 | -68.7176 | 0.6872 | 0.5908 | -0.0422 | 0.0132 | -0.0554 |
0.6815 | 2.9290 | 17000 | -3.0706 | -3.0650 | -62.9358 | -68.7221 | 0.6872 | 0.5906 | -0.0422 | 0.0132 | -0.0554 |
0.6795 | 2.9462 | 17100 | -3.0706 | -3.0649 | -62.9485 | -68.7265 | 0.6872 | 0.5929 | -0.0424 | 0.0131 | -0.0555 |
0.6849 | 2.9635 | 17200 | -3.0705 | -3.0649 | -62.9499 | -68.7367 | 0.6872 | 0.5925 | -0.0424 | 0.0132 | -0.0556 |
0.6857 | 2.9807 | 17300 | -3.0704 | -3.0648 | -62.9481 | -68.7256 | 0.6872 | 0.5934 | -0.0424 | 0.0131 | -0.0555 |
0.6745 | 2.9979 | 17400 | -3.0704 | -3.0648 | -62.9434 | -68.7246 | 0.6872 | 0.5943 | -0.0423 | 0.0131 | -0.0554 |
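For readers who want to inspect the log programmatically, the rows above can be parsed as pipe-delimited text. The following is a minimal sketch using four rows copied from the table (the first logged step and the last step of each epoch). The helper names `split_cells` and `parse_rows` are hypothetical. It also checks that each logged margin equals chosen minus rejected up to the four-decimal rounding used in the table.

```python
HEADER = (
    "Training Loss | Epoch | Step | Logits/chosen | Logits/rejected | "
    "Logps/chosen | Logps/rejected | Validation Loss | Rewards/accuracies | "
    "Rewards/chosen | Rewards/margins | Rewards/rejected |"
)

# Rows copied verbatim from the training results table.
ROWS = [
    "0.6931 | 0.0172 | 100 | -3.1635 | -3.1579 | -58.7134 | -63.1764 | 0.6932 | 0.4868 | -0.0000 | -0.0001 | 0.0000 |",
    "0.6854 | 0.9993 | 5800 | -3.1315 | -3.1258 | -58.8782 | -63.7916 | 0.6910 | 0.5748 | -0.0017 | 0.0045 | -0.0061 |",
    "0.6827 | 1.9986 | 11600 | -3.0808 | -3.0750 | -62.3037 | -67.9551 | 0.6878 | 0.5932 | -0.0359 | 0.0118 | -0.0477 |",
    "0.6745 | 2.9979 | 17400 | -3.0704 | -3.0648 | -62.9434 | -68.7246 | 0.6872 | 0.5943 | -0.0423 | 0.0131 | -0.0554 |",
]


def split_cells(line: str) -> list[str]:
    """Split one pipe-delimited line into trimmed cell strings."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def parse_rows(header: str, rows: list[str]) -> list[dict[str, float]]:
    """Map each row to a {column name: float value} dictionary."""
    names = split_cells(header)
    return [dict(zip(names, map(float, split_cells(row)))) for row in rows]


records = parse_rows(HEADER, ROWS)
for rec in records:
    derived = rec["Rewards/chosen"] - rec["Rewards/rejected"]
    # Allow for rounding of each logged value to four decimals.
    consistent = abs(derived - rec["Rewards/margins"]) < 2e-4
    print(int(rec["Step"]), rec["Validation Loss"], rec["Rewards/margins"], consistent)
```

In these sampled rows the validation loss falls from 0.6932 to 0.6872 while the margin grows from about zero to 0.0131.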
Framework versions
- Transformers 4.41.2
- Pytorch 2.1.2
- Datasets 2.20.0
- Tokenizers 0.19.1