tinyllama-1.1b-sum-dpo-full

This model is a fine-tuned version of martimfasantos/tinyllama-1.1b-sum-sft-full on the openai/summarize_from_feedback dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6549
  • Rewards/chosen: -0.4976
  • Rewards/rejected: -0.6010
  • Rewards/accuracies: 0.6194
  • Rewards/margins: 0.1035
  • Logps/rejected: -123.2810
  • Logps/chosen: -108.4673
  • Logits/rejected: -2.5516
  • Logits/chosen: -2.5584
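
The metric names above match those logged by preference-optimization trainers such as TRL's DPOTrainer. Assuming that convention (the card does not say so explicitly), the margin is the chosen reward minus the rejected reward, each reward is a coefficient beta times the change in log-probability relative to the reference model, and the per-pair loss is -log(sigmoid(margin)). The sketch below checks the reported numbers against those relationships. The beta value is not stated in the card; the figure computed here is only an estimate, using the step-100 log-probabilities from the training results table as a stand-in for the reference model (rewards are reported as 0.0000 at that step).

```python
import math


def dpo_pair_loss(margin: float) -> float:
    """Sigmoid DPO loss for one preference pair: -log(sigmoid(margin))."""
    return math.log1p(math.exp(-margin))


# Final evaluation metrics copied from the list above.
reward_chosen = -0.4976
reward_rejected = -0.6010
logp_chosen = -108.4673
logp_rejected = -123.2810

# Assumption: step-100 evaluation log-probabilities approximate the reference model.
ref_logp_chosen = -58.7099
ref_logp_rejected = -63.1720

# Margin recomputed from the rounded means (about 0.1034; the card reports 0.1035).
margin = reward_chosen - reward_rejected

# With a zero margin the loss is ln 2, matching the 0.6932 seen early in training.
initial_loss = dpo_pair_loss(0.0)

# Estimated beta, from reward = beta * (policy log-prob - reference log-prob).
beta_from_chosen = reward_chosen / (logp_chosen - ref_logp_chosen)
beta_from_rejected = reward_rejected / (logp_rejected - ref_logp_rejected)

print(f"margin={margin:.4f} initial_loss={initial_loss:.4f}")
print(f"beta estimates: {beta_from_chosen:.4f}, {beta_from_rejected:.4f}")
```

Both estimates come out close to 0.01, which suggests (but does not confirm) the beta used in training. Note that the reported mean loss need not equal the loss of the mean margin, because the loss is a convex function applied per pair before averaging.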

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-07
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 2
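
To make the schedule settings concrete, here is a minimal pure-Python sketch of a linear-warmup, cosine-decay learning-rate curve using the values listed above. It mirrors the usual shape of such a schedule rather than calling any library. The total step count is not stated in the card; it is estimated here from the last row of the training results table (step 11600 at epoch 1.9986), so treat it as an assumption.

```python
import math

PEAK_LR = 1e-7        # learning_rate
WARMUP_RATIO = 0.1    # lr_scheduler_warmup_ratio
NUM_EPOCHS = 2        # num_epochs

# Assumption: steps per epoch estimated from the results table (step 11600 ~ epoch 1.9986).
STEPS_PER_EPOCH = round(11600 / 1.9986)
TOTAL_STEPS = NUM_EPOCHS * STEPS_PER_EPOCH
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)

# Effective batch size under the usual formula: per-device batch * accumulation * devices.
# The card does not state the device count; 8 * 2 already equals the listed total of 16.
PER_DEVICE_BATCH = 8
GRAD_ACCUM_STEPS = 2
batch_without_device_factor = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, lr_at(step))
```

Under these assumptions the warmup ends a little after step 1100, and the learning rate reaches its peak of 1e-07 only at that point before decaying toward zero by the final step.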

Training results

Training Loss Epoch Step Validation Loss Rewards/chosen Rewards/rejected Rewards/accuracies Rewards/margins Logps/rejected Logps/chosen Logits/rejected Logits/chosen
0.6932 0.0172 100 0.6932 0.0000 0.0001 0.4819 -0.0001 -63.1720 -58.7099 -3.1572 -3.1629
0.6931 0.0345 200 0.6932 0.0000 0.0001 0.4893 -0.0001 -63.1716 -58.7118 -3.1576 -3.1632
0.6932 0.0517 300 0.6932 0.0000 0.0001 0.4696 -0.0001 -63.1677 -58.7096 -3.1575 -3.1631
0.6933 0.0689 400 0.6932 0.0002 0.0002 0.4844 -0.0000 -63.1572 -58.6929 -3.1574 -3.1631
0.6931 0.0861 500 0.6931 0.0002 0.0002 0.5016 0.0000 -63.1582 -58.6892 -3.1571 -3.1628
0.6925 0.1034 600 0.6931 0.0004 0.0003 0.5158 0.0002 -63.1507 -58.6671 -3.1566 -3.1623
0.6927 0.1206 700 0.6931 0.0006 0.0004 0.5276 0.0002 -63.1420 -58.6550 -3.1556 -3.1612
0.6924 0.1378 800 0.6929 0.0010 0.0006 0.5509 0.0005 -63.1244 -58.6089 -3.1546 -3.1601
0.692 0.1551 900 0.6928 0.0014 0.0007 0.5534 0.0007 -63.1085 -58.5690 -3.1524 -3.1580
0.6924 0.1723 1000 0.6926 0.0018 0.0007 0.5660 0.0011 -63.1097 -58.5334 -3.1494 -3.1550
0.6913 0.1895 1100 0.6924 0.0021 0.0005 0.5737 0.0016 -63.1303 -58.5028 -3.1458 -3.1514
0.6912 0.2068 1200 0.6921 0.0022 0.0001 0.5795 0.0021 -63.1677 -58.4881 -3.1407 -3.1464
0.6911 0.2240 1300 0.6918 0.0017 -0.0011 0.5901 0.0028 -63.2892 -58.5372 -3.1358 -3.1414
0.6871 0.2412 1400 0.6914 0.0006 -0.0031 0.5785 0.0037 -63.4895 -58.6491 -3.1300 -3.1356
0.6866 0.2584 1500 0.6910 -0.0015 -0.0061 0.5750 0.0045 -63.7853 -58.8661 -3.1246 -3.1303
0.6876 0.2757 1600 0.6907 -0.0038 -0.0091 0.5874 0.0053 -64.0863 -59.0928 -3.1185 -3.1241
0.6882 0.2929 1700 0.6903 -0.0067 -0.0126 0.5850 0.0060 -64.4449 -59.3800 -3.1117 -3.1173
0.6838 0.3101 1800 0.6900 -0.0121 -0.0190 0.5825 0.0069 -65.0772 -59.9201 -3.1038 -3.1095
0.6836 0.3274 1900 0.6895 -0.0157 -0.0235 0.5883 0.0078 -65.5277 -60.2801 -3.0980 -3.1037
0.685 0.3446 2000 0.6889 -0.0227 -0.0319 0.5897 0.0092 -66.3702 -60.9847 -3.0905 -3.0962
0.6828 0.3618 2100 0.6883 -0.0311 -0.0418 0.5806 0.0107 -67.3595 -61.8209 -3.0840 -3.0897
0.6745 0.3790 2200 0.6876 -0.0382 -0.0504 0.5883 0.0123 -68.2227 -62.5273 -3.0753 -3.0811
0.6781 0.3963 2300 0.6872 -0.0405 -0.0537 0.5908 0.0131 -68.5468 -62.7638 -3.0689 -3.0745
0.6809 0.4135 2400 0.6866 -0.0471 -0.0615 0.5906 0.0144 -69.3305 -63.4208 -3.0592 -3.0649
0.6828 0.4307 2500 0.6862 -0.0557 -0.0713 0.5913 0.0156 -70.3087 -64.2813 -3.0501 -3.0558
0.6754 0.4480 2600 0.6856 -0.0615 -0.0783 0.5918 0.0168 -71.0083 -64.8584 -3.0433 -3.0490
0.6768 0.4652 2700 0.6851 -0.0674 -0.0853 0.5957 0.0180 -71.7136 -65.4475 -3.0370 -3.0427
0.6766 0.4824 2800 0.6846 -0.0727 -0.0919 0.5967 0.0192 -72.3669 -65.9771 -3.0308 -3.0365
0.6769 0.4997 2900 0.6843 -0.0755 -0.0954 0.6004 0.0199 -72.7197 -66.2619 -3.0232 -3.0289
0.6781 0.5169 3000 0.6839 -0.0812 -0.1022 0.6027 0.0210 -73.3995 -66.8329 -3.0144 -3.0201
0.67 0.5341 3100 0.6835 -0.0822 -0.1040 0.6004 0.0218 -73.5753 -66.9287 -3.0095 -3.0153
0.6718 0.5513 3200 0.6828 -0.0939 -0.1173 0.6015 0.0235 -74.9148 -68.1005 -2.9982 -3.0040
0.6724 0.5686 3300 0.6822 -0.0999 -0.1249 0.6050 0.0250 -75.6694 -68.7027 -2.9851 -2.9908
0.6625 0.5858 3400 0.6818 -0.1009 -0.1266 0.6090 0.0257 -75.8440 -68.8060 -2.9762 -2.9820
0.6742 0.6030 3500 0.6814 -0.1071 -0.1338 0.6083 0.0267 -76.5617 -69.4202 -2.9687 -2.9745
0.6722 0.6203 3600 0.6810 -0.1126 -0.1404 0.6099 0.0277 -77.2155 -69.9734 -2.9597 -2.9655
0.664 0.6375 3700 0.6803 -0.1209 -0.1502 0.6090 0.0293 -78.2040 -70.8018 -2.9485 -2.9543
0.6644 0.6547 3800 0.6795 -0.1327 -0.1641 0.6111 0.0314 -79.5918 -71.9851 -2.9386 -2.9444
0.6664 0.6720 3900 0.6786 -0.1449 -0.1784 0.6080 0.0335 -81.0222 -73.2044 -2.9300 -2.9358
0.6653 0.6892 4000 0.6781 -0.1559 -0.1909 0.6057 0.0350 -82.2692 -74.3040 -2.9178 -2.9236
0.6532 0.7064 4100 0.6776 -0.1612 -0.1975 0.6125 0.0363 -82.9296 -74.8363 -2.9005 -2.9064
0.6733 0.7236 4200 0.6769 -0.1720 -0.2098 0.6087 0.0378 -84.1639 -75.9119 -2.8890 -2.8949
0.6618 0.7409 4300 0.6764 -0.1798 -0.2189 0.6057 0.0391 -85.0723 -76.6940 -2.8794 -2.8853
0.6625 0.7581 4400 0.6757 -0.1936 -0.2347 0.6053 0.0411 -86.6464 -78.0713 -2.8686 -2.8745
0.6605 0.7753 4500 0.6746 -0.2097 -0.2535 0.6066 0.0439 -88.5342 -79.6776 -2.8590 -2.8649
0.6437 0.7926 4600 0.6737 -0.2242 -0.2703 0.6071 0.0461 -90.2150 -81.1344 -2.8513 -2.8573
0.6526 0.8098 4700 0.6727 -0.2385 -0.2872 0.6069 0.0487 -91.9046 -82.5646 -2.8429 -2.8489
0.6604 0.8270 4800 0.6721 -0.2495 -0.2999 0.6090 0.0504 -93.1696 -83.6594 -2.8351 -2.8410
0.6664 0.8442 4900 0.6712 -0.2621 -0.3148 0.6048 0.0526 -94.6595 -84.9266 -2.8264 -2.8324
0.6499 0.8615 5000 0.6707 -0.2706 -0.3247 0.5955 0.0541 -95.6483 -85.7703 -2.8111 -2.8172
0.6628 0.8787 5100 0.6697 -0.2843 -0.3411 0.5969 0.0568 -97.2923 -87.1431 -2.8035 -2.8094
0.6513 0.8959 5200 0.6693 -0.2867 -0.3444 0.5953 0.0577 -97.6222 -87.3824 -2.7972 -2.8031
0.6475 0.9132 5300 0.6692 -0.2901 -0.3484 0.5987 0.0583 -98.0213 -87.7248 -2.7882 -2.7943
0.6494 0.9304 5400 0.6687 -0.2940 -0.3536 0.6015 0.0596 -98.5368 -88.1090 -2.7827 -2.7887
0.6412 0.9476 5500 0.6682 -0.3024 -0.3635 0.5997 0.0610 -99.5251 -88.9533 -2.7734 -2.7794
0.6531 0.9649 5600 0.6680 -0.2995 -0.3610 0.6046 0.0615 -99.2758 -88.6585 -2.7683 -2.7743
0.652 0.9821 5700 0.6671 -0.3121 -0.3760 0.6041 0.0639 -100.7801 -89.9234 -2.7604 -2.7664
0.6355 0.9993 5800 0.6663 -0.3272 -0.3936 0.6057 0.0664 -102.5409 -91.4366 -2.7489 -2.7549
0.6362 1.0165 5900 0.6654 -0.3504 -0.4199 0.6043 0.0695 -105.1658 -93.7475 -2.7329 -2.7390
0.6587 1.0338 6000 0.6654 -0.3453 -0.4145 0.6076 0.0692 -104.6326 -93.2431 -2.7260 -2.7321
0.6337 1.0510 6100 0.6649 -0.3492 -0.4197 0.6078 0.0705 -105.1470 -93.6331 -2.7177 -2.7237
0.6372 1.0682 6200 0.6640 -0.3675 -0.4408 0.6090 0.0734 -107.2651 -95.4612 -2.7083 -2.7144
0.6555 1.0855 6300 0.6633 -0.3808 -0.4563 0.6111 0.0755 -108.8140 -96.7948 -2.7009 -2.7071
0.6406 1.1027 6400 0.6629 -0.3843 -0.4611 0.6108 0.0768 -109.2905 -97.1394 -2.6941 -2.7003
0.6445 1.1199 6500 0.6626 -0.3894 -0.4670 0.6097 0.0776 -109.8768 -97.6507 -2.6860 -2.6923
0.6438 1.1371 6600 0.6627 -0.3907 -0.4683 0.6073 0.0776 -110.0129 -97.7839 -2.6814 -2.6877
0.6411 1.1544 6700 0.6622 -0.3996 -0.4791 0.6122 0.0795 -111.0866 -98.6695 -2.6729 -2.6791
0.6224 1.1716 6800 0.6614 -0.4163 -0.4982 0.6115 0.0819 -112.9988 -100.3370 -2.6625 -2.6688
0.6437 1.1888 6900 0.6610 -0.4232 -0.5064 0.6106 0.0832 -113.8220 -101.0292 -2.6554 -2.6618
0.6268 1.2061 7000 0.6604 -0.4419 -0.5278 0.6090 0.0859 -115.9616 -102.9045 -2.6490 -2.6553
0.6303 1.2233 7100 0.6604 -0.4379 -0.5238 0.6129 0.0859 -115.5604 -102.5041 -2.6443 -2.6506
0.6251 1.2405 7200 0.6600 -0.4437 -0.5309 0.6101 0.0872 -116.2726 -103.0814 -2.6383 -2.6448
0.6531 1.2578 7300 0.6602 -0.4339 -0.5202 0.6125 0.0863 -115.1998 -102.0999 -2.6366 -2.6430
0.6456 1.2750 7400 0.6600 -0.4313 -0.5180 0.6125 0.0867 -114.9813 -101.8414 -2.6345 -2.6409
0.6455 1.2922 7500 0.6597 -0.4307 -0.5180 0.6148 0.0873 -114.9807 -101.7862 -2.6292 -2.6357
0.6762 1.3094 7600 0.6593 -0.4392 -0.5278 0.6118 0.0887 -115.9649 -102.6288 -2.6216 -2.6281
0.6365 1.3267 7700 0.6592 -0.4402 -0.5295 0.6157 0.0893 -116.1288 -102.7343 -2.6172 -2.6237
0.6211 1.3439 7800 0.6588 -0.4484 -0.5389 0.6194 0.0906 -117.0741 -103.5481 -2.6115 -2.6180
0.641 1.3611 7900 0.6581 -0.4553 -0.5479 0.6217 0.0926 -117.9735 -104.2409 -2.6077 -2.6143
0.6228 1.3784 8000 0.6578 -0.4583 -0.5520 0.6215 0.0937 -118.3795 -104.5455 -2.6043 -2.6109
0.641 1.3956 8100 0.6579 -0.4658 -0.5596 0.6178 0.0939 -119.1444 -105.2910 -2.5997 -2.6063
0.6504 1.4128 8200 0.6571 -0.4707 -0.5666 0.6213 0.0959 -119.8413 -105.7863 -2.5974 -2.6040
0.6472 1.4300 8300 0.6573 -0.4661 -0.5612 0.6217 0.0951 -119.3045 -105.3220 -2.5953 -2.6018
0.6298 1.4473 8400 0.6573 -0.4609 -0.5560 0.6206 0.0950 -118.7768 -104.8056 -2.5928 -2.5994
0.6207 1.4645 8500 0.6573 -0.4579 -0.5531 0.6180 0.0952 -118.4887 -104.5014 -2.5885 -2.5952
0.6661 1.4817 8600 0.6571 -0.4639 -0.5598 0.6204 0.0959 -119.1632 -105.1053 -2.5846 -2.5913
0.6475 1.4990 8700 0.6572 -0.4570 -0.5525 0.6190 0.0954 -118.4251 -104.4133 -2.5846 -2.5912
0.6476 1.5162 8800 0.6569 -0.4604 -0.5566 0.6194 0.0962 -118.8439 -104.7545 -2.5816 -2.5883
0.6336 1.5334 8900 0.6568 -0.4692 -0.5663 0.6190 0.0971 -119.8081 -105.6329 -2.5772 -2.5839
0.6282 1.5507 9000 0.6564 -0.4708 -0.5690 0.6187 0.0981 -120.0761 -105.7962 -2.5754 -2.5821
0.646 1.5679 9100 0.6565 -0.4724 -0.5704 0.6187 0.0980 -120.2213 -105.9529 -2.5732 -2.5799
0.6225 1.5851 9200 0.6563 -0.4762 -0.5749 0.6190 0.0987 -120.6733 -106.3303 -2.5714 -2.5781
0.6223 1.6023 9300 0.6562 -0.4763 -0.5753 0.6180 0.0990 -120.7107 -106.3383 -2.5692 -2.5759
0.6288 1.6196 9400 0.6559 -0.4818 -0.5819 0.6201 0.1001 -121.3710 -106.8921 -2.5664 -2.5731
0.6223 1.6368 9500 0.6557 -0.4823 -0.5828 0.6176 0.1005 -121.4601 -106.9374 -2.5650 -2.5717
0.6363 1.6540 9600 0.6556 -0.4891 -0.5902 0.6197 0.1011 -122.2042 -107.6243 -2.5615 -2.5683
0.6355 1.6713 9700 0.6556 -0.4880 -0.5892 0.6211 0.1012 -122.1034 -107.5130 -2.5609 -2.5677
0.6247 1.6885 9800 0.6555 -0.4894 -0.5910 0.6201 0.1015 -122.2755 -107.6543 -2.5603 -2.5670
0.5826 1.7057 9900 0.6554 -0.4911 -0.5929 0.6206 0.1019 -122.4715 -107.8182 -2.5591 -2.5659
0.6181 1.7229 10000 0.6553 -0.4923 -0.5945 0.6204 0.1022 -122.6296 -107.9373 -2.5579 -2.5647
0.6365 1.7402 10100 0.6553 -0.4917 -0.5938 0.6201 0.1022 -122.5635 -107.8778 -2.5567 -2.5635
0.6269 1.7574 10200 0.6552 -0.4952 -0.5977 0.6208 0.1025 -122.9497 -108.2321 -2.5556 -2.5624
0.6573 1.7746 10300 0.6553 -0.4962 -0.5988 0.6201 0.1026 -123.0645 -108.3347 -2.5542 -2.5610
0.6036 1.7919 10400 0.6552 -0.4953 -0.5980 0.6197 0.1027 -122.9784 -108.2400 -2.5542 -2.5610
0.6178 1.8091 10500 0.6549 -0.4956 -0.5990 0.6213 0.1034 -123.0831 -108.2757 -2.5531 -2.5598
0.6403 1.8263 10600 0.6551 -0.4967 -0.5996 0.6204 0.1030 -123.1450 -108.3809 -2.5527 -2.5594
0.6341 1.8436 10700 0.6550 -0.4965 -0.5997 0.6206 0.1032 -123.1496 -108.3595 -2.5523 -2.5590
0.627 1.8608 10800 0.6549 -0.4971 -0.6006 0.6211 0.1035 -123.2409 -108.4216 -2.5521 -2.5589
0.6335 1.8780 10900 0.6550 -0.4974 -0.6009 0.6201 0.1035 -123.2728 -108.4564 -2.5523 -2.5590
0.6262 1.8952 11000 0.6550 -0.4971 -0.6003 0.6201 0.1033 -123.2126 -108.4185 -2.5520 -2.5588
0.6311 1.9125 11100 0.6548 -0.4971 -0.6009 0.6211 0.1038 -123.2688 -108.4253 -2.5521 -2.5589
0.6239 1.9297 11200 0.6551 -0.4971 -0.6003 0.6201 0.1031 -123.2061 -108.4263 -2.5516 -2.5583
0.6629 1.9469 11300 0.6550 -0.4970 -0.6003 0.6206 0.1033 -123.2066 -108.4107 -2.5518 -2.5587
0.6308 1.9642 11400 0.6550 -0.4972 -0.6005 0.6197 0.1033 -123.2305 -108.4360 -2.5518 -2.5586
0.6532 1.9814 11500 0.6550 -0.4972 -0.6005 0.6197 0.1033 -123.2317 -108.4313 -2.5517 -2.5585
0.6257 1.9986 11600 0.6549 -0.4976 -0.6010 0.6194 0.1035 -123.2810 -108.4673 -2.5516 -2.5584
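
Each row of the table above is a whitespace-separated record with twelve fields in the order given by the header. As a convenience, the following sketch parses such rows into dictionaries and picks the checkpoint with the lowest validation loss. The snake_case column names are chosen here for illustration, and only three rows are embedded as sample input; paste the full table to analyze the whole run.

```python
COLUMNS = [
    "training_loss", "epoch", "step", "validation_loss",
    "rewards_chosen", "rewards_rejected", "rewards_accuracies", "rewards_margins",
    "logps_rejected", "logps_chosen", "logits_rejected", "logits_chosen",
]

# Three rows copied verbatim from the table (first row, step 11100, last row).
SAMPLE_ROWS = """
0.6932 0.0172 100 0.6932 0.0000 0.0001 0.4819 -0.0001 -63.1720 -58.7099 -3.1572 -3.1629
0.6311 1.9125 11100 0.6548 -0.4971 -0.6009 0.6211 0.1038 -123.2688 -108.4253 -2.5521 -2.5589
0.6257 1.9986 11600 0.6549 -0.4976 -0.6010 0.6194 0.1035 -123.2810 -108.4673 -2.5516 -2.5584
"""


def parse_row(line: str) -> dict:
    """Convert one whitespace-separated table row into a dict keyed by column name."""
    values = line.split()
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(values)}")
    row = {name: float(value) for name, value in zip(COLUMNS, values)}
    row["step"] = int(values[2])  # the step column is an integer count
    return row


rows = [parse_row(line) for line in SAMPLE_ROWS.splitlines() if line.strip()]
best = min(rows, key=lambda r: r["validation_loss"])
print(best["step"], best["validation_loss"])
```

Across the full table, the lowest validation loss is 0.6548 at step 11100, marginally below the 0.6549 reported for the final checkpoint.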

Framework versions

  • Transformers 4.41.2
  • PyTorch 2.1.2
  • Datasets 2.19.2
  • Tokenizers 0.19.1

Safetensors

  • Model size: 1.1B params
  • Tensor type: F32

Model repository: martimfasantos/tinyllama-1.1b-sum-dpo-full_LR1e-7_2epochs