---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_accumulate_iter8_sftsd2
  results: []
---

# collapse_gemma-2-2b_hs2_accumulate_iter8_sftsd2

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.1047
- Num Input Tokens Seen: 63448944

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
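These values map directly onto the standard Hugging Face `TrainingArguments`. The snippet below is an illustrative reconstruction of that configuration, not the exact training script: the `output_dir` is a placeholder, and the dataset and TRL/SFT trainer wiring are not documented here.

```python
from transformers import TrainingArguments

# Illustrative reconstruction of the hyperparameters listed above.
# output_dir is a placeholder; dataset and trainer setup are not documented in this card.
training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter8_sftsd2",
    learning_rate=8e-06,
    per_device_train_batch_size=8,    # train_batch_size: 8
    per_device_eval_batch_size=16,    # eval_batch_size: 16
    seed=2,
    gradient_accumulation_steps=16,   # 8 * 16 = 128 total train batch size
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
)
```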
### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.3956 | 0 |
| 1.6616 | 0.0043 | 5 | 1.3941 | 269008 |
| 1.5787 | 0.0085 | 10 | 1.3763 | 538784 |
| 1.6061 | 0.0128 | 15 | 1.3402 | 805456 |
| 1.542 | 0.0170 | 20 | 1.2816 | 1075056 |
| 1.5257 | 0.0213 | 25 | 1.2400 | 1336536 |
| 1.4703 | 0.0255 | 30 | 1.1987 | 1607712 |
| 1.2258 | 0.0298 | 35 | 1.1785 | 1874888 |
| 1.1104 | 0.0340 | 40 | 1.1862 | 2141664 |
| 1.1191 | 0.0383 | 45 | 1.1893 | 2412176 |
| 0.9131 | 0.0425 | 50 | 1.2149 | 2679368 |
| 0.8898 | 0.0468 | 55 | 1.2700 | 2942232 |
| 0.7874 | 0.0510 | 60 | 1.2706 | 3205952 |
| 0.5907 | 0.0553 | 65 | 1.2847 | 3468440 |
| 0.4503 | 0.0595 | 70 | 1.2806 | 3740128 |
| 0.5092 | 0.0638 | 75 | 1.2788 | 4009264 |
| 0.4167 | 0.0680 | 80 | 1.2665 | 4274760 |
| 0.417 | 0.0723 | 85 | 1.2501 | 4546256 |
| 0.3088 | 0.0765 | 90 | 1.2446 | 4808944 |
| 0.4041 | 0.0808 | 95 | 1.2336 | 5076456 |
| 0.2974 | 0.0850 | 100 | 1.2323 | 5347376 |
| 0.2938 | 0.0893 | 105 | 1.2199 | 5621736 |
| 0.287 | 0.0935 | 110 | 1.2339 | 5898632 |
| 0.2297 | 0.0978 | 115 | 1.2183 | 6172408 |
| 0.2524 | 0.1020 | 120 | 1.2174 | 6443328 |
| 0.3736 | 0.1063 | 125 | 1.2051 | 6712584 |
| 0.3085 | 0.1106 | 130 | 1.2169 | 6985816 |
| 0.2702 | 0.1148 | 135 | 1.2060 | 7253504 |
| 0.2927 | 0.1191 | 140 | 1.2086 | 7527360 |
| 0.2535 | 0.1233 | 145 | 1.2002 | 7795952 |
| 0.257 | 0.1276 | 150 | 1.1968 | 8063272 |
| 0.1809 | 0.1318 | 155 | 1.2008 | 8341072 |
| 0.2557 | 0.1361 | 160 | 1.1920 | 8613272 |
| 0.164 | 0.1403 | 165 | 1.1975 | 8886320 |
| 0.2581 | 0.1446 | 170 | 1.1859 | 9158168 |
| 0.2019 | 0.1488 | 175 | 1.1917 | 9427592 |
| 0.235 | 0.1531 | 180 | 1.1826 | 9707168 |
| 0.2164 | 0.1573 | 185 | 1.1872 | 9978504 |
| 0.2057 | 0.1616 | 190 | 1.1885 | 10253736 |
| 0.1802 | 0.1658 | 195 | 1.1793 | 10524208 |
| 0.1872 | 0.1701 | 200 | 1.1845 | 10790288 |
| 0.2015 | 0.1743 | 205 | 1.1854 | 11053584 |
| 0.2384 | 0.1786 | 210 | 1.1791 | 11331848 |
| 0.2103 | 0.1828 | 215 | 1.1801 | 11605448 |
| 0.2122 | 0.1871 | 220 | 1.1813 | 11879880 |
| 0.1953 | 0.1913 | 225 | 1.1806 | 12151528 |
| 0.1833 | 0.1956 | 230 | 1.1797 | 12428920 |
| 0.1686 | 0.1998 | 235 | 1.1767 | 12699192 |
| 0.1687 | 0.2041 | 240 | 1.1778 | 12970688 |
| 0.2107 | 0.2083 | 245 | 1.1769 | 13234504 |
| 0.2416 | 0.2126 | 250 | 1.1706 | 13505840 |
| 0.2221 | 0.2168 | 255 | 1.1668 | 13773632 |
| 0.1691 | 0.2211 | 260 | 1.1705 | 14051072 |
| 0.1346 | 0.2254 | 265 | 1.1608 | 14321792 |
| 0.177 | 0.2296 | 270 | 1.1656 | 14600320 |
| 0.2298 | 0.2339 | 275 | 1.1672 | 14873872 |
| 0.1853 | 0.2381 | 280 | 1.1621 | 15147328 |
| 0.2145 | 0.2424 | 285 | 1.1626 | 15417416 |
| 0.1656 | 0.2466 | 290 | 1.1592 | 15689168 |
| 0.2127 | 0.2509 | 295 | 1.1598 | 15955088 |
| 0.1722 | 0.2551 | 300 | 1.1605 | 16222264 |
| 0.2392 | 0.2594 | 305 | 1.1578 | 16489368 |
| 0.1921 | 0.2636 | 310 | 1.1597 | 16761168 |
| 0.1397 | 0.2679 | 315 | 1.1550 | 17029576 |
| 0.138 | 0.2721 | 320 | 1.1551 | 17302520 |
| 0.1682 | 0.2764 | 325 | 1.1585 | 17577312 |
| 0.1804 | 0.2806 | 330 | 1.1512 | 17847720 |
| 0.1808 | 0.2849 | 335 | 1.1523 | 18112320 |
| 0.1563 | 0.2891 | 340 | 1.1576 | 18379416 |
| 0.1718 | 0.2934 | 345 | 1.1521 | 18642848 |
| 0.1676 | 0.2976 | 350 | 1.1514 | 18914928 |
| 0.1584 | 0.3019 | 355 | 1.1501 | 19181856 |
| 0.1449 | 0.3061 | 360 | 1.1490 | 19448128 |
| 0.247 | 0.3104 | 365 | 1.1506 | 19721928 |
| 0.1676 | 0.3146 | 370 | 1.1534 | 19995120 |
| 0.1427 | 0.3189 | 375 | 1.1491 | 20263544 |
| 0.1552 | 0.3231 | 380 | 1.1476 | 20525712 |
| 0.1641 | 0.3274 | 385 | 1.1466 | 20793992 |
| 0.1818 | 0.3317 | 390 | 1.1474 | 21062552 |
| 0.1938 | 0.3359 | 395 | 1.1467 | 21327608 |
| 0.1872 | 0.3402 | 400 | 1.1450 | 21602552 |
| 0.238 | 0.3444 | 405 | 1.1490 | 21874224 |
| 0.1042 | 0.3487 | 410 | 1.1430 | 22143792 |
| 0.1036 | 0.3529 | 415 | 1.1442 | 22424408 |
| 0.1606 | 0.3572 | 420 | 1.1444 | 22693520 |
| 0.188 | 0.3614 | 425 | 1.1438 | 22962496 |
| 0.1836 | 0.3657 | 430 | 1.1462 | 23234648 |
| 0.1706 | 0.3699 | 435 | 1.1426 | 23506408 |
| 0.1614 | 0.3742 | 440 | 1.1425 | 23777032 |
| 0.1609 | 0.3784 | 445 | 1.1433 | 24050192 |
| 0.116 | 0.3827 | 450 | 1.1430 | 24316000 |
| 0.1864 | 0.3869 | 455 | 1.1425 | 24589336 |
| 0.198 | 0.3912 | 460 | 1.1378 | 24861112 |
| 0.1611 | 0.3954 | 465 | 1.1397 | 25136072 |
| 0.1429 | 0.3997 | 470 | 1.1399 | 25403784 |
| 0.1901 | 0.4039 | 475 | 1.1363 | 25670752 |
| 0.2213 | 0.4082 | 480 | 1.1353 | 25945016 |
| 0.1166 | 0.4124 | 485 | 1.1395 | 26220704 |
| 0.1259 | 0.4167 | 490 | 1.1357 | 26484920 |
| 0.2132 | 0.4209 | 495 | 1.1331 | 26752640 |
| 0.1699 | 0.4252 | 500 | 1.1347 | 27018608 |
| 0.0938 | 0.4294 | 505 | 1.1352 | 27286128 |
| 0.1752 | 0.4337 | 510 | 1.1370 | 27562024 |
| 0.1873 | 0.4379 | 515 | 1.1320 | 27830728 |
| 0.1796 | 0.4422 | 520 | 1.1322 | 28103888 |
| 0.1176 | 0.4465 | 525 | 1.1345 | 28371784 |
| 0.0928 | 0.4507 | 530 | 1.1345 | 28642592 |
| 0.1709 | 0.4550 | 535 | 1.1332 | 28903728 |
| 0.1094 | 0.4592 | 540 | 1.1325 | 29178904 |
| 0.1501 | 0.4635 | 545 | 1.1337 | 29448176 |
| 0.1372 | 0.4677 | 550 | 1.1325 | 29717176 |
| 0.1512 | 0.4720 | 555 | 1.1340 | 29984912 |
| 0.1478 | 0.4762 | 560 | 1.1313 | 30258688 |
| 0.1654 | 0.4805 | 565 | 1.1306 | 30521056 |
| 0.165 | 0.4847 | 570 | 1.1319 | 30792392 |
| 0.1263 | 0.4890 | 575 | 1.1324 | 31061264 |
| 0.1196 | 0.4932 | 580 | 1.1299 | 31330912 |
| 0.1268 | 0.4975 | 585 | 1.1305 | 31603912 |
| 0.1234 | 0.5017 | 590 | 1.1312 | 31884080 |
| 0.1143 | 0.5060 | 595 | 1.1285 | 32152232 |
| 0.1784 | 0.5102 | 600 | 1.1281 | 32414424 |
| 0.1548 | 0.5145 | 605 | 1.1310 | 32688920 |
| 0.202 | 0.5187 | 610 | 1.1276 | 32959712 |
| 0.2025 | 0.5230 | 615 | 1.1271 | 33233432 |
| 0.2025 | 0.5272 | 620 | 1.1291 | 33504392 |
| 0.1724 | 0.5315 | 625 | 1.1266 | 33777920 |
| 0.1809 | 0.5357 | 630 | 1.1255 | 34045208 |
| 0.2091 | 0.5400 | 635 | 1.1266 | 34316536 |
| 0.1236 | 0.5442 | 640 | 1.1257 | 34588848 |
| 0.2578 | 0.5485 | 645 | 1.1225 | 34861720 |
| 0.1594 | 0.5528 | 650 | 1.1229 | 35137320 |
| 0.0931 | 0.5570 | 655 | 1.1263 | 35408808 |
| 0.1531 | 0.5613 | 660 | 1.1285 | 35680680 |
| 0.1458 | 0.5655 | 665 | 1.1248 | 35946696 |
| 0.1638 | 0.5698 | 670 | 1.1234 | 36213456 |
| 0.0762 | 0.5740 | 675 | 1.1252 | 36478736 |
| 0.1295 | 0.5783 | 680 | 1.1270 | 36751144 |
| 0.1237 | 0.5825 | 685 | 1.1246 | 37020688 |
| 0.1947 | 0.5868 | 690 | 1.1251 | 37290280 |
| 0.185 | 0.5910 | 695 | 1.1239 | 37559352 |
| 0.1981 | 0.5953 | 700 | 1.1241 | 37820632 |
| 0.171 | 0.5995 | 705 | 1.1214 | 38095952 |
| 0.1491 | 0.6038 | 710 | 1.1216 | 38355560 |
| 0.0939 | 0.6080 | 715 | 1.1226 | 38631968 |
| 0.0722 | 0.6123 | 720 | 1.1237 | 38901632 |
| 0.1797 | 0.6165 | 725 | 1.1198 | 39171656 |
| 0.1558 | 0.6208 | 730 | 1.1189 | 39443808 |
| 0.2049 | 0.6250 | 735 | 1.1207 | 39714152 |
| 0.1406 | 0.6293 | 740 | 1.1193 | 39986024 |
| 0.1522 | 0.6335 | 745 | 1.1207 | 40259512 |
| 0.0855 | 0.6378 | 750 | 1.1193 | 40528328 |
| 0.1577 | 0.6420 | 755 | 1.1210 | 40806056 |
| 0.1875 | 0.6463 | 760 | 1.1228 | 41080264 |
| 0.1831 | 0.6505 | 765 | 1.1172 | 41347064 |
| 0.1624 | 0.6548 | 770 | 1.1169 | 41627368 |
| 0.1936 | 0.6590 | 775 | 1.1189 | 41895808 |
| 0.1859 | 0.6633 | 780 | 1.1177 | 42171680 |
| 0.1319 | 0.6676 | 785 | 1.1170 | 42446136 |
| 0.1279 | 0.6718 | 790 | 1.1168 | 42718504 |
| 0.1451 | 0.6761 | 795 | 1.1177 | 42992080 |
| 0.1529 | 0.6803 | 800 | 1.1186 | 43262448 |
| 0.1099 | 0.6846 | 805 | 1.1203 | 43529920 |
| 0.1659 | 0.6888 | 810 | 1.1191 | 43797688 |
| 0.1703 | 0.6931 | 815 | 1.1194 | 44075648 |
| 0.1344 | 0.6973 | 820 | 1.1199 | 44341096 |
| 0.1972 | 0.7016 | 825 | 1.1171 | 44614744 |
| 0.1174 | 0.7058 | 830 | 1.1168 | 44887432 |
| 0.1518 | 0.7101 | 835 | 1.1198 | 45158376 |
| 0.1729 | 0.7143 | 840 | 1.1176 | 45424984 |
| 0.1381 | 0.7186 | 845 | 1.1167 | 45693168 |
| 0.1236 | 0.7228 | 850 | 1.1210 | 45969040 |
| 0.1639 | 0.7271 | 855 | 1.1176 | 46238320 |
| 0.2011 | 0.7313 | 860 | 1.1147 | 46514712 |
| 0.1606 | 0.7356 | 865 | 1.1172 | 46778816 |
| 0.1503 | 0.7398 | 870 | 1.1166 | 47045128 |
| 0.1572 | 0.7441 | 875 | 1.1153 | 47313360 |
| 0.1193 | 0.7483 | 880 | 1.1190 | 47582504 |
| 0.1329 | 0.7526 | 885 | 1.1170 | 47848336 |
| 0.1922 | 0.7568 | 890 | 1.1138 | 48111768 |
| 0.1721 | 0.7611 | 895 | 1.1151 | 48383224 |
| 0.1415 | 0.7653 | 900 | 1.1139 | 48647984 |
| 0.214 | 0.7696 | 905 | 1.1118 | 48921112 |
| 0.2069 | 0.7739 | 910 | 1.1152 | 49188248 |
| 0.1435 | 0.7781 | 915 | 1.1134 | 49456432 |
| 0.1642 | 0.7824 | 920 | 1.1133 | 49725712 |
| 0.1598 | 0.7866 | 925 | 1.1138 | 49986872 |
| 0.1459 | 0.7909 | 930 | 1.1109 | 50260592 |
| 0.1139 | 0.7951 | 935 | 1.1126 | 50535000 |
| 0.1806 | 0.7994 | 940 | 1.1131 | 50804760 |
| 0.1549 | 0.8036 | 945 | 1.1120 | 51071224 |
| 0.1602 | 0.8079 | 950 | 1.1095 | 51345824 |
| 0.1818 | 0.8121 | 955 | 1.1135 | 51613104 |
| 0.1792 | 0.8164 | 960 | 1.1131 | 51881448 |
| 0.1803 | 0.8206 | 965 | 1.1121 | 52152368 |
| 0.2375 | 0.8249 | 970 | 1.1108 | 52424208 |
| 0.1872 | 0.8291 | 975 | 1.1119 | 52690144 |
| 0.1566 | 0.8334 | 980 | 1.1111 | 52954608 |
| 0.1376 | 0.8376 | 985 | 1.1098 | 53220256 |
| 0.0842 | 0.8419 | 990 | 1.1110 | 53492064 |
| 0.1268 | 0.8461 | 995 | 1.1118 | 53761488 |
| 0.1792 | 0.8504 | 1000 | 1.1117 | 54027464 |
| 0.1417 | 0.8546 | 1005 | 1.1098 | 54298720 |
| 0.1595 | 0.8589 | 1010 | 1.1113 | 54574696 |
| 0.1297 | 0.8631 | 1015 | 1.1114 | 54842864 |
| 0.1904 | 0.8674 | 1020 | 1.1122 | 55107880 |
| 0.1061 | 0.8716 | 1025 | 1.1122 | 55381152 |
| 0.1769 | 0.8759 | 1030 | 1.1085 | 55649992 |
| 0.1567 | 0.8801 | 1035 | 1.1075 | 55919072 |
| 0.203 | 0.8844 | 1040 | 1.1093 | 56191424 |
| 0.1557 | 0.8887 | 1045 | 1.1089 | 56464008 |
| 0.21 | 0.8929 | 1050 | 1.1084 | 56737864 |
| 0.2126 | 0.8972 | 1055 | 1.1089 | 57003496 |
| 0.1087 | 0.9014 | 1060 | 1.1076 | 57278832 |
| 0.1838 | 0.9057 | 1065 | 1.1090 | 57551424 |
| 0.1381 | 0.9099 | 1070 | 1.1097 | 57824320 |
| 0.1953 | 0.9142 | 1075 | 1.1083 | 58091512 |
| 0.2044 | 0.9184 | 1080 | 1.1065 | 58358824 |
| 0.1871 | 0.9227 | 1085 | 1.1077 | 58626864 |
| 0.1504 | 0.9269 | 1090 | 1.1078 | 58889800 |
| 0.1559 | 0.9312 | 1095 | 1.1067 | 59159160 |
| 0.2046 | 0.9354 | 1100 | 1.1091 | 59430264 |
| 0.2033 | 0.9397 | 1105 | 1.1066 | 59697408 |
| 0.1562 | 0.9439 | 1110 | 1.1046 | 59965984 |
| 0.1652 | 0.9482 | 1115 | 1.1079 | 60236760 |
| 0.1624 | 0.9524 | 1120 | 1.1068 | 60502576 |
| 0.1708 | 0.9567 | 1125 | 1.1058 | 60770544 |
| 0.1041 | 0.9609 | 1130 | 1.1055 | 61037512 |
| 0.1748 | 0.9652 | 1135 | 1.1061 | 61313056 |
| 0.1736 | 0.9694 | 1140 | 1.1059 | 61577720 |
| 0.15 | 0.9737 | 1145 | 1.1074 | 61847592 |
| 0.1312 | 0.9779 | 1150 | 1.1078 | 62104056 |
| 0.2414 | 0.9822 | 1155 | 1.1051 | 62373576 |
| 0.1648 | 0.9864 | 1160 | 1.1045 | 62645296 |
| 0.1681 | 0.9907 | 1165 | 1.1086 | 62911584 |
| 0.1334 | 0.9950 | 1170 | 1.1087 | 63184416 |
| 0.1686 | 0.9992 | 1175 | 1.1047 | 63448944 |

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
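The intended-use section above is not filled in, but since this is a causal language model fine-tuned from google/gemma-2-2b, the checkpoint should load with the standard Transformers API given the versions listed. The snippet below is a minimal sketch; the repo id is assumed to match the model name and may need to be adjusted to the actual Hub path.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id (matches the model name above); adjust to the actual Hub path.
model_id = "collapse_gemma-2-2b_hs2_accumulate_iter8_sftsd2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Simple greedy generation as a smoke test.
inputs = tokenizer("The quick brown fox", return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```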