---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd2
  results: []
---

# collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd2

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.1076
- Num Input Tokens Seen: 55148120

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3956          | 0                 |
| 1.6242        | 0.0049 | 5    | 1.3934          | 272064            |
| 1.6752        | 0.0098 | 10   | 1.3710          | 554032            |
| 1.6228        | 0.0147 | 15   | 1.3186          | 827360            |
| 1.543         | 0.0197 | 20   | 1.2658          | 1096136           |
| 1.423         | 0.0246 | 25   | 1.2235          | 1372816           |
| 1.3264        | 0.0295 | 30   | 1.1841          | 1647472           |
| 1.2995        | 0.0344 | 35   | 1.1788          | 1919584           |
| 1.1474        | 0.0393 | 40   | 1.1874          | 2196376           |
| 1.0666        | 0.0442 | 45   | 1.1899          | 2465560           |
| 0.985         | 0.0492 | 50   | 1.2221          | 2741672           |
| 0.713         | 0.0541 | 55   | 1.2507          | 3013656           |
| 0.5765        | 0.0590 | 60   | 1.2796          | 3290880           |
| 0.5448        | 0.0639 | 65   | 1.2517          | 3557720           |
| 0.4657        | 0.0688 | 70   | 1.2520          | 3832752           |
| 0.4618        | 0.0737 | 75   | 1.2386          | 4109368           |
| 0.3921        | 0.0786 | 80   | 1.2370          | 4375248           |
| 0.3891        | 0.0836 | 85   | 1.2284          | 4647088           |
| 0.4561        | 0.0885 | 90   | 1.2214          | 4917208           |
| 0.3254        | 0.0934 | 95   | 1.2225          | 5185928           |
| 0.2939        | 0.0983 | 100  | 1.2260          | 5452632           |
| 0.3003        | 0.1032 | 105  | 1.2119          | 5723992           |
| 0.2921        | 0.1081 | 110  | 1.2143          | 5998096           |
| 0.3005        | 0.1130 | 115  | 1.2009          | 6272024           |
| 0.2106        | 0.1180 | 120  | 1.2049          | 6544088           |
| 0.3227        | 0.1229 | 125  | 1.2043          | 6815696           |
| 0.359         | 0.1278 | 130  | 1.2067          | 7091168           |
| 0.2451        | 0.1327 | 135  | 1.2018          | 7363752           |
| 0.2543        | 0.1376 | 140  | 1.2051          | 7634680           |
| 0.2264        | 0.1425 | 145  | 1.1911          | 7902192           |
| 0.2881        | 0.1475 | 150  | 1.1969          | 8174216           |
| 0.2406        | 0.1524 | 155  | 1.1873          | 8446056           |
| 0.2712        | 0.1573 | 160  | 1.1878          | 8711432           |
| 0.2502        | 0.1622 | 165  | 1.1933          | 8986104           |
| 0.2625        | 0.1671 | 170  | 1.1817          | 9260624           |
| 0.2239        | 0.1720 | 175  | 1.1872          | 9537496           |
| 0.2087        | 0.1769 | 180  | 1.1822          | 9811632           |
| 0.2819        | 0.1819 | 185  | 1.1781          | 10083560          |
| 0.1772        | 0.1868 | 190  | 1.1825          | 10356680          |
| 0.2153        | 0.1917 | 195  | 1.1797          | 10627000          |
| 0.2606        | 0.1966 | 200  | 1.1768          | 10901816          |
| 0.183         | 0.2015 | 205  | 1.1799          | 11168408          |
| 0.1972        | 0.2064 | 210  | 1.1756          | 11441368          |
| 0.2959        | 0.2114 | 215  | 1.1733          | 11712792          |
| 0.2225        | 0.2163 | 220  | 1.1740          | 11983016          |
| 0.3001        | 0.2212 | 225  | 1.1673          | 12252912          |
| 0.2043        | 0.2261 | 230  | 1.1743          | 12527672          |
| 0.2225        | 0.2310 | 235  | 1.1721          | 12804760          |
| 0.2131        | 0.2359 | 240  | 1.1681          | 13073272          |
| 0.2541        | 0.2408 | 245  | 1.1697          | 13343952          |
| 0.2392        | 0.2458 | 250  | 1.1652          | 13615616          |
| 0.2222        | 0.2507 | 255  | 1.1673          | 13878200          |
| 0.2152        | 0.2556 | 260  | 1.1603          | 14145720          |
| 0.1775        | 0.2605 | 265  | 1.1601          | 14421576          |
| 0.184         | 0.2654 | 270  | 1.1659          | 14691256          |
| 0.1615        | 0.2703 | 275  | 1.1560          | 14966608          |
| 0.2042        | 0.2753 | 280  | 1.1613          | 15238320          |
| 0.2344        | 0.2802 | 285  | 1.1605          | 15514760          |
| 0.1502        | 0.2851 | 290  | 1.1520          | 15782440          |
| 0.1738        | 0.2900 | 295  | 1.1576          | 16044664          |
| 0.2125        | 0.2949 | 300  | 1.1566          | 16313976          |
| 0.2228        | 0.2998 | 305  | 1.1505          | 16586576          |
| 0.1751        | 0.3047 | 310  | 1.1548          | 16857848          |
| 0.2008        | 0.3097 | 315  | 1.1546          | 17126992          |
| 0.1452        | 0.3146 | 320  | 1.1527          | 17400296          |
| 0.2659        | 0.3195 | 325  | 1.1553          | 17668032          |
| 0.2173        | 0.3244 | 330  | 1.1508          | 17935336          |
| 0.215         | 0.3293 | 335  | 1.1485          | 18205424          |
| 0.2193        | 0.3342 | 340  | 1.1501          | 18481400          |
| 0.1883        | 0.3391 | 345  | 1.1484          | 18752120          |
| 0.1204        | 0.3441 | 350  | 1.1455          | 19022232          |
| 0.2041        | 0.3490 | 355  | 1.1473          | 19291984          |
| 0.1734        | 0.3539 | 360  | 1.1446          | 19560032          |
| 0.191         | 0.3588 | 365  | 1.1459          | 19841512          |
| 0.2036        | 0.3637 | 370  | 1.1427          | 20110248          |
| 0.227         | 0.3686 | 375  | 1.1416          | 20383840          |
| 0.2724        | 0.3736 | 380  | 1.1432          | 20651880          |
| 0.277         | 0.3785 | 385  | 1.1394          | 20925536          |
| 0.185         | 0.3834 | 390  | 1.1404          | 21190872          |
| 0.1613        | 0.3883 | 395  | 1.1423          | 21462104          |
| 0.2139        | 0.3932 | 400  | 1.1366          | 21735760          |
| 0.238         | 0.3981 | 405  | 1.1401          | 22007944          |
| 0.1772        | 0.4030 | 410  | 1.1446          | 22274704          |
| 0.2354        | 0.4080 | 415  | 1.1385          | 22551304          |
| 0.2089        | 0.4129 | 420  | 1.1372          | 22819992          |
| 0.1772        | 0.4178 | 425  | 1.1395          | 23085864          |
| 0.2116        | 0.4227 | 430  | 1.1360          | 23355776          |
| 0.1528        | 0.4276 | 435  | 1.1362          | 23630936          |
| 0.1801        | 0.4325 | 440  | 1.1363          | 23902680          |
| 0.152         | 0.4375 | 445  | 1.1318          | 24168248          |
| 0.237         | 0.4424 | 450  | 1.1363          | 24435488          |
| 0.1998        | 0.4473 | 455  | 1.1348          | 24710872          |
| 0.2259        | 0.4522 | 460  | 1.1325          | 24983416          |
| 0.2071        | 0.4571 | 465  | 1.1319          | 25250048          |
| 0.16          | 0.4620 | 470  | 1.1330          | 25521736          |
| 0.1693        | 0.4669 | 475  | 1.1312          | 25795336          |
| 0.2649        | 0.4719 | 480  | 1.1308          | 26066920          |
| 0.1038        | 0.4768 | 485  | 1.1307          | 26331024          |
| 0.1938        | 0.4817 | 490  | 1.1287          | 26598616          |
| 0.1767        | 0.4866 | 495  | 1.1319          | 26869544          |
| 0.3223        | 0.4915 | 500  | 1.1328          | 27140784          |
| 0.1802        | 0.4964 | 505  | 1.1282          | 27411872          |
| 0.1962        | 0.5014 | 510  | 1.1316          | 27675280          |
| 0.1977        | 0.5063 | 515  | 1.1293          | 27943040          |
| 0.1458        | 0.5112 | 520  | 1.1286          | 28217320          |
| 0.2375        | 0.5161 | 525  | 1.1290          | 28493040          |
| 0.2269        | 0.5210 | 530  | 1.1275          | 28762672          |
| 0.1589        | 0.5259 | 535  | 1.1280          | 29029744          |
| 0.2142        | 0.5308 | 540  | 1.1297          | 29299000          |
| 0.2219        | 0.5358 | 545  | 1.1282          | 29570248          |
| 0.1128        | 0.5407 | 550  | 1.1286          | 29847000          |
| 0.1866        | 0.5456 | 555  | 1.1272          | 30115376          |
| 0.1865        | 0.5505 | 560  | 1.1279          | 30389984          |
| 0.2061        | 0.5554 | 565  | 1.1234          | 30655792          |
| 0.1548        | 0.5603 | 570  | 1.1237          | 30933664          |
| 0.2025        | 0.5652 | 575  | 1.1249          | 31201768          |
| 0.2701        | 0.5702 | 580  | 1.1261          | 31476376          |
| 0.2446        | 0.5751 | 585  | 1.1236          | 31743576          |
| 0.1323        | 0.5800 | 590  | 1.1243          | 32012336          |
| 0.2005        | 0.5849 | 595  | 1.1241          | 32285872          |
| 0.1525        | 0.5898 | 600  | 1.1249          | 32558824          |
| 0.1703        | 0.5947 | 605  | 1.1236          | 32825608          |
| 0.1633        | 0.5997 | 610  | 1.1211          | 33097056          |
| 0.1968        | 0.6046 | 615  | 1.1234          | 33371136          |
| 0.2604        | 0.6095 | 620  | 1.1223          | 33637528          |
| 0.2091        | 0.6144 | 625  | 1.1225          | 33906600          |
| 0.1176        | 0.6193 | 630  | 1.1248          | 34176584          |
| 0.1487        | 0.6242 | 635  | 1.1229          | 34448496          |
| 0.199         | 0.6291 | 640  | 1.1209          | 34722752          |
| 0.1523        | 0.6341 | 645  | 1.1212          | 34990088          |
| 0.1457        | 0.6390 | 650  | 1.1237          | 35259080          |
| 0.2531        | 0.6439 | 655  | 1.1227          | 35525968          |
| 0.1487        | 0.6488 | 660  | 1.1193          | 35797952          |
| 0.1589        | 0.6537 | 665  | 1.1216          | 36072304          |
| 0.2855        | 0.6586 | 670  | 1.1224          | 36343472          |
| 0.1557        | 0.6636 | 675  | 1.1186          | 36614592          |
| 0.1411        | 0.6685 | 680  | 1.1202          | 36886360          |
| 0.2196        | 0.6734 | 685  | 1.1211          | 37158136          |
| 0.1054        | 0.6783 | 690  | 1.1204          | 37430296          |
| 0.2536        | 0.6832 | 695  | 1.1198          | 37703184          |
| 0.2347        | 0.6881 | 700  | 1.1187          | 37972000          |
| 0.2074        | 0.6930 | 705  | 1.1180          | 38244936          |
| 0.1818        | 0.6980 | 710  | 1.1156          | 38515152          |
| 0.1484        | 0.7029 | 715  | 1.1196          | 38786104          |
| 0.234         | 0.7078 | 720  | 1.1224          | 39053816          |
| 0.1783        | 0.7127 | 725  | 1.1179          | 39323896          |
| 0.159         | 0.7176 | 730  | 1.1158          | 39599848          |
| 0.1323        | 0.7225 | 735  | 1.1204          | 39869656          |
| 0.1816        | 0.7275 | 740  | 1.1216          | 40137064          |
| 0.175         | 0.7324 | 745  | 1.1173          | 40405408          |
| 0.2641        | 0.7373 | 750  | 1.1163          | 40673816          |
| 0.1334        | 0.7422 | 755  | 1.1151          | 40936336          |
| 0.2107        | 0.7471 | 760  | 1.1186          | 41207808          |
| 0.2213        | 0.7520 | 765  | 1.1162          | 41484608          |
| 0.1493        | 0.7569 | 770  | 1.1133          | 41758384          |
| 0.1367        | 0.7619 | 775  | 1.1153          | 42031848          |
| 0.1636        | 0.7668 | 780  | 1.1173          | 42293304          |
| 0.1492        | 0.7717 | 785  | 1.1160          | 42563384          |
| 0.2128        | 0.7766 | 790  | 1.1158          | 42825784          |
| 0.2324        | 0.7815 | 795  | 1.1155          | 43101064          |
| 0.2325        | 0.7864 | 800  | 1.1134          | 43373512          |
| 0.1865        | 0.7913 | 805  | 1.1167          | 43637872          |
| 0.2124        | 0.7963 | 810  | 1.1154          | 43905256          |
| 0.1661        | 0.8012 | 815  | 1.1109          | 44173024          |
| 0.1994        | 0.8061 | 820  | 1.1108          | 44451088          |
| 0.2008        | 0.8110 | 825  | 1.1119          | 44724768          |
| 0.1678        | 0.8159 | 830  | 1.1130          | 44995112          |
| 0.2089        | 0.8208 | 835  | 1.1126          | 45265880          |
| 0.2064        | 0.8258 | 840  | 1.1119          | 45532968          |
| 0.2039        | 0.8307 | 845  | 1.1133          | 45810568          |
| 0.152         | 0.8356 | 850  | 1.1124          | 46083736          |
| 0.1731        | 0.8405 | 855  | 1.1112          | 46356904          |
| 0.2052        | 0.8454 | 860  | 1.1110          | 46627936          |
| 0.2187        | 0.8503 | 865  | 1.1093          | 46903968          |
| 0.2456        | 0.8552 | 870  | 1.1106          | 47175120          |
| 0.1912        | 0.8602 | 875  | 1.1124          | 47446448          |
| 0.1495        | 0.8651 | 880  | 1.1115          | 47725256          |
| 0.2542        | 0.8700 | 885  | 1.1117          | 47996528          |
| 0.202         | 0.8749 | 890  | 1.1092          | 48262184          |
| 0.0888        | 0.8798 | 895  | 1.1104          | 48535040          |
| 0.1544        | 0.8847 | 900  | 1.1143          | 48812872          |
| 0.1341        | 0.8897 | 905  | 1.1120          | 49088752          |
| 0.1137        | 0.8946 | 910  | 1.1115          | 49363536          |
| 0.2127        | 0.8995 | 915  | 1.1076          | 49640256          |
| 0.2183        | 0.9044 | 920  | 1.1078          | 49908416          |
| 0.1487        | 0.9093 | 925  | 1.1101          | 50178216          |
| 0.2102        | 0.9142 | 930  | 1.1093          | 50445536          |
| 0.2309        | 0.9191 | 935  | 1.1090          | 50712632          |
| 0.2157        | 0.9241 | 940  | 1.1096          | 50986048          |
| 0.1194        | 0.9290 | 945  | 1.1090          | 51259968          |
| 0.1138        | 0.9339 | 950  | 1.1091          | 51530784          |
| 0.2443        | 0.9388 | 955  | 1.1094          | 51803680          |
| 0.1772        | 0.9437 | 960  | 1.1085          | 52071288          |
| 0.1181        | 0.9486 | 965  | 1.1093          | 52337984          |
| 0.1651        | 0.9536 | 970  | 1.1100          | 52608272          |
| 0.1881        | 0.9585 | 975  | 1.1097          | 52870304          |
| 0.2214        | 0.9634 | 980  | 1.1055          | 53151200          |
| 0.1554        | 0.9683 | 985  | 1.1063          | 53423592          |
| 0.1906        | 0.9732 | 990  | 1.1078          | 53699536          |
| 0.1411        | 0.9781 | 995  | 1.1064          | 53969424          |
| 0.1967        | 0.9830 | 1000 | 1.1058          | 54241280          |
| 0.1977        | 0.9880 | 1005 | 1.1067          | 54503696          |
| 0.1763        | 0.9929 | 1010 | 1.1053          | 54769592          |
| 0.1614        | 0.9978 | 1015 | 1.1067          | 55039120          |

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
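
The hyperparameters in this card are related: the total train batch size follows from the per-device batch size and gradient accumulation, and the warmup length follows from the warmup ratio. A minimal sketch of that arithmetic, assuming single-device training and an approximate step count read off the table (`total_steps` is an assumption, not a value from the training script):

```python
# Hyperparameters as listed in this card.
train_batch_size = 8              # per-device train batch size
gradient_accumulation_steps = 16
lr_scheduler_warmup_ratio = 0.05

# Effective (total) train batch size: per-device batch size times
# accumulation steps. If training were distributed, multiply further
# by the number of devices.
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # 128, matching total_train_batch_size above

# With constant_with_warmup, the learning rate ramps linearly over the
# warmup steps and then stays constant at 8e-06.
total_steps = 1018                # assumption: the table logs through step 1015
warmup_steps = int(lr_scheduler_warmup_ratio * total_steps)
print(warmup_steps)  # 50
```

Under these assumptions, warmup would cover roughly the first 50 of ~1018 optimizer steps in the single training epoch.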