---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd1
  results: []
---

# collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd1

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.1169
- Num Input Tokens Seen: 55042096

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.3956 | 0 |
| 1.6601 | 0.0049 | 5 | 1.3933 | 275208 |
| 1.6694 | 0.0099 | 10 | 1.3714 | 543056 |
| 1.536 | 0.0148 | 15 | 1.3191 | 812296 |
| 1.4788 | 0.0197 | 20 | 1.2663 | 1088768 |
| 1.3749 | 0.0246 | 25 | 1.2258 | 1364960 |
| 1.2692 | 0.0296 | 30 | 1.1844 | 1633832 |
| 1.2494 | 0.0345 | 35 | 1.1716 | 1910400 |
| 1.1173 | 0.0394 | 40 | 1.1704 | 2179696 |
| 1.05 | 0.0443 | 45 | 1.1607 | 2453432 |
| 1.0444 | 0.0493 | 50 | 1.1793 | 2732968 |
| 0.9271 | 0.0542 | 55 | 1.2081 | 3003744 |
| 0.791 | 0.0591 | 60 | 1.2350 | 3269488 |
| 0.7387 | 0.0640 | 65 | 1.2237 | 3540984 |
| 0.7249 | 0.0690 | 70 | 1.2430 | 3801792 |
| 0.5625 | 0.0739 | 75 | 1.2336 | 4085336 |
| 0.5989 | 0.0788 | 80 | 1.2534 | 4362880 |
| 0.5361 | 0.0837 | 85 | 1.2519 | 4631784 |
| 0.3883 | 0.0887 | 90 | 1.2304 | 4902152 |
| 0.4381 | 0.0936 | 95 | 1.2336 | 5169632 |
| 0.4399 | 0.0985 | 100 | 1.2397 | 5437272 |
| 0.4003 | 0.1034 | 105 | 1.2370 | 5709848 |
| 0.372 | 0.1084 | 110 | 1.2283 | 5976624 |
| 0.406 | 0.1133 | 115 | 1.2246 | 6248280 |
| 0.323 | 0.1182 | 120 | 1.2187 | 6518400 |
| 0.2448 | 0.1232 | 125 | 1.2250 | 6794920 |
| 0.3488 | 0.1281 | 130 | 1.2172 | 7069680 |
| 0.3836 | 0.1330 | 135 | 1.2070 | 7342192 |
| 0.3236 | 0.1379 | 140 | 1.2137 | 7610576 |
| 0.2973 | 0.1429 | 145 | 1.2076 | 7882736 |
| 0.2627 | 0.1478 | 150 | 1.2141 | 8160296 |
| 0.2389 | 0.1527 | 155 | 1.2010 | 8437848 |
| 0.3666 | 0.1576 | 160 | 1.2094 | 8712728 |
| 0.2794 | 0.1626 | 165 | 1.2009 | 8985576 |
| 0.2945 | 0.1675 | 170 | 1.2067 | 9256904 |
| 0.2087 | 0.1724 | 175 | 1.2007 | 9530952 |
| 0.2006 | 0.1773 | 180 | 1.2026 | 9800136 |
| 0.2707 | 0.1823 | 185 | 1.2024 | 10071408 |
| 0.244 | 0.1872 | 190 | 1.1974 | 10343552 |
| 0.2293 | 0.1921 | 195 | 1.2025 | 10614440 |
| 0.2367 | 0.1970 | 200 | 1.1977 | 10882064 |
| 0.2501 | 0.2020 | 205 | 1.1959 | 11159480 |
| 0.2884 | 0.2069 | 210 | 1.1941 | 11430048 |
| 0.2586 | 0.2118 | 215 | 1.1873 | 11695928 |
| 0.1916 | 0.2167 | 220 | 1.1968 | 11970272 |
| 0.2815 | 0.2217 | 225 | 1.1896 | 12242392 |
| 0.2537 | 0.2266 | 230 | 1.1859 | 12510872 |
| 0.1636 | 0.2315 | 235 | 1.1928 | 12782416 |
| 0.2667 | 0.2365 | 240 | 1.1867 | 13052648 |
| 0.2468 | 0.2414 | 245 | 1.1842 | 13325392 |
| 0.1954 | 0.2463 | 250 | 1.1913 | 13600336 |
| 0.2912 | 0.2512 | 255 | 1.1829 | 13869944 |
| 0.2128 | 0.2562 | 260 | 1.1821 | 14138536 |
| 0.1788 | 0.2611 | 265 | 1.1837 | 14407760 |
| 0.2236 | 0.2660 | 270 | 1.1810 | 14675520 |
| 0.1779 | 0.2709 | 275 | 1.1757 | 14953200 |
| 0.1968 | 0.2759 | 280 | 1.1780 | 15220960 |
| 0.2199 | 0.2808 | 285 | 1.1735 | 15483032 |
| 0.2181 | 0.2857 | 290 | 1.1746 | 15753752 |
| 0.259 | 0.2906 | 295 | 1.1725 | 16025120 |
| 0.1761 | 0.2956 | 300 | 1.1737 | 16300000 |
| 0.2333 | 0.3005 | 305 | 1.1736 | 16568800 |
| 0.2976 | 0.3054 | 310 | 1.1714 | 16843896 |
| 0.2326 | 0.3103 | 315 | 1.1723 | 17111568 |
| 0.1595 | 0.3153 | 320 | 1.1724 | 17386608 |
| 0.1421 | 0.3202 | 325 | 1.1670 | 17649840 |
| 0.2473 | 0.3251 | 330 | 1.1718 | 17918832 |
| 0.2452 | 0.3300 | 335 | 1.1666 | 18196184 |
| 0.1613 | 0.3350 | 340 | 1.1665 | 18471384 |
| 0.194 | 0.3399 | 345 | 1.1655 | 18747288 |
| 0.3169 | 0.3448 | 350 | 1.1648 | 19025008 |
| 0.2124 | 0.3498 | 355 | 1.1624 | 19292568 |
| 0.2656 | 0.3547 | 360 | 1.1614 | 19568560 |
| 0.3003 | 0.3596 | 365 | 1.1610 | 19834968 |
| 0.2401 | 0.3645 | 370 | 1.1580 | 20105912 |
| 0.1571 | 0.3695 | 375 | 1.1553 | 20379656 |
| 0.2318 | 0.3744 | 380 | 1.1608 | 20650848 |
| 0.1987 | 0.3793 | 385 | 1.1550 | 20923264 |
| 0.2576 | 0.3842 | 390 | 1.1584 | 21192336 |
| 0.218 | 0.3892 | 395 | 1.1577 | 21467624 |
| 0.1804 | 0.3941 | 400 | 1.1532 | 21734880 |
| 0.1477 | 0.3990 | 405 | 1.1585 | 22001608 |
| 0.1875 | 0.4039 | 410 | 1.1526 | 22273872 |
| 0.1754 | 0.4089 | 415 | 1.1520 | 22540176 |
| 0.2451 | 0.4138 | 420 | 1.1544 | 22810392 |
| 0.2631 | 0.4187 | 425 | 1.1498 | 23079816 |
| 0.2715 | 0.4236 | 430 | 1.1525 | 23348096 |
| 0.323 | 0.4286 | 435 | 1.1500 | 23622128 |
| 0.2832 | 0.4335 | 440 | 1.1457 | 23890632 |
| 0.1261 | 0.4384 | 445 | 1.1502 | 24159416 |
| 0.2168 | 0.4433 | 450 | 1.1503 | 24419904 |
| 0.2103 | 0.4483 | 455 | 1.1459 | 24687336 |
| 0.2608 | 0.4532 | 460 | 1.1481 | 24955080 |
| 0.22 | 0.4581 | 465 | 1.1443 | 25229640 |
| 0.1916 | 0.4631 | 470 | 1.1460 | 25501944 |
| 0.2282 | 0.4680 | 475 | 1.1426 | 25776592 |
| 0.1444 | 0.4729 | 480 | 1.1434 | 26048704 |
| 0.1415 | 0.4778 | 485 | 1.1462 | 26316704 |
| 0.185 | 0.4828 | 490 | 1.1472 | 26583320 |
| 0.1861 | 0.4877 | 495 | 1.1442 | 26848368 |
| 0.2444 | 0.4926 | 500 | 1.1419 | 27127416 |
| 0.149 | 0.4975 | 505 | 1.1452 | 27396328 |
| 0.1879 | 0.5025 | 510 | 1.1436 | 27669312 |
| 0.1951 | 0.5074 | 515 | 1.1413 | 27941728 |
| 0.1736 | 0.5123 | 520 | 1.1404 | 28213376 |
| 0.2361 | 0.5172 | 525 | 1.1408 | 28479464 |
| 0.144 | 0.5222 | 530 | 1.1401 | 28749592 |
| 0.2333 | 0.5271 | 535 | 1.1374 | 29024360 |
| 0.1981 | 0.5320 | 540 | 1.1400 | 29294184 |
| 0.2333 | 0.5369 | 545 | 1.1390 | 29566272 |
| 0.2308 | 0.5419 | 550 | 1.1370 | 29830248 |
| 0.1955 | 0.5468 | 555 | 1.1402 | 30101136 |
| 0.1906 | 0.5517 | 560 | 1.1387 | 30372424 |
| 0.2144 | 0.5567 | 565 | 1.1347 | 30646952 |
| 0.1965 | 0.5616 | 570 | 1.1368 | 30908728 |
| 0.2239 | 0.5665 | 575 | 1.1374 | 31183896 |
| 0.2104 | 0.5714 | 580 | 1.1331 | 31457680 |
| 0.2487 | 0.5764 | 585 | 1.1344 | 31731136 |
| 0.1382 | 0.5813 | 590 | 1.1355 | 32004256 |
| 0.186 | 0.5862 | 595 | 1.1358 | 32271512 |
| 0.1755 | 0.5911 | 600 | 1.1321 | 32542736 |
| 0.207 | 0.5961 | 605 | 1.1340 | 32812256 |
| 0.2216 | 0.6010 | 610 | 1.1342 | 33085400 |
| 0.2461 | 0.6059 | 615 | 1.1324 | 33351528 |
| 0.1588 | 0.6108 | 620 | 1.1333 | 33621000 |
| 0.2488 | 0.6158 | 625 | 1.1328 | 33894352 |
| 0.181 | 0.6207 | 630 | 1.1314 | 34162640 |
| 0.2122 | 0.6256 | 635 | 1.1305 | 34441064 |
| 0.1398 | 0.6305 | 640 | 1.1329 | 34708416 |
| 0.1988 | 0.6355 | 645 | 1.1295 | 34979800 |
| 0.2596 | 0.6404 | 650 | 1.1311 | 35247784 |
| 0.2201 | 0.6453 | 655 | 1.1333 | 35517048 |
| 0.1438 | 0.6502 | 660 | 1.1319 | 35789536 |
| 0.1782 | 0.6552 | 665 | 1.1336 | 36051200 |
| 0.1692 | 0.6601 | 670 | 1.1314 | 36323000 |
| 0.1822 | 0.6650 | 675 | 1.1290 | 36599392 |
| 0.1981 | 0.6700 | 680 | 1.1326 | 36870968 |
| 0.1644 | 0.6749 | 685 | 1.1307 | 37137392 |
| 0.2556 | 0.6798 | 690 | 1.1259 | 37411192 |
| 0.1742 | 0.6847 | 695 | 1.1295 | 37680888 |
| 0.1956 | 0.6897 | 700 | 1.1290 | 37949912 |
| 0.1299 | 0.6946 | 705 | 1.1281 | 38216184 |
| 0.1665 | 0.6995 | 710 | 1.1307 | 38485544 |
| 0.2755 | 0.7044 | 715 | 1.1260 | 38759264 |
| 0.1837 | 0.7094 | 720 | 1.1259 | 39020752 |
| 0.1687 | 0.7143 | 725 | 1.1282 | 39293040 |
| 0.1264 | 0.7192 | 730 | 1.1267 | 39568136 |
| 0.2541 | 0.7241 | 735 | 1.1279 | 39839448 |
| 0.1304 | 0.7291 | 740 | 1.1284 | 40116608 |
| 0.2105 | 0.7340 | 745 | 1.1281 | 40383120 |
| 0.1929 | 0.7389 | 750 | 1.1247 | 40651072 |
| 0.2045 | 0.7438 | 755 | 1.1267 | 40929488 |
| 0.2181 | 0.7488 | 760 | 1.1267 | 41199448 |
| 0.2374 | 0.7537 | 765 | 1.1251 | 41478632 |
| 0.1643 | 0.7586 | 770 | 1.1266 | 41749592 |
| 0.1818 | 0.7635 | 775 | 1.1250 | 42021576 |
| 0.1775 | 0.7685 | 780 | 1.1246 | 42289112 |
| 0.1259 | 0.7734 | 785 | 1.1264 | 42557584 |
| 0.1973 | 0.7783 | 790 | 1.1243 | 42822968 |
| 0.1677 | 0.7833 | 795 | 1.1259 | 43095848 |
| 0.2458 | 0.7882 | 800 | 1.1257 | 43366576 |
| 0.1226 | 0.7931 | 805 | 1.1220 | 43635976 |
| 0.2169 | 0.7980 | 810 | 1.1268 | 43906296 |
| 0.1237 | 0.8030 | 815 | 1.1263 | 44180384 |
| 0.2049 | 0.8079 | 820 | 1.1226 | 44444712 |
| 0.1323 | 0.8128 | 825 | 1.1236 | 44719944 |
| 0.1943 | 0.8177 | 830 | 1.1254 | 44993064 |
| 0.1782 | 0.8227 | 835 | 1.1249 | 45266512 |
| 0.2226 | 0.8276 | 840 | 1.1236 | 45533648 |
| 0.124 | 0.8325 | 845 | 1.1225 | 45804216 |
| 0.1541 | 0.8374 | 850 | 1.1214 | 46079880 |
| 0.1737 | 0.8424 | 855 | 1.1219 | 46348160 |
| 0.1943 | 0.8473 | 860 | 1.1231 | 46622696 |
| 0.1656 | 0.8522 | 865 | 1.1215 | 46897472 |
| 0.2735 | 0.8571 | 870 | 1.1232 | 47169856 |
| 0.2191 | 0.8621 | 875 | 1.1207 | 47441544 |
| 0.1572 | 0.8670 | 880 | 1.1191 | 47711248 |
| 0.2098 | 0.8719 | 885 | 1.1229 | 47992104 |
| 0.1243 | 0.8768 | 890 | 1.1214 | 48260960 |
| 0.1993 | 0.8818 | 895 | 1.1194 | 48531184 |
| 0.1662 | 0.8867 | 900 | 1.1204 | 48801416 |
| 0.1656 | 0.8916 | 905 | 1.1216 | 49071632 |
| 0.1585 | 0.8966 | 910 | 1.1188 | 49346128 |
| 0.1253 | 0.9015 | 915 | 1.1213 | 49620760 |
| 0.1226 | 0.9064 | 920 | 1.1216 | 49898432 |
| 0.2 | 0.9113 | 925 | 1.1183 | 50174576 |
| 0.0812 | 0.9163 | 930 | 1.1189 | 50444160 |
| 0.1893 | 0.9212 | 935 | 1.1239 | 50714744 |
| 0.2024 | 0.9261 | 940 | 1.1217 | 50982000 |
| 0.1282 | 0.9310 | 945 | 1.1195 | 51253960 |
| 0.1622 | 0.9360 | 950 | 1.1198 | 51528736 |
| 0.1918 | 0.9409 | 955 | 1.1181 | 51801648 |
| 0.1359 | 0.9458 | 960 | 1.1174 | 52079152 |
| 0.152 | 0.9507 | 965 | 1.1186 | 52346792 |
| 0.2182 | 0.9557 | 970 | 1.1161 | 52614496 |
| 0.2059 | 0.9606 | 975 | 1.1155 | 52876808 |
| 0.1561 | 0.9655 | 980 | 1.1174 | 53155432 |
| 0.1907 | 0.9704 | 985 | 1.1158 | 53420992 |
| 0.1577 | 0.9754 | 990 | 1.1163 | 53690640 |
| 0.1971 | 0.9803 | 995 | 1.1185 | 53961192 |
| 0.231 | 0.9852 | 1000 | 1.1161 | 54235384 |
| 0.1759 | 0.9901 | 1005 | 1.1135 | 54502912 |
| 0.181 | 0.9951 | 1010 | 1.1162 | 54775312 |
| 0.1815 | 1.0 | 1015 | 1.1169 | 55042096 |

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
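The total train batch size listed in the hyperparameters follows from the per-device batch size and gradient accumulation. A quick sanity check (assuming a single training device, which this card does not state):

```python
# Values copied from the hyperparameter list above.
per_device_train_batch_size = 8
gradient_accumulation_steps = 16
num_devices = 1  # assumption: device count is not reported on this card

effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)
print(effective_batch_size)  # 128, matching total_train_batch_size above

# With lr_scheduler_warmup_ratio 0.05 and the 1015 optimizer steps shown in
# the results table, the constant_with_warmup schedule would warm up over
# roughly 0.05 * 1015 ≈ 51 steps (the exact value used is not reported).
approx_warmup_steps = round(0.05 * 1015)
print(approx_warmup_steps)
```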
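For readers who want to reproduce a comparable run, the hyperparameter list above maps onto `transformers.TrainingArguments` roughly as sketched below. This is an illustrative reconstruction, not the training script actually used; `output_dir` is a placeholder, and the TRL SFT dataset/trainer wiring is omitted:

```python
from transformers import TrainingArguments

# Sketch only: restates the hyperparameter list above as TrainingArguments.
args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd1",  # placeholder
    learning_rate=8e-06,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=1,
    gradient_accumulation_steps=16,   # 8 * 16 = 128 effective batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,                   # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-08,
)
```

These arguments would then be passed to a TRL `SFTTrainer` along with the base model and dataset, neither of which is documented here.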