---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_accumulate_iter14_sftsd2
  results: []
---

# collapse_gemma-2-2b_hs2_accumulate_iter14_sftsd2

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.1072
- Num Input Tokens Seen: 72483520

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.3909 | 0 |
| 1.6216 | 0.0037 | 5 | 1.3897 | 273928 |
| 1.5677 | 0.0075 | 10 | 1.3772 | 540496 |
| 1.5223 | 0.0112 | 15 | 1.3459 | 806024 |
| 1.4364 | 0.0149 | 20 | 1.2958 | 1080064 |
| 1.3754 | 0.0186 | 25 | 1.2548 | 1342696 |
| 1.341 | 0.0224 | 30 | 1.2272 | 1609776 |
| 1.2543 | 0.0261 | 35 | 1.1938 | 1883584 |
| 1.1445 | 0.0298 | 40 | 1.1967 | 2159576 |
| 1.0439 | 0.0336 | 45 | 1.2223 | 2427016 |
| 0.9516 | 0.0373 | 50 | 1.2217 | 2699496 |
| 0.8087 | 0.0410 | 55 | 1.2222 | 2961896 |
| 0.6456 | 0.0447 | 60 | 1.2632 | 3230456 |
| 0.5882 | 0.0485 | 65 | 1.2657 | 3495256 |
| 0.54 | 0.0522 | 70 | 1.2746 | 3766984 |
| 0.4778 | 0.0559 | 75 | 1.2396 | 4040984 |
| 0.4138 | 0.0597 | 80 | 1.2546 | 4317400 |
| 0.3985 | 0.0634 | 85 | 1.2427 | 4587088 |
| 0.4226 | 0.0671 | 90 | 1.2237 | 4851192 |
| 0.335 | 0.0708 | 95 | 1.2147 | 5128392 |
| 0.3381 | 0.0746 | 100 | 1.2222 | 5402272 |
| 0.2826 | 0.0783 | 105 | 1.2087 | 5672736 |
| 0.3858 | 0.0820 | 110 | 1.2034 | 5945640 |
| 0.2916 | 0.0858 | 115 | 1.2149 | 6213712 |
| 0.1812 | 0.0895 | 120 | 1.2091 | 6480944 |
| 0.2106 | 0.0932 | 125 | 1.2118 | 6749984 |
| 0.2204 | 0.0969 | 130 | 1.2141 | 7022400 |
| 0.26 | 0.1007 | 135 | 1.2027 | 7290400 |
| 0.2057 | 0.1044 | 140 | 1.1904 | 7559840 |
| 0.1634 | 0.1081 | 145 | 1.2047 | 7828648 |
| 0.2798 | 0.1119 | 150 | 1.1945 | 8103064 |
| 0.218 | 0.1156 | 155 | 1.1967 | 8374240 |
| 0.2143 | 0.1193 | 160 | 1.1997 | 8649728 |
| 0.282 | 0.1230 | 165 | 1.1903 | 8915992 |
| 0.2373 | 0.1268 | 170 | 1.1923 | 9179120 |
| 0.186 | 0.1305 | 175 | 1.1857 | 9444544 |
| 0.2151 | 0.1342 | 180 | 1.1888 | 9719840 |
| 0.1982 | 0.1380 | 185 | 1.1860 | 9987088 |
| 0.2093 | 0.1417 | 190 | 1.1933 | 10260608 |
| 0.1927 | 0.1454 | 195 | 1.1829 | 10531552 |
| 0.2653 | 0.1491 | 200 | 1.1856 | 10803216 |
| 0.1893 | 0.1529 | 205 | 1.1855 | 11077920 |
| 0.2772 | 0.1566 | 210 | 1.1858 | 11343864 |
| 0.2151 | 0.1603 | 215 | 1.1842 | 11616992 |
| 0.2485 | 0.1641 | 220 | 1.1834 | 11892040 |
| 0.2226 | 0.1678 | 225 | 1.1787 | 12162184 |
| 0.1264 | 0.1715 | 230 | 1.1788 | 12429176 |
| 0.1665 | 0.1753 | 235 | 1.1733 | 12696000 |
| 0.1108 | 0.1790 | 240 | 1.1739 | 12965168 |
| 0.185 | 0.1827 | 245 | 1.1671 | 13239112 |
| 0.2626 | 0.1864 | 250 | 1.1734 | 13514032 |
| 0.1595 | 0.1902 | 255 | 1.1717 | 13785752 |
| 0.2451 | 0.1939 | 260 | 1.1687 | 14059520 |
| 0.2444 | 0.1976 | 265 | 1.1697 | 14324224 |
| 0.2495 | 0.2014 | 270 | 1.1663 | 14590360 |
| 0.2167 | 0.2051 | 275 | 1.1653 | 14853096 |
| 0.1973 | 0.2088 | 280 | 1.1688 | 15122224 |
| 0.1801 | 0.2125 | 285 | 1.1663 | 15394480 |
| 0.1666 | 0.2163 | 290 | 1.1666 | 15661024 |
| 0.1642 | 0.2200 | 295 | 1.1688 | 15928512 |
| 0.2069 | 0.2237 | 300 | 1.1648 | 16201536 |
| 0.1672 | 0.2275 | 305 | 1.1624 | 16470776 |
| 0.1446 | 0.2312 | 310 | 1.1688 | 16744768 |
| 0.1332 | 0.2349 | 315 | 1.1606 | 17008240 |
| 0.1447 | 0.2386 | 320 | 1.1595 | 17273856 |
| 0.1407 | 0.2424 | 325 | 1.1664 | 17549784 |
| 0.2198 | 0.2461 | 330 | 1.1601 | 17822968 |
| 0.1968 | 0.2498 | 335 | 1.1568 | 18095368 |
| 0.1826 | 0.2536 | 340 | 1.1608 | 18371224 |
| 0.1624 | 0.2573 | 345 | 1.1594 | 18648808 |
| 0.1164 | 0.2610 | 350 | 1.1552 | 18912232 |
| 0.1232 | 0.2647 | 355 | 1.1584 | 19180288 |
| 0.2007 | 0.2685 | 360 | 1.1596 | 19453232 |
| 0.163 | 0.2722 | 365 | 1.1516 | 19727312 |
| 0.1141 | 0.2759 | 370 | 1.1587 | 20011560 |
| 0.1235 | 0.2797 | 375 | 1.1526 | 20283680 |
| 0.1914 | 0.2834 | 380 | 1.1499 | 20550296 |
| 0.1682 | 0.2871 | 385 | 1.1512 | 20821616 |
| 0.1194 | 0.2908 | 390 | 1.1508 | 21095920 |
| 0.1079 | 0.2946 | 395 | 1.1529 | 21372392 |
| 0.166 | 0.2983 | 400 | 1.1514 | 21644592 |
| 0.1262 | 0.3020 | 405 | 1.1497 | 21914152 |
| 0.1624 | 0.3058 | 410 | 1.1526 | 22186664 |
| 0.1772 | 0.3095 | 415 | 1.1478 | 22459632 |
| 0.2304 | 0.3132 | 420 | 1.1476 | 22730888 |
| 0.0887 | 0.3169 | 425 | 1.1456 | 22992464 |
| 0.1033 | 0.3207 | 430 | 1.1478 | 23263416 |
| 0.1526 | 0.3244 | 435 | 1.1456 | 23530720 |
| 0.1425 | 0.3281 | 440 | 1.1433 | 23792528 |
| 0.1928 | 0.3319 | 445 | 1.1422 | 24066464 |
| 0.1651 | 0.3356 | 450 | 1.1433 | 24328952 |
| 0.1117 | 0.3393 | 455 | 1.1480 | 24589040 |
| 0.1578 | 0.3430 | 460 | 1.1464 | 24861792 |
| 0.1554 | 0.3468 | 465 | 1.1408 | 25140824 |
| 0.1505 | 0.3505 | 470 | 1.1425 | 25408400 |
| 0.1613 | 0.3542 | 475 | 1.1416 | 25681448 |
| 0.1858 | 0.3580 | 480 | 1.1394 | 25948192 |
| 0.1362 | 0.3617 | 485 | 1.1410 | 26216376 |
| 0.2001 | 0.3654 | 490 | 1.1416 | 26485904 |
| 0.153 | 0.3691 | 495 | 1.1407 | 26753784 |
| 0.2446 | 0.3729 | 500 | 1.1432 | 27019552 |
| 0.1468 | 0.3766 | 505 | 1.1389 | 27293064 |
| 0.1343 | 0.3803 | 510 | 1.1388 | 27559304 |
| 0.1486 | 0.3841 | 515 | 1.1379 | 27832208 |
| 0.1227 | 0.3878 | 520 | 1.1369 | 28099304 |
| 0.185 | 0.3915 | 525 | 1.1392 | 28366024 |
| 0.1528 | 0.3952 | 530 | 1.1389 | 28634400 |
| 0.1835 | 0.3990 | 535 | 1.1360 | 28906280 |
| 0.1858 | 0.4027 | 540 | 1.1376 | 29174248 |
| 0.1313 | 0.4064 | 545 | 1.1363 | 29446120 |
| 0.1405 | 0.4102 | 550 | 1.1334 | 29716480 |
| 0.1816 | 0.4139 | 555 | 1.1334 | 29985760 |
| 0.2154 | 0.4176 | 560 | 1.1322 | 30252704 |
| 0.1683 | 0.4213 | 565 | 1.1311 | 30523920 |
| 0.1828 | 0.4251 | 570 | 1.1330 | 30795368 |
| 0.1506 | 0.4288 | 575 | 1.1302 | 31062848 |
| 0.1773 | 0.4325 | 580 | 1.1313 | 31336984 |
| 0.1544 | 0.4363 | 585 | 1.1319 | 31611648 |
| 0.1387 | 0.4400 | 590 | 1.1301 | 31880344 |
| 0.1977 | 0.4437 | 595 | 1.1292 | 32151312 |
| 0.1209 | 0.4474 | 600 | 1.1328 | 32418584 |
| 0.1392 | 0.4512 | 605 | 1.1307 | 32693992 |
| 0.1996 | 0.4549 | 610 | 1.1291 | 32968608 |
| 0.2297 | 0.4586 | 615 | 1.1300 | 33237576 |
| 0.1792 | 0.4624 | 620 | 1.1284 | 33504216 |
| 0.1289 | 0.4661 | 625 | 1.1281 | 33778712 |
| 0.2102 | 0.4698 | 630 | 1.1286 | 34048008 |
| 0.1098 | 0.4735 | 635 | 1.1288 | 34318832 |
| 0.1766 | 0.4773 | 640 | 1.1280 | 34588104 |
| 0.1247 | 0.4810 | 645 | 1.1277 | 34863712 |
| 0.1875 | 0.4847 | 650 | 1.1256 | 35137368 |
| 0.1388 | 0.4885 | 655 | 1.1274 | 35401856 |
| 0.1543 | 0.4922 | 660 | 1.1260 | 35669288 |
| 0.1338 | 0.4959 | 665 | 1.1250 | 35938200 |
| 0.1478 | 0.4997 | 670 | 1.1261 | 36214008 |
| 0.078 | 0.5034 | 675 | 1.1283 | 36484712 |
| 0.1088 | 0.5071 | 680 | 1.1274 | 36756992 |
| 0.1612 | 0.5108 | 685 | 1.1240 | 37024952 |
| 0.141 | 0.5146 | 690 | 1.1247 | 37288544 |
| 0.1367 | 0.5183 | 695 | 1.1265 | 37560624 |
| 0.158 | 0.5220 | 700 | 1.1268 | 37829640 |
| 0.1697 | 0.5258 | 705 | 1.1262 | 38100088 |
| 0.1348 | 0.5295 | 710 | 1.1253 | 38367944 |
| 0.1406 | 0.5332 | 715 | 1.1238 | 38640424 |
| 0.1578 | 0.5369 | 720 | 1.1266 | 38907976 |
| 0.1835 | 0.5407 | 725 | 1.1277 | 39174392 |
| 0.2109 | 0.5444 | 730 | 1.1236 | 39448744 |
| 0.1624 | 0.5481 | 735 | 1.1219 | 39721744 |
| 0.1249 | 0.5519 | 740 | 1.1256 | 39995928 |
| 0.1682 | 0.5556 | 745 | 1.1246 | 40273496 |
| 0.1751 | 0.5593 | 750 | 1.1226 | 40547520 |
| 0.1961 | 0.5630 | 755 | 1.1253 | 40821000 |
| 0.1429 | 0.5668 | 760 | 1.1276 | 41093648 |
| 0.1388 | 0.5705 | 765 | 1.1218 | 41362480 |
| 0.1274 | 0.5742 | 770 | 1.1220 | 41633192 |
| 0.1763 | 0.5780 | 775 | 1.1238 | 41902024 |
| 0.1543 | 0.5817 | 780 | 1.1229 | 42171960 |
| 0.1535 | 0.5854 | 785 | 1.1226 | 42445296 |
| 0.1456 | 0.5891 | 790 | 1.1235 | 42710176 |
| 0.0793 | 0.5929 | 795 | 1.1218 | 42982376 |
| 0.2123 | 0.5966 | 800 | 1.1231 | 43246288 |
| 0.1695 | 0.6003 | 805 | 1.1223 | 43518184 |
| 0.1431 | 0.6041 | 810 | 1.1233 | 43787688 |
| 0.1313 | 0.6078 | 815 | 1.1231 | 44058296 |
| 0.1916 | 0.6115 | 820 | 1.1199 | 44323824 |
| 0.1367 | 0.6152 | 825 | 1.1183 | 44600736 |
| 0.1064 | 0.6190 | 830 | 1.1228 | 44871640 |
| 0.0885 | 0.6227 | 835 | 1.1214 | 45136648 |
| 0.1405 | 0.6264 | 840 | 1.1183 | 45412936 |
| 0.1229 | 0.6302 | 845 | 1.1195 | 45676832 |
| 0.1544 | 0.6339 | 850 | 1.1204 | 45954952 |
| 0.1298 | 0.6376 | 855 | 1.1215 | 46232064 |
| 0.207 | 0.6413 | 860 | 1.1232 | 46500536 |
| 0.1036 | 0.6451 | 865 | 1.1216 | 46768904 |
| 0.1644 | 0.6488 | 870 | 1.1206 | 47038312 |
| 0.1903 | 0.6525 | 875 | 1.1191 | 47305784 |
| 0.1797 | 0.6563 | 880 | 1.1197 | 47567088 |
| 0.1451 | 0.6600 | 885 | 1.1186 | 47835208 |
| 0.1295 | 0.6637 | 890 | 1.1170 | 48110200 |
| 0.0897 | 0.6674 | 895 | 1.1182 | 48387944 |
| 0.1365 | 0.6712 | 900 | 1.1182 | 48654496 |
| 0.1166 | 0.6749 | 905 | 1.1168 | 48925776 |
| 0.1172 | 0.6786 | 910 | 1.1220 | 49198040 |
| 0.1452 | 0.6824 | 915 | 1.1210 | 49470608 |
| 0.1495 | 0.6861 | 920 | 1.1190 | 49741056 |
| 0.113 | 0.6898 | 925 | 1.1189 | 50014608 |
| 0.1343 | 0.6935 | 930 | 1.1204 | 50288136 |
| 0.1857 | 0.6973 | 935 | 1.1175 | 50558880 |
| 0.1177 | 0.7010 | 940 | 1.1170 | 50828624 |
| 0.169 | 0.7047 | 945 | 1.1168 | 51102088 |
| 0.2074 | 0.7085 | 950 | 1.1151 | 51369824 |
| 0.1161 | 0.7122 | 955 | 1.1168 | 51641024 |
| 0.1411 | 0.7159 | 960 | 1.1170 | 51909240 |
| 0.1514 | 0.7196 | 965 | 1.1158 | 52177496 |
| 0.1911 | 0.7234 | 970 | 1.1176 | 52450824 |
| 0.163 | 0.7271 | 975 | 1.1162 | 52729824 |
| 0.0962 | 0.7308 | 980 | 1.1152 | 52995240 |
| 0.1413 | 0.7346 | 985 | 1.1180 | 53263416 |
| 0.2341 | 0.7383 | 990 | 1.1176 | 53527672 |
| 0.109 | 0.7420 | 995 | 1.1147 | 53793040 |
| 0.1362 | 0.7457 | 1000 | 1.1141 | 54066168 |
| 0.1523 | 0.7495 | 1005 | 1.1145 | 54337184 |
| 0.1541 | 0.7532 | 1010 | 1.1154 | 54613256 |
| 0.1942 | 0.7569 | 1015 | 1.1168 | 54884736 |
| 0.1567 | 0.7607 | 1020 | 1.1169 | 55156512 |
| 0.1341 | 0.7644 | 1025 | 1.1186 | 55429832 |
| 0.0783 | 0.7681 | 1030 | 1.1167 | 55703192 |
| 0.1526 | 0.7718 | 1035 | 1.1157 | 55978080 |
| 0.201 | 0.7756 | 1040 | 1.1135 | 56246208 |
| 0.1721 | 0.7793 | 1045 | 1.1119 | 56518304 |
| 0.1958 | 0.7830 | 1050 | 1.1158 | 56786584 |
| 0.1789 | 0.7868 | 1055 | 1.1182 | 57058752 |
| 0.1706 | 0.7905 | 1060 | 1.1138 | 57340616 |
| 0.1119 | 0.7942 | 1065 | 1.1121 | 57618032 |
| 0.1033 | 0.7979 | 1070 | 1.1150 | 57890944 |
| 0.0648 | 0.8017 | 1075 | 1.1166 | 58157784 |
| 0.1655 | 0.8054 | 1080 | 1.1131 | 58428976 |
| 0.1665 | 0.8091 | 1085 | 1.1122 | 58700640 |
| 0.245 | 0.8129 | 1090 | 1.1139 | 58964952 |
| 0.0995 | 0.8166 | 1095 | 1.1137 | 59233864 |
| 0.1095 | 0.8203 | 1100 | 1.1134 | 59502808 |
| 0.1329 | 0.8241 | 1105 | 1.1133 | 59775576 |
| 0.2066 | 0.8278 | 1110 | 1.1127 | 60051688 |
| 0.0901 | 0.8315 | 1115 | 1.1136 | 60315488 |
| 0.1157 | 0.8352 | 1120 | 1.1137 | 60590808 |
| 0.178 | 0.8390 | 1125 | 1.1135 | 60866712 |
| 0.1368 | 0.8427 | 1130 | 1.1139 | 61137936 |
| 0.1683 | 0.8464 | 1135 | 1.1148 | 61405640 |
| 0.193 | 0.8502 | 1140 | 1.1094 | 61679192 |
| 0.0919 | 0.8539 | 1145 | 1.1099 | 61950280 |
| 0.1054 | 0.8576 | 1150 | 1.1116 | 62221520 |
| 0.1405 | 0.8613 | 1155 | 1.1089 | 62492616 |
| 0.2065 | 0.8651 | 1160 | 1.1088 | 62768672 |
| 0.0888 | 0.8688 | 1165 | 1.1109 | 63034280 |
| 0.107 | 0.8725 | 1170 | 1.1133 | 63305184 |
| 0.1131 | 0.8763 | 1175 | 1.1138 | 63570736 |
| 0.154 | 0.8800 | 1180 | 1.1125 | 63839160 |
| 0.2166 | 0.8837 | 1185 | 1.1128 | 64107448 |
| 0.17 | 0.8874 | 1190 | 1.1112 | 64384992 |
| 0.097 | 0.8912 | 1195 | 1.1101 | 64654992 |
| 0.1523 | 0.8949 | 1200 | 1.1113 | 64923728 |
| 0.1752 | 0.8986 | 1205 | 1.1110 | 65194568 |
| 0.1477 | 0.9024 | 1210 | 1.1107 | 65468632 |
| 0.124 | 0.9061 | 1215 | 1.1104 | 65732912 |
| 0.1321 | 0.9098 | 1220 | 1.1107 | 65998584 |
| 0.1027 | 0.9135 | 1225 | 1.1109 | 66275336 |
| 0.1562 | 0.9173 | 1230 | 1.1131 | 66549384 |
| 0.1955 | 0.9210 | 1235 | 1.1105 | 66817160 |
| 0.1341 | 0.9247 | 1240 | 1.1086 | 67076696 |
| 0.1253 | 0.9285 | 1245 | 1.1099 | 67347784 |
| 0.2128 | 0.9322 | 1250 | 1.1119 | 67614184 |
| 0.1334 | 0.9359 | 1255 | 1.1100 | 67883936 |
| 0.1227 | 0.9396 | 1260 | 1.1085 | 68159848 |
| 0.1073 | 0.9434 | 1265 | 1.1110 | 68434512 |
| 0.126 | 0.9471 | 1270 | 1.1105 | 68701016 |
| 0.1085 | 0.9508 | 1275 | 1.1112 | 68971216 |
| 0.1942 | 0.9546 | 1280 | 1.1079 | 69243880 |
| 0.1107 | 0.9583 | 1285 | 1.1082 | 69513872 |
| 0.1296 | 0.9620 | 1290 | 1.1091 | 69781928 |
| 0.1981 | 0.9657 | 1295 | 1.1087 | 70046808 |
| 0.2142 | 0.9695 | 1300 | 1.1073 | 70319584 |
| 0.145 | 0.9732 | 1305 | 1.1094 | 70592160 |
| 0.2102 | 0.9769 | 1310 | 1.1095 | 70861072 |
| 0.1017 | 0.9807 | 1315 | 1.1088 | 71127088 |
| 0.1419 | 0.9844 | 1320 | 1.1090 | 71394680 |
| 0.1959 | 0.9881 | 1325 | 1.1061 | 71667208 |
| 0.205 | 0.9918 | 1330 | 1.1043 | 71935800 |
| 0.1699 | 0.9956 | 1335 | 1.1050 | 72203200 |
| 0.1449 | 0.9993 | 1340 | 1.1072 | 72483520 |

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
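
As a rough sanity check on the numbers reported in this card, the sketch below recomputes the effective batch size from the listed hyperparameters and derives two quantities the card does not state: an approximate evaluation perplexity and the average number of input tokens per optimizer step. It assumes a single training device, assumes the reported loss is a mean per-token cross-entropy in nats, and treats the token counter as possibly including padding; none of these are confirmed by the card, and `num_devices` is a hypothetical variable introduced only for illustration.

```python
import math

# Values copied from this model card.
train_batch_size = 8
gradient_accumulation_steps = 16
final_eval_loss = 1.1072
total_input_tokens = 72_483_520
final_step = 1340

# Assumption (not stated in the card): training used a single device.
num_devices = 1

# Effective batch size per optimizer step; should match total_train_batch_size.
effective_batch_size = train_batch_size * gradient_accumulation_steps * num_devices

# Assumption: the loss is mean per-token cross-entropy in nats,
# so exp(loss) gives an approximate perplexity (roughly 3.03 here).
perplexity = math.exp(final_eval_loss)

# Average input tokens consumed per optimizer step and per sequence.
# These may include padding tokens, depending on how the counter was configured.
tokens_per_step = total_input_tokens / final_step
tokens_per_sequence = tokens_per_step / effective_batch_size

print(f"effective batch size: {effective_batch_size}")
print(f"approx. eval perplexity: {perplexity:.3f}")
print(f"avg tokens per step: {tokens_per_step:.0f}")
print(f"avg tokens per sequence: {tokens_per_sequence:.1f}")
```

The recomputed effective batch size agrees with the `total_train_batch_size: 128` entry above, which is consistent with (but does not prove) the single-device assumption.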