# collapse_gemma-2-2b_hs2_accumulate_iter14_sftsd2

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.1072
- Num Input Tokens Seen: 72483520
## Model description
More information needed
## Intended uses & limitations
More information needed
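
The card does not document a specific use case. As a minimal sketch, the checkpoint should load like any other Gemma-2 causal language model through `transformers`; the repository id is this model's, while the dtype, device placement, and generation settings below are illustrative assumptions, not documented settings.

```python
# Minimal inference sketch (assumes standard causal-LM usage; sampling settings are illustrative).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter14_sftsd2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 is a common choice for Gemma-2 checkpoints
    device_map="auto",
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```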
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
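
These values map directly onto a standard `transformers.TrainingArguments` configuration. The sketch below is an assumption about how they might have been set (argument names follow Transformers 4.44; the output directory is a placeholder, not the authors' actual script):

```python
# Sketch of a TrainingArguments setup matching the listed hyperparameters (not the original training script).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter14_sftsd2",  # placeholder
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=16,  # 8 x 16 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```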
### Training results
Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
---|---|---|---|---|
No log | 0 | 0 | 1.3909 | 0 |
1.6216 | 0.0037 | 5 | 1.3897 | 273928 |
1.5677 | 0.0075 | 10 | 1.3772 | 540496 |
1.5223 | 0.0112 | 15 | 1.3459 | 806024 |
1.4364 | 0.0149 | 20 | 1.2958 | 1080064 |
1.3754 | 0.0186 | 25 | 1.2548 | 1342696 |
1.341 | 0.0224 | 30 | 1.2272 | 1609776 |
1.2543 | 0.0261 | 35 | 1.1938 | 1883584 |
1.1445 | 0.0298 | 40 | 1.1967 | 2159576 |
1.0439 | 0.0336 | 45 | 1.2223 | 2427016 |
0.9516 | 0.0373 | 50 | 1.2217 | 2699496 |
0.8087 | 0.0410 | 55 | 1.2222 | 2961896 |
0.6456 | 0.0447 | 60 | 1.2632 | 3230456 |
0.5882 | 0.0485 | 65 | 1.2657 | 3495256 |
0.54 | 0.0522 | 70 | 1.2746 | 3766984 |
0.4778 | 0.0559 | 75 | 1.2396 | 4040984 |
0.4138 | 0.0597 | 80 | 1.2546 | 4317400 |
0.3985 | 0.0634 | 85 | 1.2427 | 4587088 |
0.4226 | 0.0671 | 90 | 1.2237 | 4851192 |
0.335 | 0.0708 | 95 | 1.2147 | 5128392 |
0.3381 | 0.0746 | 100 | 1.2222 | 5402272 |
0.2826 | 0.0783 | 105 | 1.2087 | 5672736 |
0.3858 | 0.0820 | 110 | 1.2034 | 5945640 |
0.2916 | 0.0858 | 115 | 1.2149 | 6213712 |
0.1812 | 0.0895 | 120 | 1.2091 | 6480944 |
0.2106 | 0.0932 | 125 | 1.2118 | 6749984 |
0.2204 | 0.0969 | 130 | 1.2141 | 7022400 |
0.26 | 0.1007 | 135 | 1.2027 | 7290400 |
0.2057 | 0.1044 | 140 | 1.1904 | 7559840 |
0.1634 | 0.1081 | 145 | 1.2047 | 7828648 |
0.2798 | 0.1119 | 150 | 1.1945 | 8103064 |
0.218 | 0.1156 | 155 | 1.1967 | 8374240 |
0.2143 | 0.1193 | 160 | 1.1997 | 8649728 |
0.282 | 0.1230 | 165 | 1.1903 | 8915992 |
0.2373 | 0.1268 | 170 | 1.1923 | 9179120 |
0.186 | 0.1305 | 175 | 1.1857 | 9444544 |
0.2151 | 0.1342 | 180 | 1.1888 | 9719840 |
0.1982 | 0.1380 | 185 | 1.1860 | 9987088 |
0.2093 | 0.1417 | 190 | 1.1933 | 10260608 |
0.1927 | 0.1454 | 195 | 1.1829 | 10531552 |
0.2653 | 0.1491 | 200 | 1.1856 | 10803216 |
0.1893 | 0.1529 | 205 | 1.1855 | 11077920 |
0.2772 | 0.1566 | 210 | 1.1858 | 11343864 |
0.2151 | 0.1603 | 215 | 1.1842 | 11616992 |
0.2485 | 0.1641 | 220 | 1.1834 | 11892040 |
0.2226 | 0.1678 | 225 | 1.1787 | 12162184 |
0.1264 | 0.1715 | 230 | 1.1788 | 12429176 |
0.1665 | 0.1753 | 235 | 1.1733 | 12696000 |
0.1108 | 0.1790 | 240 | 1.1739 | 12965168 |
0.185 | 0.1827 | 245 | 1.1671 | 13239112 |
0.2626 | 0.1864 | 250 | 1.1734 | 13514032 |
0.1595 | 0.1902 | 255 | 1.1717 | 13785752 |
0.2451 | 0.1939 | 260 | 1.1687 | 14059520 |
0.2444 | 0.1976 | 265 | 1.1697 | 14324224 |
0.2495 | 0.2014 | 270 | 1.1663 | 14590360 |
0.2167 | 0.2051 | 275 | 1.1653 | 14853096 |
0.1973 | 0.2088 | 280 | 1.1688 | 15122224 |
0.1801 | 0.2125 | 285 | 1.1663 | 15394480 |
0.1666 | 0.2163 | 290 | 1.1666 | 15661024 |
0.1642 | 0.2200 | 295 | 1.1688 | 15928512 |
0.2069 | 0.2237 | 300 | 1.1648 | 16201536 |
0.1672 | 0.2275 | 305 | 1.1624 | 16470776 |
0.1446 | 0.2312 | 310 | 1.1688 | 16744768 |
0.1332 | 0.2349 | 315 | 1.1606 | 17008240 |
0.1447 | 0.2386 | 320 | 1.1595 | 17273856 |
0.1407 | 0.2424 | 325 | 1.1664 | 17549784 |
0.2198 | 0.2461 | 330 | 1.1601 | 17822968 |
0.1968 | 0.2498 | 335 | 1.1568 | 18095368 |
0.1826 | 0.2536 | 340 | 1.1608 | 18371224 |
0.1624 | 0.2573 | 345 | 1.1594 | 18648808 |
0.1164 | 0.2610 | 350 | 1.1552 | 18912232 |
0.1232 | 0.2647 | 355 | 1.1584 | 19180288 |
0.2007 | 0.2685 | 360 | 1.1596 | 19453232 |
0.163 | 0.2722 | 365 | 1.1516 | 19727312 |
0.1141 | 0.2759 | 370 | 1.1587 | 20011560 |
0.1235 | 0.2797 | 375 | 1.1526 | 20283680 |
0.1914 | 0.2834 | 380 | 1.1499 | 20550296 |
0.1682 | 0.2871 | 385 | 1.1512 | 20821616 |
0.1194 | 0.2908 | 390 | 1.1508 | 21095920 |
0.1079 | 0.2946 | 395 | 1.1529 | 21372392 |
0.166 | 0.2983 | 400 | 1.1514 | 21644592 |
0.1262 | 0.3020 | 405 | 1.1497 | 21914152 |
0.1624 | 0.3058 | 410 | 1.1526 | 22186664 |
0.1772 | 0.3095 | 415 | 1.1478 | 22459632 |
0.2304 | 0.3132 | 420 | 1.1476 | 22730888 |
0.0887 | 0.3169 | 425 | 1.1456 | 22992464 |
0.1033 | 0.3207 | 430 | 1.1478 | 23263416 |
0.1526 | 0.3244 | 435 | 1.1456 | 23530720 |
0.1425 | 0.3281 | 440 | 1.1433 | 23792528 |
0.1928 | 0.3319 | 445 | 1.1422 | 24066464 |
0.1651 | 0.3356 | 450 | 1.1433 | 24328952 |
0.1117 | 0.3393 | 455 | 1.1480 | 24589040 |
0.1578 | 0.3430 | 460 | 1.1464 | 24861792 |
0.1554 | 0.3468 | 465 | 1.1408 | 25140824 |
0.1505 | 0.3505 | 470 | 1.1425 | 25408400 |
0.1613 | 0.3542 | 475 | 1.1416 | 25681448 |
0.1858 | 0.3580 | 480 | 1.1394 | 25948192 |
0.1362 | 0.3617 | 485 | 1.1410 | 26216376 |
0.2001 | 0.3654 | 490 | 1.1416 | 26485904 |
0.153 | 0.3691 | 495 | 1.1407 | 26753784 |
0.2446 | 0.3729 | 500 | 1.1432 | 27019552 |
0.1468 | 0.3766 | 505 | 1.1389 | 27293064 |
0.1343 | 0.3803 | 510 | 1.1388 | 27559304 |
0.1486 | 0.3841 | 515 | 1.1379 | 27832208 |
0.1227 | 0.3878 | 520 | 1.1369 | 28099304 |
0.185 | 0.3915 | 525 | 1.1392 | 28366024 |
0.1528 | 0.3952 | 530 | 1.1389 | 28634400 |
0.1835 | 0.3990 | 535 | 1.1360 | 28906280 |
0.1858 | 0.4027 | 540 | 1.1376 | 29174248 |
0.1313 | 0.4064 | 545 | 1.1363 | 29446120 |
0.1405 | 0.4102 | 550 | 1.1334 | 29716480 |
0.1816 | 0.4139 | 555 | 1.1334 | 29985760 |
0.2154 | 0.4176 | 560 | 1.1322 | 30252704 |
0.1683 | 0.4213 | 565 | 1.1311 | 30523920 |
0.1828 | 0.4251 | 570 | 1.1330 | 30795368 |
0.1506 | 0.4288 | 575 | 1.1302 | 31062848 |
0.1773 | 0.4325 | 580 | 1.1313 | 31336984 |
0.1544 | 0.4363 | 585 | 1.1319 | 31611648 |
0.1387 | 0.4400 | 590 | 1.1301 | 31880344 |
0.1977 | 0.4437 | 595 | 1.1292 | 32151312 |
0.1209 | 0.4474 | 600 | 1.1328 | 32418584 |
0.1392 | 0.4512 | 605 | 1.1307 | 32693992 |
0.1996 | 0.4549 | 610 | 1.1291 | 32968608 |
0.2297 | 0.4586 | 615 | 1.1300 | 33237576 |
0.1792 | 0.4624 | 620 | 1.1284 | 33504216 |
0.1289 | 0.4661 | 625 | 1.1281 | 33778712 |
0.2102 | 0.4698 | 630 | 1.1286 | 34048008 |
0.1098 | 0.4735 | 635 | 1.1288 | 34318832 |
0.1766 | 0.4773 | 640 | 1.1280 | 34588104 |
0.1247 | 0.4810 | 645 | 1.1277 | 34863712 |
0.1875 | 0.4847 | 650 | 1.1256 | 35137368 |
0.1388 | 0.4885 | 655 | 1.1274 | 35401856 |
0.1543 | 0.4922 | 660 | 1.1260 | 35669288 |
0.1338 | 0.4959 | 665 | 1.1250 | 35938200 |
0.1478 | 0.4997 | 670 | 1.1261 | 36214008 |
0.078 | 0.5034 | 675 | 1.1283 | 36484712 |
0.1088 | 0.5071 | 680 | 1.1274 | 36756992 |
0.1612 | 0.5108 | 685 | 1.1240 | 37024952 |
0.141 | 0.5146 | 690 | 1.1247 | 37288544 |
0.1367 | 0.5183 | 695 | 1.1265 | 37560624 |
0.158 | 0.5220 | 700 | 1.1268 | 37829640 |
0.1697 | 0.5258 | 705 | 1.1262 | 38100088 |
0.1348 | 0.5295 | 710 | 1.1253 | 38367944 |
0.1406 | 0.5332 | 715 | 1.1238 | 38640424 |
0.1578 | 0.5369 | 720 | 1.1266 | 38907976 |
0.1835 | 0.5407 | 725 | 1.1277 | 39174392 |
0.2109 | 0.5444 | 730 | 1.1236 | 39448744 |
0.1624 | 0.5481 | 735 | 1.1219 | 39721744 |
0.1249 | 0.5519 | 740 | 1.1256 | 39995928 |
0.1682 | 0.5556 | 745 | 1.1246 | 40273496 |
0.1751 | 0.5593 | 750 | 1.1226 | 40547520 |
0.1961 | 0.5630 | 755 | 1.1253 | 40821000 |
0.1429 | 0.5668 | 760 | 1.1276 | 41093648 |
0.1388 | 0.5705 | 765 | 1.1218 | 41362480 |
0.1274 | 0.5742 | 770 | 1.1220 | 41633192 |
0.1763 | 0.5780 | 775 | 1.1238 | 41902024 |
0.1543 | 0.5817 | 780 | 1.1229 | 42171960 |
0.1535 | 0.5854 | 785 | 1.1226 | 42445296 |
0.1456 | 0.5891 | 790 | 1.1235 | 42710176 |
0.0793 | 0.5929 | 795 | 1.1218 | 42982376 |
0.2123 | 0.5966 | 800 | 1.1231 | 43246288 |
0.1695 | 0.6003 | 805 | 1.1223 | 43518184 |
0.1431 | 0.6041 | 810 | 1.1233 | 43787688 |
0.1313 | 0.6078 | 815 | 1.1231 | 44058296 |
0.1916 | 0.6115 | 820 | 1.1199 | 44323824 |
0.1367 | 0.6152 | 825 | 1.1183 | 44600736 |
0.1064 | 0.6190 | 830 | 1.1228 | 44871640 |
0.0885 | 0.6227 | 835 | 1.1214 | 45136648 |
0.1405 | 0.6264 | 840 | 1.1183 | 45412936 |
0.1229 | 0.6302 | 845 | 1.1195 | 45676832 |
0.1544 | 0.6339 | 850 | 1.1204 | 45954952 |
0.1298 | 0.6376 | 855 | 1.1215 | 46232064 |
0.207 | 0.6413 | 860 | 1.1232 | 46500536 |
0.1036 | 0.6451 | 865 | 1.1216 | 46768904 |
0.1644 | 0.6488 | 870 | 1.1206 | 47038312 |
0.1903 | 0.6525 | 875 | 1.1191 | 47305784 |
0.1797 | 0.6563 | 880 | 1.1197 | 47567088 |
0.1451 | 0.6600 | 885 | 1.1186 | 47835208 |
0.1295 | 0.6637 | 890 | 1.1170 | 48110200 |
0.0897 | 0.6674 | 895 | 1.1182 | 48387944 |
0.1365 | 0.6712 | 900 | 1.1182 | 48654496 |
0.1166 | 0.6749 | 905 | 1.1168 | 48925776 |
0.1172 | 0.6786 | 910 | 1.1220 | 49198040 |
0.1452 | 0.6824 | 915 | 1.1210 | 49470608 |
0.1495 | 0.6861 | 920 | 1.1190 | 49741056 |
0.113 | 0.6898 | 925 | 1.1189 | 50014608 |
0.1343 | 0.6935 | 930 | 1.1204 | 50288136 |
0.1857 | 0.6973 | 935 | 1.1175 | 50558880 |
0.1177 | 0.7010 | 940 | 1.1170 | 50828624 |
0.169 | 0.7047 | 945 | 1.1168 | 51102088 |
0.2074 | 0.7085 | 950 | 1.1151 | 51369824 |
0.1161 | 0.7122 | 955 | 1.1168 | 51641024 |
0.1411 | 0.7159 | 960 | 1.1170 | 51909240 |
0.1514 | 0.7196 | 965 | 1.1158 | 52177496 |
0.1911 | 0.7234 | 970 | 1.1176 | 52450824 |
0.163 | 0.7271 | 975 | 1.1162 | 52729824 |
0.0962 | 0.7308 | 980 | 1.1152 | 52995240 |
0.1413 | 0.7346 | 985 | 1.1180 | 53263416 |
0.2341 | 0.7383 | 990 | 1.1176 | 53527672 |
0.109 | 0.7420 | 995 | 1.1147 | 53793040 |
0.1362 | 0.7457 | 1000 | 1.1141 | 54066168 |
0.1523 | 0.7495 | 1005 | 1.1145 | 54337184 |
0.1541 | 0.7532 | 1010 | 1.1154 | 54613256 |
0.1942 | 0.7569 | 1015 | 1.1168 | 54884736 |
0.1567 | 0.7607 | 1020 | 1.1169 | 55156512 |
0.1341 | 0.7644 | 1025 | 1.1186 | 55429832 |
0.0783 | 0.7681 | 1030 | 1.1167 | 55703192 |
0.1526 | 0.7718 | 1035 | 1.1157 | 55978080 |
0.201 | 0.7756 | 1040 | 1.1135 | 56246208 |
0.1721 | 0.7793 | 1045 | 1.1119 | 56518304 |
0.1958 | 0.7830 | 1050 | 1.1158 | 56786584 |
0.1789 | 0.7868 | 1055 | 1.1182 | 57058752 |
0.1706 | 0.7905 | 1060 | 1.1138 | 57340616 |
0.1119 | 0.7942 | 1065 | 1.1121 | 57618032 |
0.1033 | 0.7979 | 1070 | 1.1150 | 57890944 |
0.0648 | 0.8017 | 1075 | 1.1166 | 58157784 |
0.1655 | 0.8054 | 1080 | 1.1131 | 58428976 |
0.1665 | 0.8091 | 1085 | 1.1122 | 58700640 |
0.245 | 0.8129 | 1090 | 1.1139 | 58964952 |
0.0995 | 0.8166 | 1095 | 1.1137 | 59233864 |
0.1095 | 0.8203 | 1100 | 1.1134 | 59502808 |
0.1329 | 0.8241 | 1105 | 1.1133 | 59775576 |
0.2066 | 0.8278 | 1110 | 1.1127 | 60051688 |
0.0901 | 0.8315 | 1115 | 1.1136 | 60315488 |
0.1157 | 0.8352 | 1120 | 1.1137 | 60590808 |
0.178 | 0.8390 | 1125 | 1.1135 | 60866712 |
0.1368 | 0.8427 | 1130 | 1.1139 | 61137936 |
0.1683 | 0.8464 | 1135 | 1.1148 | 61405640 |
0.193 | 0.8502 | 1140 | 1.1094 | 61679192 |
0.0919 | 0.8539 | 1145 | 1.1099 | 61950280 |
0.1054 | 0.8576 | 1150 | 1.1116 | 62221520 |
0.1405 | 0.8613 | 1155 | 1.1089 | 62492616 |
0.2065 | 0.8651 | 1160 | 1.1088 | 62768672 |
0.0888 | 0.8688 | 1165 | 1.1109 | 63034280 |
0.107 | 0.8725 | 1170 | 1.1133 | 63305184 |
0.1131 | 0.8763 | 1175 | 1.1138 | 63570736 |
0.154 | 0.8800 | 1180 | 1.1125 | 63839160 |
0.2166 | 0.8837 | 1185 | 1.1128 | 64107448 |
0.17 | 0.8874 | 1190 | 1.1112 | 64384992 |
0.097 | 0.8912 | 1195 | 1.1101 | 64654992 |
0.1523 | 0.8949 | 1200 | 1.1113 | 64923728 |
0.1752 | 0.8986 | 1205 | 1.1110 | 65194568 |
0.1477 | 0.9024 | 1210 | 1.1107 | 65468632 |
0.124 | 0.9061 | 1215 | 1.1104 | 65732912 |
0.1321 | 0.9098 | 1220 | 1.1107 | 65998584 |
0.1027 | 0.9135 | 1225 | 1.1109 | 66275336 |
0.1562 | 0.9173 | 1230 | 1.1131 | 66549384 |
0.1955 | 0.9210 | 1235 | 1.1105 | 66817160 |
0.1341 | 0.9247 | 1240 | 1.1086 | 67076696 |
0.1253 | 0.9285 | 1245 | 1.1099 | 67347784 |
0.2128 | 0.9322 | 1250 | 1.1119 | 67614184 |
0.1334 | 0.9359 | 1255 | 1.1100 | 67883936 |
0.1227 | 0.9396 | 1260 | 1.1085 | 68159848 |
0.1073 | 0.9434 | 1265 | 1.1110 | 68434512 |
0.126 | 0.9471 | 1270 | 1.1105 | 68701016 |
0.1085 | 0.9508 | 1275 | 1.1112 | 68971216 |
0.1942 | 0.9546 | 1280 | 1.1079 | 69243880 |
0.1107 | 0.9583 | 1285 | 1.1082 | 69513872 |
0.1296 | 0.9620 | 1290 | 1.1091 | 69781928 |
0.1981 | 0.9657 | 1295 | 1.1087 | 70046808 |
0.2142 | 0.9695 | 1300 | 1.1073 | 70319584 |
0.145 | 0.9732 | 1305 | 1.1094 | 70592160 |
0.2102 | 0.9769 | 1310 | 1.1095 | 70861072 |
0.1017 | 0.9807 | 1315 | 1.1088 | 71127088 |
0.1419 | 0.9844 | 1320 | 1.1090 | 71394680 |
0.1959 | 0.9881 | 1325 | 1.1061 | 71667208 |
0.205 | 0.9918 | 1330 | 1.1043 | 71935800 |
0.1699 | 0.9956 | 1335 | 1.1050 | 72203200 |
0.1449 | 0.9993 | 1340 | 1.1072 | 72483520 |
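
The trajectory above (a rapid drop from 1.39 to about 1.19 in the first 35 steps, a bump to roughly 1.26 around steps 60-75, then a slow decline toward 1.11) can be replotted from the Trainer's `trainer_state.json`. The checkpoint path and the token-count key below are assumptions about how this particular run logged its state:

```python
# Sketch: plot validation loss vs. input tokens from a Trainer checkpoint's trainer_state.json.
# The checkpoint path is a placeholder; adjust it to wherever the run saved its state.
import json
import matplotlib.pyplot as plt

with open("checkpoint-1340/trainer_state.json") as f:
    state = json.load(f)

evals = [e for e in state["log_history"] if "eval_loss" in e]
tokens = [e.get("num_input_tokens_seen", e["step"]) for e in evals]  # fall back to step if tokens absent
losses = [e["eval_loss"] for e in evals]

plt.plot(tokens, losses)
plt.xlabel("Input tokens seen")
plt.ylabel("Validation loss")
plt.title("collapse_gemma-2-2b_hs2_accumulate_iter14_sftsd2")
plt.show()
```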
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1