collapse_gemma-2-2b_hs2_accumulate_iter15_sftsd1
This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.0984
- Num Input Tokens Seen: 76782856
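The loss above can be read as a perplexity. The sketch below assumes the reported value is a mean per-token cross-entropy in nats; the card does not state this explicitly, so treat the conversion as illustrative.

```python
import math

# Final evaluation loss reported in this card.
# Assumption: mean per-token cross-entropy, measured in nats.
eval_loss = 1.0984

# Perplexity is the exponential of the mean cross-entropy.
perplexity = math.exp(eval_loss)

print(f"perplexity ~ {perplexity:.3f}")
```

Under that assumption the perplexity is close to 3, since ln(3) is about 1.0986.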
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
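The batch-size and warmup values above are related by simple arithmetic. The following is a rough sketch, not the original training script: the device count is inferred rather than stated, the total step count is extrapolated from the last logged row of the results table, and the rounding-up of warmup steps is an assumption about how the scheduler was configured.

```python
import math

# Values copied from the hyperparameter list.
train_batch_size = 8
gradient_accumulation_steps = 16
total_train_batch_size = 128
warmup_ratio = 0.05

# Assumption: the effective batch is per-device batch * accumulation * devices,
# so the device count can be inferred from the three listed numbers.
num_devices = total_train_batch_size // (
    train_batch_size * gradient_accumulation_steps
)

# Assumption: total optimizer steps extrapolated from the last logged row
# (step 1435 at epoch 0.9971); the card does not list this number.
estimated_total_steps = round(1435 / 0.9971)

# Assumption: warmup steps are the ratio times total steps, rounded up.
warmup_steps = math.ceil(estimated_total_steps * warmup_ratio)

print(num_devices, estimated_total_steps, warmup_steps)
```

With these assumptions the run corresponds to a single device and a warmup of a few dozen optimizer steps, after which the learning rate stays constant at 8e-06.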
Training results
Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
---|---|---|---|---|
No log | 0 | 0 | 1.3909 | 0 |
1.6453 | 0.0035 | 5 | 1.3906 | 259240 |
1.5833 | 0.0069 | 10 | 1.3797 | 529800 |
1.6978 | 0.0104 | 15 | 1.3496 | 797808 |
1.5875 | 0.0139 | 20 | 1.3053 | 1060152 |
1.3919 | 0.0174 | 25 | 1.2601 | 1327568 |
1.3483 | 0.0208 | 30 | 1.2284 | 1596672 |
1.2488 | 0.0243 | 35 | 1.1949 | 1865008 |
1.1389 | 0.0278 | 40 | 1.1896 | 2131824 |
1.0689 | 0.0313 | 45 | 1.2223 | 2398440 |
0.9179 | 0.0347 | 50 | 1.2383 | 2669800 |
0.7536 | 0.0382 | 55 | 1.2621 | 2942936 |
0.6764 | 0.0417 | 60 | 1.3037 | 3212096 |
0.5676 | 0.0452 | 65 | 1.3123 | 3484464 |
0.5606 | 0.0486 | 70 | 1.2797 | 3745816 |
0.3639 | 0.0521 | 75 | 1.2963 | 4017496 |
0.3165 | 0.0556 | 80 | 1.2406 | 4292208 |
0.2976 | 0.0591 | 85 | 1.2438 | 4561360 |
0.2959 | 0.0625 | 90 | 1.2285 | 4822824 |
0.2696 | 0.0660 | 95 | 1.2187 | 5093680 |
0.2144 | 0.0695 | 100 | 1.2105 | 5358792 |
0.2618 | 0.0730 | 105 | 1.2045 | 5620144 |
0.385 | 0.0764 | 110 | 1.2117 | 5894904 |
0.1695 | 0.0799 | 115 | 1.2058 | 6159632 |
0.1663 | 0.0834 | 120 | 1.1961 | 6428520 |
0.1969 | 0.0869 | 125 | 1.1889 | 6695048 |
0.1977 | 0.0903 | 130 | 1.1924 | 6962208 |
0.2226 | 0.0938 | 135 | 1.1895 | 7232176 |
0.2204 | 0.0973 | 140 | 1.1913 | 7501408 |
0.2473 | 0.1008 | 145 | 1.1866 | 7763208 |
0.1407 | 0.1042 | 150 | 1.1842 | 8033336 |
0.2291 | 0.1077 | 155 | 1.1815 | 8300520 |
0.2194 | 0.1112 | 160 | 1.1806 | 8568256 |
0.1911 | 0.1146 | 165 | 1.1818 | 8836048 |
0.1388 | 0.1181 | 170 | 1.1802 | 9113320 |
0.1651 | 0.1216 | 175 | 1.1741 | 9377776 |
0.1803 | 0.1251 | 180 | 1.1778 | 9647312 |
0.2031 | 0.1285 | 185 | 1.1709 | 9908016 |
0.2055 | 0.1320 | 190 | 1.1682 | 10176520 |
0.1837 | 0.1355 | 195 | 1.1685 | 10439024 |
0.0697 | 0.1390 | 200 | 1.1730 | 10706280 |
0.1426 | 0.1424 | 205 | 1.1754 | 10977536 |
0.1801 | 0.1459 | 210 | 1.1686 | 11249016 |
0.2272 | 0.1494 | 215 | 1.1702 | 11517184 |
0.2112 | 0.1529 | 220 | 1.1613 | 11787544 |
0.0994 | 0.1563 | 225 | 1.1652 | 12056344 |
0.2244 | 0.1598 | 230 | 1.1607 | 12320072 |
0.1794 | 0.1633 | 235 | 1.1616 | 12587592 |
0.1554 | 0.1668 | 240 | 1.1662 | 12849800 |
0.1929 | 0.1702 | 245 | 1.1588 | 13119568 |
0.1799 | 0.1737 | 250 | 1.1582 | 13384248 |
0.1797 | 0.1772 | 255 | 1.1619 | 13651872 |
0.1775 | 0.1807 | 260 | 1.1551 | 13916536 |
0.2294 | 0.1841 | 265 | 1.1609 | 14186752 |
0.145 | 0.1876 | 270 | 1.1570 | 14455592 |
0.175 | 0.1911 | 275 | 1.1573 | 14728760 |
0.1398 | 0.1946 | 280 | 1.1557 | 14989992 |
0.1515 | 0.1980 | 285 | 1.1542 | 15252080 |
0.194 | 0.2015 | 290 | 1.1544 | 15524072 |
0.2019 | 0.2050 | 295 | 1.1529 | 15789008 |
0.1781 | 0.2085 | 300 | 1.1523 | 16058360 |
0.1474 | 0.2119 | 305 | 1.1588 | 16331456 |
0.1502 | 0.2154 | 310 | 1.1499 | 16600584 |
0.1529 | 0.2189 | 315 | 1.1487 | 16877288 |
0.2112 | 0.2223 | 320 | 1.1512 | 17141648 |
0.1404 | 0.2258 | 325 | 1.1511 | 17411552 |
0.1322 | 0.2293 | 330 | 1.1480 | 17676088 |
0.1329 | 0.2328 | 335 | 1.1450 | 17944616 |
0.132 | 0.2362 | 340 | 1.1507 | 18214264 |
0.1692 | 0.2397 | 345 | 1.1470 | 18482320 |
0.1344 | 0.2432 | 350 | 1.1468 | 18752728 |
0.1738 | 0.2467 | 355 | 1.1474 | 19013392 |
0.1838 | 0.2501 | 360 | 1.1447 | 19276112 |
0.1396 | 0.2536 | 365 | 1.1491 | 19541200 |
0.1596 | 0.2571 | 370 | 1.1440 | 19813544 |
0.1257 | 0.2606 | 375 | 1.1430 | 20070744 |
0.1421 | 0.2640 | 380 | 1.1422 | 20339256 |
0.1725 | 0.2675 | 385 | 1.1456 | 20605512 |
0.1461 | 0.2710 | 390 | 1.1406 | 20863560 |
0.1689 | 0.2745 | 395 | 1.1434 | 21124400 |
0.1478 | 0.2779 | 400 | 1.1411 | 21384032 |
0.2111 | 0.2814 | 405 | 1.1449 | 21643360 |
0.1655 | 0.2849 | 410 | 1.1392 | 21910168 |
0.1004 | 0.2884 | 415 | 1.1371 | 22172528 |
0.2003 | 0.2918 | 420 | 1.1415 | 22434704 |
0.1566 | 0.2953 | 425 | 1.1368 | 22709336 |
0.1888 | 0.2988 | 430 | 1.1364 | 22977120 |
0.1835 | 0.3023 | 435 | 1.1374 | 23242112 |
0.1069 | 0.3057 | 440 | 1.1388 | 23498664 |
0.1634 | 0.3092 | 445 | 1.1361 | 23769760 |
0.0829 | 0.3127 | 450 | 1.1359 | 24035360 |
0.0695 | 0.3162 | 455 | 1.1374 | 24305936 |
0.1527 | 0.3196 | 460 | 1.1377 | 24572752 |
0.0992 | 0.3231 | 465 | 1.1337 | 24842184 |
0.1155 | 0.3266 | 470 | 1.1336 | 25097912 |
0.1488 | 0.3300 | 475 | 1.1344 | 25369768 |
0.1626 | 0.3335 | 480 | 1.1320 | 25639864 |
0.1399 | 0.3370 | 485 | 1.1335 | 25904256 |
0.2143 | 0.3405 | 490 | 1.1346 | 26169136 |
0.2551 | 0.3439 | 495 | 1.1348 | 26436600 |
0.1078 | 0.3474 | 500 | 1.1312 | 26696080 |
0.1527 | 0.3509 | 505 | 1.1333 | 26961208 |
0.1384 | 0.3544 | 510 | 1.1309 | 27219168 |
0.1126 | 0.3578 | 515 | 1.1301 | 27480744 |
0.1966 | 0.3613 | 520 | 1.1290 | 27745176 |
0.1169 | 0.3648 | 525 | 1.1281 | 28009360 |
0.1249 | 0.3683 | 530 | 1.1275 | 28280216 |
0.1873 | 0.3717 | 535 | 1.1303 | 28547240 |
0.1156 | 0.3752 | 540 | 1.1302 | 28815640 |
0.1723 | 0.3787 | 545 | 1.1275 | 29085352 |
0.1123 | 0.3822 | 550 | 1.1279 | 29351288 |
0.1303 | 0.3856 | 555 | 1.1308 | 29618680 |
0.1672 | 0.3891 | 560 | 1.1296 | 29885856 |
0.159 | 0.3926 | 565 | 1.1259 | 30151680 |
0.1476 | 0.3961 | 570 | 1.1267 | 30411912 |
0.1154 | 0.3995 | 575 | 1.1286 | 30674296 |
0.0964 | 0.4030 | 580 | 1.1273 | 30940496 |
0.1529 | 0.4065 | 585 | 1.1243 | 31206600 |
0.0861 | 0.4100 | 590 | 1.1237 | 31475320 |
0.1007 | 0.4134 | 595 | 1.1272 | 31748912 |
0.1689 | 0.4169 | 600 | 1.1244 | 32010968 |
0.1012 | 0.4204 | 605 | 1.1212 | 32285720 |
0.1835 | 0.4239 | 610 | 1.1248 | 32551880 |
0.172 | 0.4273 | 615 | 1.1248 | 32811184 |
0.113 | 0.4308 | 620 | 1.1232 | 33077144 |
0.1075 | 0.4343 | 625 | 1.1216 | 33347384 |
0.2072 | 0.4377 | 630 | 1.1224 | 33617960 |
0.1484 | 0.4412 | 635 | 1.1210 | 33878560 |
0.1226 | 0.4447 | 640 | 1.1214 | 34139352 |
0.1424 | 0.4482 | 645 | 1.1211 | 34405416 |
0.1684 | 0.4516 | 650 | 1.1234 | 34675056 |
0.1741 | 0.4551 | 655 | 1.1223 | 34952496 |
0.1605 | 0.4586 | 660 | 1.1187 | 35212824 |
0.1227 | 0.4621 | 665 | 1.1207 | 35475200 |
0.1021 | 0.4655 | 670 | 1.1237 | 35736896 |
0.1438 | 0.4690 | 675 | 1.1198 | 36015768 |
0.1151 | 0.4725 | 680 | 1.1201 | 36284160 |
0.1057 | 0.4760 | 685 | 1.1215 | 36552368 |
0.145 | 0.4794 | 690 | 1.1193 | 36815240 |
0.1695 | 0.4829 | 695 | 1.1181 | 37075072 |
0.1017 | 0.4864 | 700 | 1.1193 | 37340672 |
0.117 | 0.4899 | 705 | 1.1208 | 37611736 |
0.1332 | 0.4933 | 710 | 1.1193 | 37875664 |
0.1019 | 0.4968 | 715 | 1.1176 | 38142808 |
0.1125 | 0.5003 | 720 | 1.1209 | 38412576 |
0.1838 | 0.5038 | 725 | 1.1199 | 38686288 |
0.1261 | 0.5072 | 730 | 1.1183 | 38951240 |
0.1631 | 0.5107 | 735 | 1.1163 | 39223568 |
0.1148 | 0.5142 | 740 | 1.1167 | 39500312 |
0.1167 | 0.5177 | 745 | 1.1173 | 39763128 |
0.1754 | 0.5211 | 750 | 1.1177 | 40026280 |
0.0911 | 0.5246 | 755 | 1.1157 | 40299288 |
0.1737 | 0.5281 | 760 | 1.1152 | 40560008 |
0.1902 | 0.5315 | 765 | 1.1168 | 40826504 |
0.0982 | 0.5350 | 770 | 1.1155 | 41095488 |
0.1197 | 0.5385 | 775 | 1.1170 | 41364376 |
0.1502 | 0.5420 | 780 | 1.1170 | 41637520 |
0.14 | 0.5454 | 785 | 1.1154 | 41900864 |
0.169 | 0.5489 | 790 | 1.1140 | 42170832 |
0.1243 | 0.5524 | 795 | 1.1151 | 42436808 |
0.1312 | 0.5559 | 800 | 1.1168 | 42708488 |
0.0869 | 0.5593 | 805 | 1.1147 | 42977088 |
0.1679 | 0.5628 | 810 | 1.1142 | 43242944 |
0.1334 | 0.5663 | 815 | 1.1151 | 43505280 |
0.1058 | 0.5698 | 820 | 1.1139 | 43771480 |
0.0766 | 0.5732 | 825 | 1.1133 | 44036256 |
0.1414 | 0.5767 | 830 | 1.1147 | 44302448 |
0.1097 | 0.5802 | 835 | 1.1141 | 44565816 |
0.15 | 0.5837 | 840 | 1.1153 | 44830288 |
0.1069 | 0.5871 | 845 | 1.1133 | 45094712 |
0.1397 | 0.5906 | 850 | 1.1148 | 45364736 |
0.1168 | 0.5941 | 855 | 1.1148 | 45629184 |
0.1643 | 0.5976 | 860 | 1.1118 | 45894200 |
0.1411 | 0.6010 | 865 | 1.1126 | 46163032 |
0.1426 | 0.6045 | 870 | 1.1132 | 46420792 |
0.1124 | 0.6080 | 875 | 1.1125 | 46689776 |
0.1665 | 0.6115 | 880 | 1.1134 | 46962560 |
0.1294 | 0.6149 | 885 | 1.1120 | 47224120 |
0.0987 | 0.6184 | 890 | 1.1098 | 47488104 |
0.1803 | 0.6219 | 895 | 1.1123 | 47761664 |
0.1065 | 0.6254 | 900 | 1.1133 | 48025424 |
0.1266 | 0.6288 | 905 | 1.1112 | 48295024 |
0.1368 | 0.6323 | 910 | 1.1097 | 48562064 |
0.1187 | 0.6358 | 915 | 1.1107 | 48834312 |
0.1107 | 0.6392 | 920 | 1.1098 | 49104304 |
0.0922 | 0.6427 | 925 | 1.1109 | 49364480 |
0.1475 | 0.6462 | 930 | 1.1108 | 49633208 |
0.1359 | 0.6497 | 935 | 1.1107 | 49900224 |
0.1054 | 0.6531 | 940 | 1.1103 | 50162168 |
0.1633 | 0.6566 | 945 | 1.1093 | 50424792 |
0.1458 | 0.6601 | 950 | 1.1089 | 50690664 |
0.1463 | 0.6636 | 955 | 1.1095 | 50960320 |
0.1265 | 0.6670 | 960 | 1.1102 | 51224304 |
0.1101 | 0.6705 | 965 | 1.1097 | 51493008 |
0.1343 | 0.6740 | 970 | 1.1084 | 51765072 |
0.15 | 0.6775 | 975 | 1.1090 | 52028408 |
0.087 | 0.6809 | 980 | 1.1098 | 52299504 |
0.0849 | 0.6844 | 985 | 1.1112 | 52561968 |
0.182 | 0.6879 | 990 | 1.1090 | 52842040 |
0.1282 | 0.6914 | 995 | 1.1067 | 53115192 |
0.2152 | 0.6948 | 1000 | 1.1083 | 53380920 |
0.1491 | 0.6983 | 1005 | 1.1085 | 53652824 |
0.1541 | 0.7018 | 1010 | 1.1057 | 53918616 |
0.1696 | 0.7053 | 1015 | 1.1087 | 54185648 |
0.1031 | 0.7087 | 1020 | 1.1117 | 54451312 |
0.1274 | 0.7122 | 1025 | 1.1091 | 54717184 |
0.1489 | 0.7157 | 1030 | 1.1071 | 54982624 |
0.1583 | 0.7192 | 1035 | 1.1072 | 55243168 |
0.1219 | 0.7226 | 1040 | 1.1080 | 55510800 |
0.1996 | 0.7261 | 1045 | 1.1059 | 55775848 |
0.143 | 0.7296 | 1050 | 1.1049 | 56045256 |
0.1711 | 0.7331 | 1055 | 1.1059 | 56304240 |
0.1012 | 0.7365 | 1060 | 1.1081 | 56573776 |
0.2096 | 0.7400 | 1065 | 1.1080 | 56836232 |
0.1923 | 0.7435 | 1070 | 1.1069 | 57101816 |
0.1474 | 0.7469 | 1075 | 1.1060 | 57371896 |
0.1008 | 0.7504 | 1080 | 1.1055 | 57630288 |
0.1554 | 0.7539 | 1085 | 1.1066 | 57893096 |
0.1386 | 0.7574 | 1090 | 1.1042 | 58163448 |
0.172 | 0.7608 | 1095 | 1.1038 | 58433704 |
0.1123 | 0.7643 | 1100 | 1.1043 | 58694928 |
0.1065 | 0.7678 | 1105 | 1.1057 | 58960912 |
0.1274 | 0.7713 | 1110 | 1.1052 | 59231656 |
0.1784 | 0.7747 | 1115 | 1.1041 | 59498952 |
0.1118 | 0.7782 | 1120 | 1.1050 | 59766192 |
0.0999 | 0.7817 | 1125 | 1.1070 | 60034984 |
0.1701 | 0.7852 | 1130 | 1.1068 | 60298864 |
0.1304 | 0.7886 | 1135 | 1.1046 | 60554536 |
0.1136 | 0.7921 | 1140 | 1.1044 | 60825952 |
0.1477 | 0.7956 | 1145 | 1.1059 | 61093632 |
0.1574 | 0.7991 | 1150 | 1.1040 | 61354432 |
0.1067 | 0.8025 | 1155 | 1.1033 | 61629080 |
0.1084 | 0.8060 | 1160 | 1.1041 | 61888184 |
0.1059 | 0.8095 | 1165 | 1.1044 | 62157080 |
0.0936 | 0.8130 | 1170 | 1.1043 | 62428792 |
0.1162 | 0.8164 | 1175 | 1.1048 | 62696264 |
0.1279 | 0.8199 | 1180 | 1.1050 | 62965832 |
0.1186 | 0.8234 | 1185 | 1.1049 | 63235528 |
0.1847 | 0.8269 | 1190 | 1.1028 | 63503776 |
0.1189 | 0.8303 | 1195 | 1.1030 | 63777592 |
0.1636 | 0.8338 | 1200 | 1.1044 | 64042352 |
0.1422 | 0.8373 | 1205 | 1.1037 | 64306016 |
0.1387 | 0.8408 | 1210 | 1.1016 | 64566472 |
0.1087 | 0.8442 | 1215 | 1.1013 | 64833320 |
0.1555 | 0.8477 | 1220 | 1.1044 | 65098616 |
0.132 | 0.8512 | 1225 | 1.1043 | 65367832 |
0.1313 | 0.8546 | 1230 | 1.1020 | 65636640 |
0.1466 | 0.8581 | 1235 | 1.1021 | 65900088 |
0.1255 | 0.8616 | 1240 | 1.1014 | 66162344 |
0.0911 | 0.8651 | 1245 | 1.1022 | 66429904 |
0.2198 | 0.8685 | 1250 | 1.1034 | 66695232 |
0.1598 | 0.8720 | 1255 | 1.1013 | 66968480 |
0.1646 | 0.8755 | 1260 | 1.1021 | 67233232 |
0.1366 | 0.8790 | 1265 | 1.1022 | 67504992 |
0.1414 | 0.8824 | 1270 | 1.1005 | 67770472 |
0.083 | 0.8859 | 1275 | 1.1003 | 68040968 |
0.1556 | 0.8894 | 1280 | 1.1020 | 68317000 |
0.108 | 0.8929 | 1285 | 1.1015 | 68579544 |
0.1578 | 0.8963 | 1290 | 1.1030 | 68847648 |
0.1366 | 0.8998 | 1295 | 1.1031 | 69114512 |
0.128 | 0.9033 | 1300 | 1.1004 | 69381320 |
0.0771 | 0.9068 | 1305 | 1.0987 | 69643160 |
0.112 | 0.9102 | 1310 | 1.1014 | 69904128 |
0.133 | 0.9137 | 1315 | 1.1014 | 70170896 |
0.1518 | 0.9172 | 1320 | 1.1006 | 70440888 |
0.1505 | 0.9207 | 1325 | 1.1015 | 70703464 |
0.136 | 0.9241 | 1330 | 1.1011 | 70968968 |
0.1939 | 0.9276 | 1335 | 1.0989 | 71236808 |
0.1456 | 0.9311 | 1340 | 1.0991 | 71504536 |
0.177 | 0.9346 | 1345 | 1.0995 | 71769160 |
0.0874 | 0.9380 | 1350 | 1.0995 | 72033360 |
0.1561 | 0.9415 | 1355 | 1.0997 | 72301896 |
0.1228 | 0.9450 | 1360 | 1.0997 | 72561256 |
0.2018 | 0.9485 | 1365 | 1.0993 | 72841832 |
0.1272 | 0.9519 | 1370 | 1.1000 | 73103080 |
0.1272 | 0.9554 | 1375 | 1.1011 | 73365168 |
0.1899 | 0.9589 | 1380 | 1.0984 | 73632632 |
0.1296 | 0.9623 | 1385 | 1.0982 | 73904920 |
0.1082 | 0.9658 | 1390 | 1.0994 | 74170664 |
0.1441 | 0.9693 | 1395 | 1.0994 | 74437648 |
0.1612 | 0.9728 | 1400 | 1.0992 | 74706752 |
0.1589 | 0.9762 | 1405 | 1.0998 | 74968824 |
0.1802 | 0.9797 | 1410 | 1.0997 | 75238720 |
0.1315 | 0.9832 | 1415 | 1.0990 | 75502440 |
0.121 | 0.9867 | 1420 | 1.0996 | 75769552 |
0.0828 | 0.9901 | 1425 | 1.1007 | 76032424 |
0.1775 | 0.9936 | 1430 | 1.1000 | 76301624 |
0.1629 | 0.9971 | 1435 | 1.0987 | 76569472 |
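To work with the table programmatically, the rows can be parsed as pipe-separated text. The sketch below is illustrative only: it embeds five rows copied from the table above, and the helper name `parse_row` is hypothetical. Pasting in the full table would work the same way.

```python
# A small excerpt of rows copied verbatim from the table above.
EXCERPT = """\
No log | 0 | 0 | 1.3909 | 0 |
1.1389 | 0.0278 | 40 | 1.1896 | 2131824 |
0.5676 | 0.0452 | 65 | 1.3123 | 3484464 |
0.1296 | 0.9623 | 1385 | 1.0982 | 73904920 |
0.1629 | 0.9971 | 1435 | 1.0987 | 76569472 |
"""


def parse_row(line):
    """Split one pipe-separated row into a dict of typed fields."""
    cells = [cell.strip() for cell in line.split("|")]
    train_loss, epoch, step, val_loss, tokens = cells[:5]
    return {
        # "No log" marks the evaluation run before any training step.
        "train_loss": None if train_loss == "No log" else float(train_loss),
        "epoch": float(epoch),
        "step": int(step),
        "val_loss": float(val_loss),
        "tokens": int(tokens),
    }


rows = [parse_row(line) for line in EXCERPT.splitlines() if line.strip()]

# Checkpoint with the lowest validation loss among the parsed rows.
best = min(rows, key=lambda row: row["val_loss"])
print(best["step"], best["val_loss"])

# Gap between validation and training loss wherever both are logged.
for row in rows:
    if row["train_loss"] is not None:
        gap = row["val_loss"] - row["train_loss"]
        print(row["step"], round(gap, 4))
```

The gap column makes the pattern in the table easy to see: training loss falls to roughly 0.1 to 0.2 within the first hundred steps, while validation loss stays near 1.1 for the rest of the epoch.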
Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
Model tree for RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter15_sftsd1
Base model
google/gemma-2-2b