# collapse_gemma-2-2b_hs2_accumulate_iter13_sftsd0
This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.0959
- Num Input Tokens Seen: 68049720
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
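As a sanity check, the effective batch size implied by these settings matches the stated total (per-device batch size × gradient accumulation steps, assuming a single device), and the warmup length follows from `lr_scheduler_warmup_ratio`. A minimal sketch; the ~1243-step epoch length is inferred from the training log below (step 1240 at epoch 0.9976), not stated in the config:

```python
# Sanity-check the hyperparameters reported above.
per_device_batch = 8           # train_batch_size
grad_accum = 16                # gradient_accumulation_steps
effective_batch = per_device_batch * grad_accum
assert effective_batch == 128  # matches total_train_batch_size

# Warmup length: warmup_ratio * total optimizer steps.
# Total steps (~1243) is inferred from the log: step 1240 at epoch 0.9976.
total_steps = round(1240 / 0.9976)
warmup_steps = round(0.05 * total_steps)
print(effective_batch, total_steps, warmup_steps)  # 128 1243 62
```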
### Training results
Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
---|---|---|---|---|
No log | 0 | 0 | 1.3909 | 0 |
1.7246 | 0.0040 | 5 | 1.3893 | 274632 |
1.6077 | 0.0080 | 10 | 1.3737 | 549784 |
1.6709 | 0.0121 | 15 | 1.3418 | 814360 |
1.534 | 0.0161 | 20 | 1.2881 | 1090560 |
1.4434 | 0.0201 | 25 | 1.2479 | 1368768 |
1.335 | 0.0241 | 30 | 1.2198 | 1641632 |
1.2258 | 0.0282 | 35 | 1.1944 | 1918352 |
1.0947 | 0.0322 | 40 | 1.2103 | 2188280 |
0.9936 | 0.0362 | 45 | 1.2261 | 2464288 |
0.8034 | 0.0402 | 50 | 1.2618 | 2739408 |
0.7401 | 0.0442 | 55 | 1.2715 | 3014952 |
0.5482 | 0.0483 | 60 | 1.3062 | 3284832 |
0.6066 | 0.0523 | 65 | 1.3011 | 3550672 |
0.4228 | 0.0563 | 70 | 1.2685 | 3828072 |
0.383 | 0.0603 | 75 | 1.2521 | 4098944 |
0.3031 | 0.0644 | 80 | 1.2432 | 4366976 |
0.2977 | 0.0684 | 85 | 1.2300 | 4645088 |
0.3238 | 0.0724 | 90 | 1.2213 | 4915552 |
0.2242 | 0.0764 | 95 | 1.2228 | 5193504 |
0.2913 | 0.0805 | 100 | 1.2090 | 5460856 |
0.2852 | 0.0845 | 105 | 1.2126 | 5740776 |
0.1977 | 0.0885 | 110 | 1.2046 | 6017688 |
0.1902 | 0.0925 | 115 | 1.2064 | 6288688 |
0.2058 | 0.0965 | 120 | 1.1986 | 6563880 |
0.1855 | 0.1006 | 125 | 1.1871 | 6832008 |
0.263 | 0.1046 | 130 | 1.2067 | 7108368 |
0.2254 | 0.1086 | 135 | 1.1829 | 7383320 |
0.273 | 0.1126 | 140 | 1.1808 | 7656624 |
0.2055 | 0.1167 | 145 | 1.1762 | 7935520 |
0.1285 | 0.1207 | 150 | 1.1747 | 8204880 |
0.2579 | 0.1247 | 155 | 1.1771 | 8478536 |
0.1839 | 0.1287 | 160 | 1.1746 | 8750200 |
0.179 | 0.1327 | 165 | 1.1702 | 9019848 |
0.1875 | 0.1368 | 170 | 1.1689 | 9299096 |
0.2201 | 0.1408 | 175 | 1.1687 | 9564256 |
0.1506 | 0.1448 | 180 | 1.1668 | 9845608 |
0.1286 | 0.1488 | 185 | 1.1686 | 10119032 |
0.2319 | 0.1529 | 190 | 1.1671 | 10398104 |
0.1844 | 0.1569 | 195 | 1.1610 | 10669632 |
0.1351 | 0.1609 | 200 | 1.1689 | 10943176 |
0.218 | 0.1649 | 205 | 1.1609 | 11217176 |
0.1964 | 0.1689 | 210 | 1.1582 | 11495088 |
0.2503 | 0.1730 | 215 | 1.1613 | 11770368 |
0.2185 | 0.1770 | 220 | 1.1618 | 12053496 |
0.1738 | 0.1810 | 225 | 1.1597 | 12326640 |
0.1433 | 0.1850 | 230 | 1.1570 | 12603944 |
0.1784 | 0.1891 | 235 | 1.1557 | 12874520 |
0.1566 | 0.1931 | 240 | 1.1578 | 13145760 |
0.1444 | 0.1971 | 245 | 1.1525 | 13411632 |
0.1471 | 0.2011 | 250 | 1.1521 | 13686744 |
0.1807 | 0.2051 | 255 | 1.1525 | 13957056 |
0.1817 | 0.2092 | 260 | 1.1516 | 14221112 |
0.1582 | 0.2132 | 265 | 1.1514 | 14493784 |
0.2072 | 0.2172 | 270 | 1.1463 | 14777872 |
0.1664 | 0.2212 | 275 | 1.1507 | 15051072 |
0.2101 | 0.2253 | 280 | 1.1498 | 15318864 |
0.1628 | 0.2293 | 285 | 1.1471 | 15591504 |
0.1705 | 0.2333 | 290 | 1.1501 | 15863728 |
0.1955 | 0.2373 | 295 | 1.1445 | 16143816 |
0.1321 | 0.2414 | 300 | 1.1459 | 16417048 |
0.2102 | 0.2454 | 305 | 1.1443 | 16689136 |
0.1241 | 0.2494 | 310 | 1.1441 | 16964768 |
0.2063 | 0.2534 | 315 | 1.1457 | 17235728 |
0.2171 | 0.2574 | 320 | 1.1428 | 17510144 |
0.1589 | 0.2615 | 325 | 1.1399 | 17783800 |
0.1995 | 0.2655 | 330 | 1.1429 | 18060800 |
0.2193 | 0.2695 | 335 | 1.1428 | 18325904 |
0.223 | 0.2735 | 340 | 1.1369 | 18601416 |
0.2246 | 0.2776 | 345 | 1.1401 | 18875056 |
0.1785 | 0.2816 | 350 | 1.1403 | 19142056 |
0.1229 | 0.2856 | 355 | 1.1371 | 19421664 |
0.138 | 0.2896 | 360 | 1.1377 | 19692216 |
0.1414 | 0.2936 | 365 | 1.1393 | 19965952 |
0.1585 | 0.2977 | 370 | 1.1352 | 20246768 |
0.1555 | 0.3017 | 375 | 1.1341 | 20515416 |
0.1626 | 0.3057 | 380 | 1.1368 | 20793432 |
0.0968 | 0.3097 | 385 | 1.1392 | 21066280 |
0.1358 | 0.3138 | 390 | 1.1344 | 21339752 |
0.2193 | 0.3178 | 395 | 1.1325 | 21613608 |
0.1656 | 0.3218 | 400 | 1.1355 | 21886960 |
0.2142 | 0.3258 | 405 | 1.1345 | 22157944 |
0.1435 | 0.3298 | 410 | 1.1330 | 22428408 |
0.1329 | 0.3339 | 415 | 1.1330 | 22701952 |
0.1397 | 0.3379 | 420 | 1.1323 | 22977248 |
0.1799 | 0.3419 | 425 | 1.1321 | 23249280 |
0.2237 | 0.3459 | 430 | 1.1314 | 23530504 |
0.1468 | 0.3500 | 435 | 1.1316 | 23805280 |
0.1737 | 0.3540 | 440 | 1.1300 | 24074744 |
0.2185 | 0.3580 | 445 | 1.1310 | 24345584 |
0.1852 | 0.3620 | 450 | 1.1296 | 24614840 |
0.1522 | 0.3660 | 455 | 1.1286 | 24893784 |
0.2289 | 0.3701 | 460 | 1.1287 | 25166736 |
0.1478 | 0.3741 | 465 | 1.1291 | 25438256 |
0.1086 | 0.3781 | 470 | 1.1296 | 25705720 |
0.1377 | 0.3821 | 475 | 1.1266 | 25984792 |
0.1684 | 0.3862 | 480 | 1.1259 | 26254008 |
0.146 | 0.3902 | 485 | 1.1265 | 26526864 |
0.1507 | 0.3942 | 490 | 1.1249 | 26798384 |
0.1298 | 0.3982 | 495 | 1.1267 | 27076416 |
0.1026 | 0.4023 | 500 | 1.1259 | 27347264 |
0.1561 | 0.4063 | 505 | 1.1282 | 27620832 |
0.1569 | 0.4103 | 510 | 1.1261 | 27895248 |
0.127 | 0.4143 | 515 | 1.1254 | 28165952 |
0.1903 | 0.4183 | 520 | 1.1256 | 28440432 |
0.1719 | 0.4224 | 525 | 1.1229 | 28713520 |
0.184 | 0.4264 | 530 | 1.1228 | 28985680 |
0.1248 | 0.4304 | 535 | 1.1232 | 29266064 |
0.1796 | 0.4344 | 540 | 1.1215 | 29538136 |
0.1859 | 0.4385 | 545 | 1.1217 | 29814888 |
0.1421 | 0.4425 | 550 | 1.1231 | 30090864 |
0.1962 | 0.4465 | 555 | 1.1230 | 30362696 |
0.1814 | 0.4505 | 560 | 1.1207 | 30635648 |
0.1315 | 0.4545 | 565 | 1.1220 | 30908528 |
0.1441 | 0.4586 | 570 | 1.1225 | 31180208 |
0.0751 | 0.4626 | 575 | 1.1224 | 31457232 |
0.1555 | 0.4666 | 580 | 1.1231 | 31729376 |
0.1712 | 0.4706 | 585 | 1.1206 | 32012600 |
0.1275 | 0.4747 | 590 | 1.1197 | 32278952 |
0.2187 | 0.4787 | 595 | 1.1195 | 32551528 |
0.2058 | 0.4827 | 600 | 1.1200 | 32817296 |
0.1592 | 0.4867 | 605 | 1.1202 | 33086856 |
0.1969 | 0.4907 | 610 | 1.1180 | 33361336 |
0.2001 | 0.4948 | 615 | 1.1194 | 33637008 |
0.1344 | 0.4988 | 620 | 1.1194 | 33910384 |
0.1714 | 0.5028 | 625 | 1.1185 | 34193000 |
0.207 | 0.5068 | 630 | 1.1184 | 34461728 |
0.1515 | 0.5109 | 635 | 1.1192 | 34734952 |
0.2266 | 0.5149 | 640 | 1.1187 | 35007520 |
0.1224 | 0.5189 | 645 | 1.1177 | 35284544 |
0.1632 | 0.5229 | 650 | 1.1172 | 35562360 |
0.177 | 0.5270 | 655 | 1.1162 | 35825736 |
0.1254 | 0.5310 | 660 | 1.1164 | 36095344 |
0.1959 | 0.5350 | 665 | 1.1161 | 36366216 |
0.1667 | 0.5390 | 670 | 1.1142 | 36631656 |
0.0861 | 0.5430 | 675 | 1.1148 | 36896424 |
0.096 | 0.5471 | 680 | 1.1157 | 37173928 |
0.156 | 0.5511 | 685 | 1.1153 | 37446984 |
0.1468 | 0.5551 | 690 | 1.1113 | 37717888 |
0.2049 | 0.5591 | 695 | 1.1117 | 37991688 |
0.1401 | 0.5632 | 700 | 1.1138 | 38264768 |
0.1366 | 0.5672 | 705 | 1.1153 | 38539056 |
0.1161 | 0.5712 | 710 | 1.1146 | 38813832 |
0.1551 | 0.5752 | 715 | 1.1128 | 39084024 |
0.1996 | 0.5792 | 720 | 1.1127 | 39352136 |
0.1615 | 0.5833 | 725 | 1.1111 | 39625656 |
0.1776 | 0.5873 | 730 | 1.1121 | 39907416 |
0.1767 | 0.5913 | 735 | 1.1114 | 40187416 |
0.1007 | 0.5953 | 740 | 1.1100 | 40457912 |
0.1444 | 0.5994 | 745 | 1.1108 | 40730656 |
0.1723 | 0.6034 | 750 | 1.1104 | 40997672 |
0.118 | 0.6074 | 755 | 1.1119 | 41273152 |
0.1022 | 0.6114 | 760 | 1.1108 | 41549816 |
0.1394 | 0.6154 | 765 | 1.1114 | 41824584 |
0.174 | 0.6195 | 770 | 1.1112 | 42101688 |
0.102 | 0.6235 | 775 | 1.1095 | 42381032 |
0.1617 | 0.6275 | 780 | 1.1080 | 42653520 |
0.1253 | 0.6315 | 785 | 1.1099 | 42927400 |
0.1961 | 0.6356 | 790 | 1.1096 | 43209088 |
0.1346 | 0.6396 | 795 | 1.1100 | 43485800 |
0.2274 | 0.6436 | 800 | 1.1090 | 43756192 |
0.2138 | 0.6476 | 805 | 1.1086 | 44030624 |
0.1111 | 0.6516 | 810 | 1.1089 | 44308792 |
0.1295 | 0.6557 | 815 | 1.1088 | 44591104 |
0.2271 | 0.6597 | 820 | 1.1092 | 44868400 |
0.1869 | 0.6637 | 825 | 1.1079 | 45140328 |
0.1149 | 0.6677 | 830 | 1.1082 | 45412792 |
0.1191 | 0.6718 | 835 | 1.1094 | 45693520 |
0.126 | 0.6758 | 840 | 1.1101 | 45967016 |
0.1549 | 0.6798 | 845 | 1.1087 | 46234720 |
0.1689 | 0.6838 | 850 | 1.1073 | 46505888 |
0.1359 | 0.6879 | 855 | 1.1070 | 46775064 |
0.1121 | 0.6919 | 860 | 1.1058 | 47042880 |
0.1649 | 0.6959 | 865 | 1.1057 | 47315720 |
0.2263 | 0.6999 | 870 | 1.1053 | 47584232 |
0.1161 | 0.7039 | 875 | 1.1037 | 47849336 |
0.1538 | 0.7080 | 880 | 1.1043 | 48121936 |
0.1568 | 0.7120 | 885 | 1.1090 | 48400536 |
0.1215 | 0.7160 | 890 | 1.1087 | 48677856 |
0.1837 | 0.7200 | 895 | 1.1045 | 48953592 |
0.1305 | 0.7241 | 900 | 1.1043 | 49228736 |
0.1416 | 0.7281 | 905 | 1.1053 | 49498592 |
0.2047 | 0.7321 | 910 | 1.1076 | 49778360 |
0.1744 | 0.7361 | 915 | 1.1048 | 50055080 |
0.228 | 0.7401 | 920 | 1.1049 | 50329672 |
0.1631 | 0.7442 | 925 | 1.1048 | 50608512 |
0.1507 | 0.7482 | 930 | 1.1023 | 50882480 |
0.1616 | 0.7522 | 935 | 1.1033 | 51153880 |
0.1531 | 0.7562 | 940 | 1.1049 | 51424832 |
0.1692 | 0.7603 | 945 | 1.1031 | 51706888 |
0.2223 | 0.7643 | 950 | 1.1021 | 51978144 |
0.0657 | 0.7683 | 955 | 1.1026 | 52247536 |
0.135 | 0.7723 | 960 | 1.1027 | 52520728 |
0.1681 | 0.7763 | 965 | 1.1031 | 52790496 |
0.1503 | 0.7804 | 970 | 1.1033 | 53063304 |
0.0966 | 0.7844 | 975 | 1.1026 | 53337432 |
0.1513 | 0.7884 | 980 | 1.1044 | 53612992 |
0.1115 | 0.7924 | 985 | 1.1036 | 53893224 |
0.1594 | 0.7965 | 990 | 1.1022 | 54165736 |
0.1443 | 0.8005 | 995 | 1.1015 | 54448120 |
0.2037 | 0.8045 | 1000 | 1.1030 | 54728624 |
0.1446 | 0.8085 | 1005 | 1.1019 | 55004736 |
0.2267 | 0.8126 | 1010 | 1.1007 | 55283880 |
0.1275 | 0.8166 | 1015 | 1.0993 | 55549176 |
0.1563 | 0.8206 | 1020 | 1.1016 | 55826936 |
0.1115 | 0.8246 | 1025 | 1.1057 | 56106256 |
0.1912 | 0.8286 | 1030 | 1.1042 | 56376152 |
0.1244 | 0.8327 | 1035 | 1.0996 | 56639360 |
0.1461 | 0.8367 | 1040 | 1.0992 | 56907088 |
0.1657 | 0.8407 | 1045 | 1.1005 | 57184640 |
0.0963 | 0.8447 | 1050 | 1.1006 | 57456360 |
0.1741 | 0.8488 | 1055 | 1.1013 | 57733456 |
0.1526 | 0.8528 | 1060 | 1.1023 | 58011552 |
0.0861 | 0.8568 | 1065 | 1.1012 | 58286040 |
0.1703 | 0.8608 | 1070 | 1.0999 | 58558632 |
0.1797 | 0.8648 | 1075 | 1.0989 | 58827032 |
0.1416 | 0.8689 | 1080 | 1.0996 | 59102632 |
0.1293 | 0.8729 | 1085 | 1.1000 | 59374792 |
0.2468 | 0.8769 | 1090 | 1.1009 | 59647416 |
0.1471 | 0.8809 | 1095 | 1.1014 | 59926664 |
0.1613 | 0.8850 | 1100 | 1.1006 | 60198112 |
0.1398 | 0.8890 | 1105 | 1.0990 | 60467328 |
0.1715 | 0.8930 | 1110 | 1.0995 | 60741856 |
0.1753 | 0.8970 | 1115 | 1.0988 | 61009016 |
0.1685 | 0.9010 | 1120 | 1.0992 | 61286680 |
0.0898 | 0.9051 | 1125 | 1.0981 | 61559272 |
0.1574 | 0.9091 | 1130 | 1.0974 | 61842792 |
0.211 | 0.9131 | 1135 | 1.0975 | 62117984 |
0.2118 | 0.9171 | 1140 | 1.0994 | 62394472 |
0.1104 | 0.9212 | 1145 | 1.1013 | 62668544 |
0.1651 | 0.9252 | 1150 | 1.1001 | 62939712 |
0.1026 | 0.9292 | 1155 | 1.0984 | 63218496 |
0.1458 | 0.9332 | 1160 | 1.0992 | 63494136 |
0.2029 | 0.9372 | 1165 | 1.1007 | 63771384 |
0.1845 | 0.9413 | 1170 | 1.1001 | 64044864 |
0.0982 | 0.9453 | 1175 | 1.0978 | 64316064 |
0.0766 | 0.9493 | 1180 | 1.0987 | 64596744 |
0.1961 | 0.9533 | 1185 | 1.1004 | 64865464 |
0.1169 | 0.9574 | 1190 | 1.0990 | 65141216 |
0.124 | 0.9614 | 1195 | 1.0991 | 65417704 |
0.1161 | 0.9654 | 1200 | 1.0983 | 65690272 |
0.2045 | 0.9694 | 1205 | 1.0962 | 65971064 |
0.14 | 0.9735 | 1210 | 1.0959 | 66246048 |
0.142 | 0.9775 | 1215 | 1.0977 | 66512056 |
0.1658 | 0.9815 | 1220 | 1.0980 | 66789200 |
0.1627 | 0.9855 | 1225 | 1.0969 | 67066952 |
0.1171 | 0.9895 | 1230 | 1.0958 | 67334168 |
0.1541 | 0.9936 | 1235 | 1.0953 | 67607200 |
0.1304 | 0.9976 | 1240 | 1.0962 | 67884024 |
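From the final logged row (step 1240, 67,884,024 input tokens) and the effective batch size of 128, one can estimate the average sequence length per training example; a rough back-of-the-envelope sketch using only figures from the table above:

```python
# Estimate average tokens per example from the last logged training row.
tokens_seen = 67_884_024   # "Input Tokens Seen" at step 1240
steps = 1240
effective_batch = 128      # total_train_batch_size
examples_seen = steps * effective_batch
tokens_per_example = tokens_seen / examples_seen
print(round(tokens_per_example, 1))  # ~427.7 tokens per example
```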
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1