# collapse_gemma-2-2b_hs2_accumulate_iter11_sftsd1

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.0961
- Num Input Tokens Seen: 56359768
## Model description

More information needed
## Intended uses & limitations

More information needed
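Since the card does not yet document usage, here is a minimal inference sketch. It assumes the checkpoint is published on the Hugging Face Hub under the repo id shown in the model tree, and uses the standard `transformers` causal-LM workflow; the prompt and decoding settings are illustrative, not a documented recipe.

```python
# Minimal inference sketch for this fine-tuned gemma-2-2b checkpoint.
# Repo id taken from the model tree; everything else is the generic
# transformers causal-LM API, not an author-documented procedure.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter11_sftsd1"

def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Load the model and decode a continuation of `prompt`."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

Calling `generate("...")` downloads the ~2B-parameter weights on first use, so a GPU (or ample RAM) is advisable.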
## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
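The derived quantities in the list above can be checked from the base values; the sketch below assumes the warmup length is computed as the warmup ratio times the total optimizer steps (roughly 1050, per the final row of the results table), which gives about 52 warmup steps.

```python
# Sanity-check the derived hyperparameters from the list above.
train_batch_size = 8             # per-device train batch size
gradient_accumulation_steps = 16
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)    # 128, matching total_train_batch_size above

# With ~1050 optimizer steps in one epoch (final row of the results table),
# a warmup ratio of 0.05 corresponds to roughly 52 warmup steps.
total_steps = 1050
warmup_steps = int(0.05 * total_steps)
print(warmup_steps)              # 52
```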
### Training results

Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
---|---|---|---|---|
No log | 0 | 0 | 1.3909 | 0 |
1.6209 | 0.0048 | 5 | 1.3888 | 260784 |
1.6489 | 0.0095 | 10 | 1.3680 | 526144 |
1.532 | 0.0143 | 15 | 1.3201 | 802968 |
1.4996 | 0.0190 | 20 | 1.2669 | 1075152 |
1.3397 | 0.0238 | 25 | 1.2273 | 1346840 |
1.3125 | 0.0286 | 30 | 1.1901 | 1611296 |
1.1233 | 0.0333 | 35 | 1.1868 | 1879352 |
0.9443 | 0.0381 | 40 | 1.2076 | 2150000 |
0.8999 | 0.0428 | 45 | 1.2196 | 2423208 |
0.7159 | 0.0476 | 50 | 1.2657 | 2690856 |
0.5995 | 0.0524 | 55 | 1.2849 | 2959080 |
0.5344 | 0.0571 | 60 | 1.2687 | 3228184 |
0.4838 | 0.0619 | 65 | 1.2509 | 3494504 |
0.4883 | 0.0666 | 70 | 1.2302 | 3753896 |
0.2679 | 0.0714 | 75 | 1.2422 | 4017544 |
0.3494 | 0.0762 | 80 | 1.2179 | 4279264 |
0.2953 | 0.0809 | 85 | 1.2130 | 4546928 |
0.3641 | 0.0857 | 90 | 1.2049 | 4813768 |
0.3191 | 0.0904 | 95 | 1.1886 | 5081944 |
0.2461 | 0.0952 | 100 | 1.1918 | 5354328 |
0.2695 | 0.1000 | 105 | 1.1858 | 5614768 |
0.2698 | 0.1047 | 110 | 1.1833 | 5876376 |
0.233 | 0.1095 | 115 | 1.1827 | 6142224 |
0.2352 | 0.1143 | 120 | 1.1814 | 6407280 |
0.2541 | 0.1190 | 125 | 1.1773 | 6668672 |
0.2663 | 0.1238 | 130 | 1.1767 | 6941760 |
0.2468 | 0.1285 | 135 | 1.1729 | 7204472 |
0.2495 | 0.1333 | 140 | 1.1707 | 7475488 |
0.1993 | 0.1381 | 145 | 1.1663 | 7743640 |
0.2654 | 0.1428 | 150 | 1.1684 | 8002848 |
0.2451 | 0.1476 | 155 | 1.1671 | 8262672 |
0.2106 | 0.1523 | 160 | 1.1616 | 8533696 |
0.1963 | 0.1571 | 165 | 1.1632 | 8807600 |
0.1678 | 0.1619 | 170 | 1.1628 | 9072528 |
0.2143 | 0.1666 | 175 | 1.1583 | 9342824 |
0.1857 | 0.1714 | 180 | 1.1554 | 9613104 |
0.2452 | 0.1761 | 185 | 1.1616 | 9877512 |
0.2276 | 0.1809 | 190 | 1.1538 | 10145024 |
0.1419 | 0.1857 | 195 | 1.1536 | 10415848 |
0.2847 | 0.1904 | 200 | 1.1557 | 10688320 |
0.1709 | 0.1952 | 205 | 1.1516 | 10955016 |
0.2264 | 0.1999 | 210 | 1.1518 | 11221864 |
0.2114 | 0.2047 | 215 | 1.1476 | 11497224 |
0.1591 | 0.2095 | 220 | 1.1493 | 11764448 |
0.2429 | 0.2142 | 225 | 1.1469 | 12032800 |
0.222 | 0.2190 | 230 | 1.1445 | 12296112 |
0.1844 | 0.2237 | 235 | 1.1441 | 12567544 |
0.2173 | 0.2285 | 240 | 1.1410 | 12829728 |
0.2398 | 0.2333 | 245 | 1.1391 | 13093064 |
0.1395 | 0.2380 | 250 | 1.1423 | 13355296 |
0.1644 | 0.2428 | 255 | 1.1410 | 13617216 |
0.1951 | 0.2475 | 260 | 1.1406 | 13887744 |
0.1772 | 0.2523 | 265 | 1.1408 | 14158392 |
0.2206 | 0.2571 | 270 | 1.1384 | 14432008 |
0.2658 | 0.2618 | 275 | 1.1363 | 14695872 |
0.1841 | 0.2666 | 280 | 1.1364 | 14962984 |
0.1656 | 0.2713 | 285 | 1.1373 | 15239000 |
0.2024 | 0.2761 | 290 | 1.1360 | 15503752 |
0.1559 | 0.2809 | 295 | 1.1363 | 15765512 |
0.1714 | 0.2856 | 300 | 1.1352 | 16036816 |
0.102 | 0.2904 | 305 | 1.1352 | 16300064 |
0.2057 | 0.2952 | 310 | 1.1364 | 16571744 |
0.2353 | 0.2999 | 315 | 1.1326 | 16840544 |
0.1378 | 0.3047 | 320 | 1.1306 | 17110016 |
0.1395 | 0.3094 | 325 | 1.1366 | 17380776 |
0.1747 | 0.3142 | 330 | 1.1318 | 17647416 |
0.1444 | 0.3190 | 335 | 1.1308 | 17913208 |
0.2003 | 0.3237 | 340 | 1.1325 | 18180568 |
0.1373 | 0.3285 | 345 | 1.1339 | 18451296 |
0.1483 | 0.3332 | 350 | 1.1310 | 18726416 |
0.2017 | 0.3380 | 355 | 1.1290 | 18999216 |
0.1496 | 0.3428 | 360 | 1.1284 | 19277048 |
0.1912 | 0.3475 | 365 | 1.1289 | 19546024 |
0.1944 | 0.3523 | 370 | 1.1312 | 19817824 |
0.1897 | 0.3570 | 375 | 1.1294 | 20083960 |
0.1735 | 0.3618 | 380 | 1.1252 | 20350640 |
0.2085 | 0.3666 | 385 | 1.1258 | 20619120 |
0.1385 | 0.3713 | 390 | 1.1300 | 20888696 |
0.1942 | 0.3761 | 395 | 1.1233 | 21156856 |
0.1413 | 0.3808 | 400 | 1.1238 | 21425648 |
0.2178 | 0.3856 | 405 | 1.1257 | 21696448 |
0.2536 | 0.3904 | 410 | 1.1219 | 21967312 |
0.1956 | 0.3951 | 415 | 1.1249 | 22234304 |
0.1643 | 0.3999 | 420 | 1.1239 | 22503168 |
0.2683 | 0.4046 | 425 | 1.1195 | 22769672 |
0.1949 | 0.4094 | 430 | 1.1190 | 23040264 |
0.2001 | 0.4142 | 435 | 1.1240 | 23309600 |
0.1348 | 0.4189 | 440 | 1.1218 | 23579856 |
0.1836 | 0.4237 | 445 | 1.1212 | 23852144 |
0.1498 | 0.4284 | 450 | 1.1212 | 24114304 |
0.1595 | 0.4332 | 455 | 1.1242 | 24376912 |
0.1384 | 0.4380 | 460 | 1.1204 | 24644368 |
0.1569 | 0.4427 | 465 | 1.1194 | 24915744 |
0.1477 | 0.4475 | 470 | 1.1190 | 25183280 |
0.1853 | 0.4522 | 475 | 1.1173 | 25457376 |
0.1485 | 0.4570 | 480 | 1.1187 | 25732664 |
0.165 | 0.4618 | 485 | 1.1204 | 26004360 |
0.1977 | 0.4665 | 490 | 1.1197 | 26270144 |
0.1273 | 0.4713 | 495 | 1.1173 | 26541272 |
0.2433 | 0.4760 | 500 | 1.1174 | 26806808 |
0.1909 | 0.4808 | 505 | 1.1178 | 27074376 |
0.191 | 0.4856 | 510 | 1.1189 | 27338952 |
0.2088 | 0.4903 | 515 | 1.1169 | 27606808 |
0.1777 | 0.4951 | 520 | 1.1147 | 27875304 |
0.208 | 0.4999 | 525 | 1.1175 | 28144272 |
0.1745 | 0.5046 | 530 | 1.1159 | 28409000 |
0.1306 | 0.5094 | 535 | 1.1128 | 28674056 |
0.1432 | 0.5141 | 540 | 1.1160 | 28943648 |
0.2056 | 0.5189 | 545 | 1.1164 | 29207648 |
0.1777 | 0.5237 | 550 | 1.1132 | 29477544 |
0.2033 | 0.5284 | 555 | 1.1140 | 29744816 |
0.1983 | 0.5332 | 560 | 1.1136 | 30021232 |
0.2389 | 0.5379 | 565 | 1.1130 | 30291032 |
0.1681 | 0.5427 | 570 | 1.1152 | 30555728 |
0.1639 | 0.5475 | 575 | 1.1131 | 30827752 |
0.195 | 0.5522 | 580 | 1.1102 | 31097840 |
0.1447 | 0.5570 | 585 | 1.1113 | 31373424 |
0.2198 | 0.5617 | 590 | 1.1115 | 31639232 |
0.1382 | 0.5665 | 595 | 1.1116 | 31901832 |
0.1605 | 0.5713 | 600 | 1.1122 | 32167752 |
0.2186 | 0.5760 | 605 | 1.1121 | 32436896 |
0.1891 | 0.5808 | 610 | 1.1104 | 32710184 |
0.1787 | 0.5855 | 615 | 1.1113 | 32984864 |
0.1706 | 0.5903 | 620 | 1.1107 | 33251232 |
0.2048 | 0.5951 | 625 | 1.1105 | 33527304 |
0.191 | 0.5998 | 630 | 1.1102 | 33798576 |
0.124 | 0.6046 | 635 | 1.1098 | 34063624 |
0.1499 | 0.6093 | 640 | 1.1079 | 34330376 |
0.1055 | 0.6141 | 645 | 1.1087 | 34599840 |
0.164 | 0.6189 | 650 | 1.1103 | 34865960 |
0.1665 | 0.6236 | 655 | 1.1105 | 35135704 |
0.14 | 0.6284 | 660 | 1.1088 | 35404640 |
0.1862 | 0.6331 | 665 | 1.1116 | 35670952 |
0.196 | 0.6379 | 670 | 1.1110 | 35938232 |
0.1475 | 0.6427 | 675 | 1.1083 | 36200712 |
0.1698 | 0.6474 | 680 | 1.1059 | 36476144 |
0.1544 | 0.6522 | 685 | 1.1072 | 36741712 |
0.1455 | 0.6569 | 690 | 1.1097 | 37007608 |
0.2331 | 0.6617 | 695 | 1.1074 | 37267184 |
0.1697 | 0.6665 | 700 | 1.1065 | 37537536 |
0.1208 | 0.6712 | 705 | 1.1076 | 37799632 |
0.1679 | 0.6760 | 710 | 1.1089 | 38067184 |
0.1931 | 0.6807 | 715 | 1.1075 | 38340032 |
0.1315 | 0.6855 | 720 | 1.1077 | 38613992 |
0.1194 | 0.6903 | 725 | 1.1079 | 38894384 |
0.1902 | 0.6950 | 730 | 1.1070 | 39172040 |
0.1675 | 0.6998 | 735 | 1.1072 | 39444864 |
0.1516 | 0.7046 | 740 | 1.1061 | 39716440 |
0.0847 | 0.7093 | 745 | 1.1049 | 39983736 |
0.1703 | 0.7141 | 750 | 1.1057 | 40256696 |
0.1791 | 0.7188 | 755 | 1.1056 | 40521264 |
0.2551 | 0.7236 | 760 | 1.1044 | 40793072 |
0.1814 | 0.7284 | 765 | 1.1054 | 41064248 |
0.126 | 0.7331 | 770 | 1.1070 | 41338416 |
0.211 | 0.7379 | 775 | 1.1049 | 41600992 |
0.1668 | 0.7426 | 780 | 1.1043 | 41870408 |
0.1821 | 0.7474 | 785 | 1.1061 | 42139008 |
0.186 | 0.7522 | 790 | 1.1033 | 42407016 |
0.209 | 0.7569 | 795 | 1.1039 | 42662976 |
0.226 | 0.7617 | 800 | 1.1040 | 42934592 |
0.1668 | 0.7664 | 805 | 1.1026 | 43199808 |
0.2089 | 0.7712 | 810 | 1.1019 | 43460640 |
0.1736 | 0.7760 | 815 | 1.1038 | 43729072 |
0.1403 | 0.7807 | 820 | 1.1022 | 43997664 |
0.1947 | 0.7855 | 825 | 1.1017 | 44258840 |
0.1333 | 0.7902 | 830 | 1.1020 | 44518528 |
0.2415 | 0.7950 | 835 | 1.1042 | 44785256 |
0.1791 | 0.7998 | 840 | 1.1018 | 45057824 |
0.2226 | 0.8045 | 845 | 1.1013 | 45326808 |
0.1988 | 0.8093 | 850 | 1.1012 | 45595496 |
0.207 | 0.8140 | 855 | 1.1026 | 45862328 |
0.1112 | 0.8188 | 860 | 1.1019 | 46130024 |
0.1775 | 0.8236 | 865 | 1.1030 | 46405848 |
0.2009 | 0.8283 | 870 | 1.1019 | 46676936 |
0.1478 | 0.8331 | 875 | 1.1004 | 46955008 |
0.2381 | 0.8378 | 880 | 1.1006 | 47220736 |
0.1951 | 0.8426 | 885 | 1.0998 | 47486768 |
0.1363 | 0.8474 | 890 | 1.0995 | 47750624 |
0.1287 | 0.8521 | 895 | 1.0994 | 48029400 |
0.144 | 0.8569 | 900 | 1.1004 | 48301424 |
0.1721 | 0.8616 | 905 | 1.0982 | 48569280 |
0.1385 | 0.8664 | 910 | 1.0990 | 48836384 |
0.1721 | 0.8712 | 915 | 1.0983 | 49104000 |
0.2214 | 0.8759 | 920 | 1.0981 | 49378064 |
0.1441 | 0.8807 | 925 | 1.0987 | 49643256 |
0.2227 | 0.8855 | 930 | 1.1017 | 49914304 |
0.1388 | 0.8902 | 935 | 1.1024 | 50184528 |
0.1303 | 0.8950 | 940 | 1.0992 | 50453176 |
0.192 | 0.8997 | 945 | 1.0968 | 50723312 |
0.1817 | 0.9045 | 950 | 1.0985 | 50998824 |
0.1661 | 0.9093 | 955 | 1.0989 | 51273248 |
0.1249 | 0.9140 | 960 | 1.0994 | 51535824 |
0.1622 | 0.9188 | 965 | 1.0993 | 51805072 |
0.1294 | 0.9235 | 970 | 1.0982 | 52074128 |
0.1132 | 0.9283 | 975 | 1.0975 | 52340296 |
0.109 | 0.9331 | 980 | 1.0977 | 52606592 |
0.1585 | 0.9378 | 985 | 1.0972 | 52876464 |
0.1702 | 0.9426 | 990 | 1.0972 | 53144688 |
0.1798 | 0.9473 | 995 | 1.0986 | 53419016 |
0.2313 | 0.9521 | 1000 | 1.0993 | 53685696 |
0.1984 | 0.9569 | 1005 | 1.0963 | 53948536 |
0.1253 | 0.9616 | 1010 | 1.0970 | 54213712 |
0.1165 | 0.9664 | 1015 | 1.0979 | 54480072 |
0.181 | 0.9711 | 1020 | 1.0970 | 54753368 |
0.1439 | 0.9759 | 1025 | 1.0959 | 55022136 |
0.1115 | 0.9807 | 1030 | 1.0979 | 55293552 |
0.1213 | 0.9854 | 1035 | 1.0991 | 55557936 |
0.1227 | 0.9902 | 1040 | 1.0979 | 55828888 |
0.1455 | 0.9949 | 1045 | 1.0967 | 56094144 |
0.1732 | 0.9997 | 1050 | 1.0961 | 56359768 |
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
## Model tree for RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter11_sftsd1

Base model: [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b)