collapse_gemma-2-2b_hs2_accumulate_iter14_sftsd1
This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.0983
- Num Input Tokens Seen: 71681408
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
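The batch-size and warmup settings above fit together as sketched below. This is a minimal illustration that rests on two assumptions the card does not state. First, training ran on a single device, which is consistent with 8 × 16 = 128. Second, the run lasted about 1,341 optimizer steps, a figure inferred from the results table, where step 1340 is logged at epoch 0.9990. `warmup_steps` rounds up, which is how Transformers converts `warmup_ratio` into a step count. `constant_lr_multiplier` reproduces the linear-warmup-then-flat shape of the `constant_with_warmup` schedule. Both are re-implementations for illustration, not the library's own code.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 8e-06
TRAIN_BATCH_SIZE = 8              # per-device micro-batch
GRADIENT_ACCUMULATION_STEPS = 16
WARMUP_RATIO = 0.05

# Assumption: a single training device (not stated in the card).
NUM_DEVICES = 1
# Assumption: total optimizer steps inferred from the results table
# (step 1340 is logged at epoch 0.9990, so one epoch is roughly 1341 steps).
ESTIMATED_TOTAL_STEPS = 1341


def total_train_batch_size(per_device: int, accum_steps: int, devices: int) -> int:
    """Number of sequences contributing to one optimizer update."""
    return per_device * accum_steps * devices


def warmup_steps(total_steps: int, ratio: float) -> int:
    """Convert a warmup ratio into a whole number of steps, rounding up."""
    return math.ceil(total_steps * ratio)


def constant_lr_multiplier(step: int, n_warmup: int) -> float:
    """Linear warmup from 0 to 1, then a constant multiplier of 1."""
    if step < n_warmup:
        return step / max(1, n_warmup)
    return 1.0


if __name__ == "__main__":
    batch = total_train_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS, NUM_DEVICES)
    n_warmup = warmup_steps(ESTIMATED_TOTAL_STEPS, WARMUP_RATIO)
    print("total_train_batch_size:", batch)
    print("warmup steps:", n_warmup)
    # Effective learning rate at a few points in the run.
    for step in (0, n_warmup // 2, n_warmup, ESTIMATED_TOTAL_STEPS):
        print(step, LEARNING_RATE * constant_lr_multiplier(step, n_warmup))
```

Under these assumptions, warmup ends after about 68 updates. The learning rate then stays at the full 8e-06 for roughly the last 95% of training.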
Training results
Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
---|---|---|---|---|
No log | 0 | 0 | 1.3909 | 0 |
1.6189 | 0.0037 | 5 | 1.3903 | 258592 |
1.6533 | 0.0075 | 10 | 1.3785 | 520136 |
1.5936 | 0.0112 | 15 | 1.3471 | 787304 |
1.4442 | 0.0149 | 20 | 1.2979 | 1052992 |
1.3599 | 0.0186 | 25 | 1.2524 | 1323224 |
1.3203 | 0.0224 | 30 | 1.2203 | 1586568 |
1.1893 | 0.0261 | 35 | 1.1902 | 1856088 |
1.1879 | 0.0298 | 40 | 1.1912 | 2124184 |
1.018 | 0.0335 | 45 | 1.2120 | 2394032 |
0.844 | 0.0373 | 50 | 1.2413 | 2666496 |
0.7985 | 0.0410 | 55 | 1.2719 | 2933776 |
0.607 | 0.0447 | 60 | 1.2989 | 3196272 |
0.5184 | 0.0485 | 65 | 1.3164 | 3453536 |
0.5267 | 0.0522 | 70 | 1.3142 | 3717328 |
0.3412 | 0.0559 | 75 | 1.2946 | 3981288 |
0.3162 | 0.0596 | 80 | 1.2388 | 4249208 |
0.2908 | 0.0634 | 85 | 1.2478 | 4517272 |
0.3809 | 0.0671 | 90 | 1.2150 | 4782800 |
0.2352 | 0.0708 | 95 | 1.2241 | 5048976 |
0.2656 | 0.0746 | 100 | 1.2114 | 5317904 |
0.2393 | 0.0783 | 105 | 1.2047 | 5596984 |
0.189 | 0.0820 | 110 | 1.1972 | 5863400 |
0.2498 | 0.0857 | 115 | 1.1971 | 6125336 |
0.2765 | 0.0895 | 120 | 1.1881 | 6386976 |
0.2321 | 0.0932 | 125 | 1.1921 | 6654848 |
0.1849 | 0.0969 | 130 | 1.1833 | 6920368 |
0.2719 | 0.1006 | 135 | 1.1870 | 7185912 |
0.181 | 0.1044 | 140 | 1.1764 | 7441584 |
0.2231 | 0.1081 | 145 | 1.1849 | 7712992 |
0.1763 | 0.1118 | 150 | 1.1757 | 7984832 |
0.1576 | 0.1156 | 155 | 1.1829 | 8254280 |
0.1472 | 0.1193 | 160 | 1.1765 | 8524184 |
0.1962 | 0.1230 | 165 | 1.1682 | 8799664 |
0.1944 | 0.1267 | 170 | 1.1750 | 9072896 |
0.1696 | 0.1305 | 175 | 1.1687 | 9341096 |
0.2041 | 0.1342 | 180 | 1.1695 | 9609400 |
0.1678 | 0.1379 | 185 | 1.1669 | 9872456 |
0.1977 | 0.1416 | 190 | 1.1655 | 10139800 |
0.165 | 0.1454 | 195 | 1.1623 | 10405336 |
0.182 | 0.1491 | 200 | 1.1724 | 10666864 |
0.1788 | 0.1528 | 205 | 1.1610 | 10938568 |
0.121 | 0.1566 | 210 | 1.1591 | 11202424 |
0.206 | 0.1603 | 215 | 1.1589 | 11472448 |
0.1471 | 0.1640 | 220 | 1.1595 | 11742624 |
0.1284 | 0.1677 | 225 | 1.1621 | 12011352 |
0.166 | 0.1715 | 230 | 1.1623 | 12274920 |
0.1972 | 0.1752 | 235 | 1.1574 | 12537960 |
0.1639 | 0.1789 | 240 | 1.1537 | 12809704 |
0.098 | 0.1826 | 245 | 1.1544 | 13073720 |
0.1677 | 0.1864 | 250 | 1.1519 | 13336464 |
0.1572 | 0.1901 | 255 | 1.1574 | 13610000 |
0.0936 | 0.1938 | 260 | 1.1522 | 13869680 |
0.223 | 0.1976 | 265 | 1.1510 | 14142264 |
0.1596 | 0.2013 | 270 | 1.1523 | 14405024 |
0.1639 | 0.2050 | 275 | 1.1483 | 14672776 |
0.1816 | 0.2087 | 280 | 1.1478 | 14947760 |
0.1874 | 0.2125 | 285 | 1.1477 | 15218128 |
0.1318 | 0.2162 | 290 | 1.1470 | 15482464 |
0.2142 | 0.2199 | 295 | 1.1421 | 15757288 |
0.0978 | 0.2237 | 300 | 1.1454 | 16021544 |
0.2199 | 0.2274 | 305 | 1.1443 | 16288984 |
0.1888 | 0.2311 | 310 | 1.1448 | 16557856 |
0.1948 | 0.2348 | 315 | 1.1460 | 16831144 |
0.125 | 0.2386 | 320 | 1.1430 | 17107912 |
0.1308 | 0.2423 | 325 | 1.1412 | 17371928 |
0.1297 | 0.2460 | 330 | 1.1422 | 17640792 |
0.1662 | 0.2497 | 335 | 1.1456 | 17904680 |
0.1872 | 0.2535 | 340 | 1.1424 | 18176344 |
0.1219 | 0.2572 | 345 | 1.1377 | 18440616 |
0.171 | 0.2609 | 350 | 1.1414 | 18706896 |
0.1699 | 0.2647 | 355 | 1.1379 | 18974672 |
0.1635 | 0.2684 | 360 | 1.1377 | 19237936 |
0.1256 | 0.2721 | 365 | 1.1384 | 19514272 |
0.1567 | 0.2758 | 370 | 1.1380 | 19788448 |
0.1842 | 0.2796 | 375 | 1.1336 | 20061776 |
0.1255 | 0.2833 | 380 | 1.1354 | 20325216 |
0.159 | 0.2870 | 385 | 1.1371 | 20594552 |
0.1491 | 0.2907 | 390 | 1.1348 | 20858184 |
0.2113 | 0.2945 | 395 | 1.1374 | 21126896 |
0.1427 | 0.2982 | 400 | 1.1363 | 21394840 |
0.1911 | 0.3019 | 405 | 1.1330 | 21662728 |
0.187 | 0.3057 | 410 | 1.1324 | 21925416 |
0.144 | 0.3094 | 415 | 1.1329 | 22193536 |
0.1263 | 0.3131 | 420 | 1.1312 | 22463000 |
0.2026 | 0.3168 | 425 | 1.1314 | 22724216 |
0.176 | 0.3206 | 430 | 1.1296 | 22998112 |
0.1409 | 0.3243 | 435 | 1.1330 | 23266688 |
0.1561 | 0.3280 | 440 | 1.1322 | 23540496 |
0.1731 | 0.3317 | 445 | 1.1298 | 23815080 |
0.1428 | 0.3355 | 450 | 1.1301 | 24083560 |
0.124 | 0.3392 | 455 | 1.1318 | 24351768 |
0.151 | 0.3429 | 460 | 1.1304 | 24619160 |
0.1464 | 0.3467 | 465 | 1.1289 | 24881152 |
0.1438 | 0.3504 | 470 | 1.1297 | 25154976 |
0.1844 | 0.3541 | 475 | 1.1299 | 25422080 |
0.1099 | 0.3578 | 480 | 1.1288 | 25688696 |
0.2175 | 0.3616 | 485 | 1.1291 | 25958688 |
0.1765 | 0.3653 | 490 | 1.1290 | 26224856 |
0.1757 | 0.3690 | 495 | 1.1262 | 26489304 |
0.1317 | 0.3728 | 500 | 1.1273 | 26754624 |
0.2293 | 0.3765 | 505 | 1.1316 | 27017968 |
0.1462 | 0.3802 | 510 | 1.1248 | 27282064 |
0.0883 | 0.3839 | 515 | 1.1244 | 27547632 |
0.1269 | 0.3877 | 520 | 1.1283 | 27808824 |
0.1947 | 0.3914 | 525 | 1.1221 | 28076120 |
0.1521 | 0.3951 | 530 | 1.1229 | 28340232 |
0.1304 | 0.3988 | 535 | 1.1247 | 28608904 |
0.1259 | 0.4026 | 540 | 1.1235 | 28881192 |
0.1057 | 0.4063 | 545 | 1.1225 | 29139640 |
0.1059 | 0.4100 | 550 | 1.1207 | 29411472 |
0.2029 | 0.4138 | 555 | 1.1222 | 29674968 |
0.1005 | 0.4175 | 560 | 1.1235 | 29939488 |
0.1584 | 0.4212 | 565 | 1.1237 | 30211520 |
0.0734 | 0.4249 | 570 | 1.1192 | 30471960 |
0.1487 | 0.4287 | 575 | 1.1212 | 30741120 |
0.1251 | 0.4324 | 580 | 1.1225 | 31005896 |
0.1091 | 0.4361 | 585 | 1.1188 | 31271928 |
0.1796 | 0.4398 | 590 | 1.1194 | 31541328 |
0.2269 | 0.4436 | 595 | 1.1207 | 31806640 |
0.1463 | 0.4473 | 600 | 1.1190 | 32077280 |
0.2014 | 0.4510 | 605 | 1.1196 | 32338600 |
0.0954 | 0.4548 | 610 | 1.1235 | 32606184 |
0.1182 | 0.4585 | 615 | 1.1179 | 32873576 |
0.1933 | 0.4622 | 620 | 1.1157 | 33138360 |
0.1548 | 0.4659 | 625 | 1.1166 | 33409000 |
0.1274 | 0.4697 | 630 | 1.1184 | 33681104 |
0.1785 | 0.4734 | 635 | 1.1152 | 33952128 |
0.142 | 0.4771 | 640 | 1.1148 | 34218912 |
0.1539 | 0.4808 | 645 | 1.1185 | 34487520 |
0.1972 | 0.4846 | 650 | 1.1177 | 34753336 |
0.1385 | 0.4883 | 655 | 1.1150 | 35015672 |
0.1964 | 0.4920 | 660 | 1.1152 | 35274864 |
0.1426 | 0.4958 | 665 | 1.1168 | 35541488 |
0.1373 | 0.4995 | 670 | 1.1151 | 35806464 |
0.1252 | 0.5032 | 675 | 1.1122 | 36077832 |
0.1497 | 0.5069 | 680 | 1.1152 | 36343144 |
0.2283 | 0.5107 | 685 | 1.1145 | 36616552 |
0.2025 | 0.5144 | 690 | 1.1133 | 36888264 |
0.191 | 0.5181 | 695 | 1.1151 | 37155520 |
0.2471 | 0.5219 | 700 | 1.1141 | 37427104 |
0.1591 | 0.5256 | 705 | 1.1129 | 37698688 |
0.1809 | 0.5293 | 710 | 1.1140 | 37963424 |
0.2068 | 0.5330 | 715 | 1.1155 | 38227504 |
0.1506 | 0.5368 | 720 | 1.1142 | 38506080 |
0.1644 | 0.5405 | 725 | 1.1122 | 38778312 |
0.133 | 0.5442 | 730 | 1.1144 | 39042280 |
0.1984 | 0.5479 | 735 | 1.1147 | 39313856 |
0.1371 | 0.5517 | 740 | 1.1129 | 39582264 |
0.1489 | 0.5554 | 745 | 1.1122 | 39854288 |
0.2328 | 0.5591 | 750 | 1.1126 | 40115392 |
0.1065 | 0.5629 | 755 | 1.1117 | 40380336 |
0.1163 | 0.5666 | 760 | 1.1122 | 40653264 |
0.1336 | 0.5703 | 765 | 1.1114 | 40920736 |
0.1816 | 0.5740 | 770 | 1.1086 | 41187928 |
0.1523 | 0.5778 | 775 | 1.1127 | 41454400 |
0.144 | 0.5815 | 780 | 1.1134 | 41716536 |
0.1234 | 0.5852 | 785 | 1.1098 | 41985752 |
0.1393 | 0.5889 | 790 | 1.1074 | 42253072 |
0.1523 | 0.5927 | 795 | 1.1104 | 42522688 |
0.1703 | 0.5964 | 800 | 1.1093 | 42786296 |
0.1728 | 0.6001 | 805 | 1.1090 | 43060656 |
0.2035 | 0.6039 | 810 | 1.1085 | 43326648 |
0.1507 | 0.6076 | 815 | 1.1100 | 43591848 |
0.2184 | 0.6113 | 820 | 1.1113 | 43852144 |
0.1109 | 0.6150 | 825 | 1.1143 | 44123664 |
0.1824 | 0.6188 | 830 | 1.1105 | 44395120 |
0.1573 | 0.6225 | 835 | 1.1065 | 44659360 |
0.1844 | 0.6262 | 840 | 1.1073 | 44927760 |
0.0809 | 0.6300 | 845 | 1.1106 | 45191616 |
0.1398 | 0.6337 | 850 | 1.1088 | 45457600 |
0.1467 | 0.6374 | 855 | 1.1075 | 45725952 |
0.1177 | 0.6411 | 860 | 1.1087 | 45995592 |
0.1241 | 0.6449 | 865 | 1.1102 | 46260000 |
0.1571 | 0.6486 | 870 | 1.1086 | 46536800 |
0.1311 | 0.6523 | 875 | 1.1069 | 46801720 |
0.139 | 0.6560 | 880 | 1.1074 | 47061392 |
0.2 | 0.6598 | 885 | 1.1077 | 47323184 |
0.17 | 0.6635 | 890 | 1.1069 | 47593312 |
0.1469 | 0.6672 | 895 | 1.1070 | 47867512 |
0.0834 | 0.6710 | 900 | 1.1089 | 48135672 |
0.1408 | 0.6747 | 905 | 1.1085 | 48410192 |
0.1425 | 0.6784 | 910 | 1.1069 | 48679248 |
0.1101 | 0.6821 | 915 | 1.1060 | 48948600 |
0.123 | 0.6859 | 920 | 1.1049 | 49214928 |
0.1941 | 0.6896 | 925 | 1.1059 | 49480064 |
0.1858 | 0.6933 | 930 | 1.1080 | 49743176 |
0.163 | 0.6970 | 935 | 1.1067 | 50016232 |
0.1503 | 0.7008 | 940 | 1.1053 | 50287888 |
0.1076 | 0.7045 | 945 | 1.1062 | 50547088 |
0.1406 | 0.7082 | 950 | 1.1066 | 50808544 |
0.1159 | 0.7120 | 955 | 1.1064 | 51068464 |
0.147 | 0.7157 | 960 | 1.1081 | 51330424 |
0.1686 | 0.7194 | 965 | 1.1075 | 51599088 |
0.126 | 0.7231 | 970 | 1.1051 | 51861672 |
0.1349 | 0.7269 | 975 | 1.1060 | 52130424 |
0.2244 | 0.7306 | 980 | 1.1050 | 52393032 |
0.1627 | 0.7343 | 985 | 1.1033 | 52661192 |
0.1655 | 0.7380 | 990 | 1.1029 | 52926688 |
0.1648 | 0.7418 | 995 | 1.1041 | 53192864 |
0.1496 | 0.7455 | 1000 | 1.1030 | 53460104 |
0.0939 | 0.7492 | 1005 | 1.1042 | 53727528 |
0.1534 | 0.7530 | 1010 | 1.1039 | 54000144 |
0.1444 | 0.7567 | 1015 | 1.1034 | 54275552 |
0.2096 | 0.7604 | 1020 | 1.1034 | 54539912 |
0.1773 | 0.7641 | 1025 | 1.1036 | 54804208 |
0.1582 | 0.7679 | 1030 | 1.1032 | 55071912 |
0.1338 | 0.7716 | 1035 | 1.1022 | 55334240 |
0.0797 | 0.7753 | 1040 | 1.1020 | 55602032 |
0.1348 | 0.7791 | 1045 | 1.1032 | 55869440 |
0.1956 | 0.7828 | 1050 | 1.1036 | 56134792 |
0.1398 | 0.7865 | 1055 | 1.1017 | 56398896 |
0.1558 | 0.7902 | 1060 | 1.1020 | 56661024 |
0.1001 | 0.7940 | 1065 | 1.1033 | 56929080 |
0.1681 | 0.7977 | 1070 | 1.1051 | 57192904 |
0.1837 | 0.8014 | 1075 | 1.1033 | 57465024 |
0.1474 | 0.8051 | 1080 | 1.1013 | 57734920 |
0.2078 | 0.8089 | 1085 | 1.1014 | 58001304 |
0.2162 | 0.8126 | 1090 | 1.1016 | 58258080 |
0.1362 | 0.8163 | 1095 | 1.1006 | 58532496 |
0.1744 | 0.8201 | 1100 | 1.0993 | 58793752 |
0.1677 | 0.8238 | 1105 | 1.1001 | 59063256 |
0.1596 | 0.8275 | 1110 | 1.1029 | 59331272 |
0.1288 | 0.8312 | 1115 | 1.1019 | 59598784 |
0.1447 | 0.8350 | 1120 | 1.1002 | 59864832 |
0.1535 | 0.8387 | 1125 | 1.1007 | 60131384 |
0.1665 | 0.8424 | 1130 | 1.1003 | 60401632 |
0.1021 | 0.8461 | 1135 | 1.0985 | 60668912 |
0.1117 | 0.8499 | 1140 | 1.1009 | 60938560 |
0.0863 | 0.8536 | 1145 | 1.1026 | 61199480 |
0.1511 | 0.8573 | 1150 | 1.1019 | 61467560 |
0.1401 | 0.8611 | 1155 | 1.1005 | 61730664 |
0.1025 | 0.8648 | 1160 | 1.1001 | 62004608 |
0.1067 | 0.8685 | 1165 | 1.1011 | 62268408 |
0.11 | 0.8722 | 1170 | 1.1022 | 62530664 |
0.1521 | 0.8760 | 1175 | 1.1020 | 62798320 |
0.1703 | 0.8797 | 1180 | 1.1008 | 63063776 |
0.1261 | 0.8834 | 1185 | 1.1007 | 63335176 |
0.1122 | 0.8871 | 1190 | 1.1028 | 63600208 |
0.1242 | 0.8909 | 1195 | 1.1018 | 63865408 |
0.1889 | 0.8946 | 1200 | 1.0999 | 64129984 |
0.1907 | 0.8983 | 1205 | 1.0995 | 64395624 |
0.1538 | 0.9021 | 1210 | 1.1000 | 64666088 |
0.1218 | 0.9058 | 1215 | 1.0998 | 64927608 |
0.1269 | 0.9095 | 1220 | 1.0988 | 65200320 |
0.1608 | 0.9132 | 1225 | 1.0983 | 65462912 |
0.089 | 0.9170 | 1230 | 1.0989 | 65729208 |
0.1804 | 0.9207 | 1235 | 1.1010 | 66000656 |
0.1863 | 0.9244 | 1240 | 1.0989 | 66262048 |
0.1276 | 0.9282 | 1245 | 1.0975 | 66526832 |
0.1231 | 0.9319 | 1250 | 1.0985 | 66794864 |
0.1471 | 0.9356 | 1255 | 1.1015 | 67063160 |
0.1487 | 0.9393 | 1260 | 1.1014 | 67334512 |
0.1343 | 0.9431 | 1265 | 1.0989 | 67600856 |
0.0863 | 0.9468 | 1270 | 1.0979 | 67872424 |
0.1549 | 0.9505 | 1275 | 1.0980 | 68142240 |
0.1856 | 0.9542 | 1280 | 1.0983 | 68410208 |
0.1087 | 0.9580 | 1285 | 1.0985 | 68689080 |
0.1569 | 0.9617 | 1290 | 1.1002 | 68957256 |
0.1129 | 0.9654 | 1295 | 1.0997 | 69228232 |
0.1713 | 0.9692 | 1300 | 1.0979 | 69495160 |
0.1101 | 0.9729 | 1305 | 1.0958 | 69772504 |
0.1819 | 0.9766 | 1310 | 1.0964 | 70041296 |
0.1063 | 0.9803 | 1315 | 1.0976 | 70298680 |
0.1262 | 0.9841 | 1320 | 1.0974 | 70566024 |
0.1097 | 0.9878 | 1325 | 1.0969 | 70839624 |
0.2523 | 0.9915 | 1330 | 1.0972 | 71102456 |
0.1185 | 0.9952 | 1335 | 1.0987 | 71367632 |
0.0895 | 0.9990 | 1340 | 1.0986 | 71629912 |
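The results table can also be read programmatically. The sketch below applies a hypothetical `parse_rows` helper to a five-row excerpt copied verbatim from the table. To analyse the whole run, paste in the full table instead. The sketch then reports the lowest validation loss and the average number of input tokens per optimizer step. Dividing by 128 assumes that every step consumed `total_train_batch_size` sequences. The card does not say whether the token count includes padding.

```python
from dataclasses import dataclass
from typing import List, Optional

# A few rows copied verbatim from the results table above.
TABLE_EXCERPT = """\
No log | 0 | 0 | 1.3909 | 0 |
1.1893 | 0.0261 | 35 | 1.1902 | 1856088 |
0.5184 | 0.0485 | 65 | 1.3164 | 3453536 |
0.1101 | 0.9729 | 1305 | 1.0958 | 69772504 |
0.0895 | 0.9990 | 1340 | 1.0986 | 71629912 |
"""


@dataclass
class Row:
    train_loss: Optional[float]  # None where the table says "No log"
    epoch: float
    step: int
    val_loss: float
    tokens_seen: int


def parse_rows(text: str) -> List[Row]:
    """Hypothetical helper: turn pipe-separated table rows into Row objects."""
    rows = []
    for line in text.strip().splitlines():
        # Split on pipes; the filter discards the empty cell after the trailing "|".
        cells = [c.strip() for c in line.split("|") if c.strip()]
        train, epoch, step, val, tokens = cells
        rows.append(Row(
            train_loss=None if train == "No log" else float(train),
            epoch=float(epoch),
            step=int(step),
            val_loss=float(val),
            tokens_seen=int(tokens),
        ))
    return rows


def best_checkpoint(rows: List[Row]) -> Row:
    """Row with the lowest validation loss."""
    return min(rows, key=lambda r: r.val_loss)


def tokens_per_step(rows: List[Row]) -> float:
    """Average input tokens per optimizer step between the first and last rows."""
    first, last = rows[0], rows[-1]
    return (last.tokens_seen - first.tokens_seen) / (last.step - first.step)


if __name__ == "__main__":
    rows = parse_rows(TABLE_EXCERPT)
    best = best_checkpoint(rows)
    print("lowest validation loss:", best.val_loss, "at step", best.step)
    per_step = tokens_per_step(rows)
    # Assumes 128 sequences per optimizer step (total_train_batch_size).
    print(f"~{per_step:,.0f} tokens per step, ~{per_step / 128:,.0f} per sequence")
```

When run over the full table, `best_checkpoint` also lands on step 1305 (validation loss 1.0958). That is slightly below the 1.0983 reported as the final evaluation loss.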
Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
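The standard-library `importlib.metadata` module is enough to check whether a local environment matches the framework versions listed above. In this sketch, `EXPECTED`, `installed_versions` and `mismatches` are illustrative names. PyTorch is looked up under its PyPI distribution name, `torch`. The comparison is an exact string match. As a result, a PyTorch build with a different local suffix than `+cu121` is flagged even if the base version matches.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional, Tuple

# Versions listed under "Framework versions" above, keyed by PyPI distribution name.
EXPECTED = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Illustrative helper: installed version per package, or None if missing."""
    found: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def mismatches(
    expected: Dict[str, str], installed: Dict[str, Optional[str]]
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Packages whose installed version differs from the expected string (or is absent)."""
    return {
        name: (want, installed.get(name))
        for name, want in expected.items()
        if installed.get(name) != want
    }


if __name__ == "__main__":
    for name, (want, have) in mismatches(EXPECTED, installed_versions(EXPECTED)).items():
        print(f"{name}: card lists {want}, found {have or 'not installed'}")
```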
Model tree
- Repository: RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter14_sftsd1
- Base model: google/gemma-2-2b