collapse_gemma-2-2b_hs2_accumulate_iter16_sftsd1
This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.1018
- Num Input Tokens Seen: 81892720
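For readers who prefer perplexity to raw loss, the conversion can be sketched as follows. This assumes the reported loss is a mean per-token cross-entropy in nats, which is the usual convention for causal language model fine-tuning but is not stated in this card.

```python
import math

# Final evaluation loss reported in this card.
eval_loss = 1.1018

# Assumption: the loss is a mean per-token cross-entropy measured in nats,
# so perplexity is exp(loss). The card itself does not state the unit.
perplexity = math.exp(eval_loss)

print(f"eval perplexity ~ {perplexity:.2f}")  # eval perplexity ~ 3.01
```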
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
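The relationship between these values can be checked with a little arithmetic. The sketch below is illustrative only: `num_devices` and `total_steps` are assumptions (the card does not state them; the step count is estimated from the results table, where step 1535 is logged at epoch 0.9989), and `lr_at` is a hypothetical helper that mimics a constant-with-warmup schedule rather than the exact scheduler used.

```python
import math

per_device_batch = 8      # train_batch_size
grad_accum_steps = 16     # gradient_accumulation_steps
num_devices = 1           # assumption: device count is not stated in the card

# Effective batch size per optimizer step.
effective_batch = per_device_batch * grad_accum_steps * num_devices
assert effective_batch == 128  # matches total_train_batch_size

base_lr = 8e-06           # learning_rate
warmup_ratio = 0.05       # lr_scheduler_warmup_ratio
total_steps = 1537        # assumption: estimated, not reported in the card

# Number of warmup steps implied by the ratio (rounded up).
warmup_steps = math.ceil(warmup_ratio * total_steps)


def lr_at(step: int) -> float:
    """Hypothetical helper: linear warmup to base_lr, then constant."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr


print(warmup_steps)   # 77 under the assumed total_steps
print(lr_at(0), lr_at(warmup_steps), lr_at(1000))
```

Under these assumptions the learning rate would reach 8e-06 after roughly 77 optimizer steps and stay there for the rest of the epoch.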
Training results
Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
---|---|---|---|---|
No log | 0 | 0 | 1.3909 | 0 |
1.5351 | 0.0033 | 5 | 1.3897 | 261072 |
1.5528 | 0.0065 | 10 | 1.3820 | 520672 |
1.6113 | 0.0098 | 15 | 1.3534 | 788120 |
1.5779 | 0.0130 | 20 | 1.3152 | 1053824 |
1.3993 | 0.0163 | 25 | 1.2684 | 1322960 |
1.3475 | 0.0195 | 30 | 1.2342 | 1586568 |
1.2747 | 0.0228 | 35 | 1.2022 | 1845912 |
1.2084 | 0.0260 | 40 | 1.1879 | 2108856 |
1.0638 | 0.0293 | 45 | 1.1988 | 2380280 |
0.971 | 0.0325 | 50 | 1.2324 | 2637440 |
0.7965 | 0.0358 | 55 | 1.2486 | 2902808 |
0.7476 | 0.0390 | 60 | 1.2855 | 3176120 |
0.6096 | 0.0423 | 65 | 1.3065 | 3436824 |
0.4452 | 0.0456 | 70 | 1.3265 | 3705312 |
0.4833 | 0.0488 | 75 | 1.2841 | 3972208 |
0.3188 | 0.0521 | 80 | 1.2756 | 4236168 |
0.4399 | 0.0553 | 85 | 1.2427 | 4506040 |
0.2031 | 0.0586 | 90 | 1.2521 | 4773952 |
0.2376 | 0.0618 | 95 | 1.2296 | 5039448 |
0.3043 | 0.0651 | 100 | 1.2130 | 5302264 |
0.2084 | 0.0683 | 105 | 1.2104 | 5568032 |
0.1706 | 0.0716 | 110 | 1.2057 | 5828784 |
0.1167 | 0.0748 | 115 | 1.2184 | 6088072 |
0.2035 | 0.0781 | 120 | 1.2022 | 6354992 |
0.1248 | 0.0813 | 125 | 1.1944 | 6619752 |
0.1686 | 0.0846 | 130 | 1.1900 | 6900152 |
0.1734 | 0.0879 | 135 | 1.1796 | 7166784 |
0.2504 | 0.0911 | 140 | 1.1876 | 7434096 |
0.1787 | 0.0944 | 145 | 1.1836 | 7701712 |
0.1778 | 0.0976 | 150 | 1.1859 | 7969280 |
0.2131 | 0.1009 | 155 | 1.1832 | 8227544 |
0.1817 | 0.1041 | 160 | 1.1765 | 8498776 |
0.1505 | 0.1074 | 165 | 1.1872 | 8771536 |
0.2212 | 0.1106 | 170 | 1.1807 | 9038696 |
0.198 | 0.1139 | 175 | 1.1772 | 9305096 |
0.279 | 0.1171 | 180 | 1.1768 | 9569536 |
0.1493 | 0.1204 | 185 | 1.1746 | 9833184 |
0.1759 | 0.1236 | 190 | 1.1772 | 10096112 |
0.1029 | 0.1269 | 195 | 1.1825 | 10362624 |
0.2585 | 0.1302 | 200 | 1.1764 | 10629720 |
0.1613 | 0.1334 | 205 | 1.1723 | 10895456 |
0.2008 | 0.1367 | 210 | 1.1662 | 11167648 |
0.2237 | 0.1399 | 215 | 1.1721 | 11437704 |
0.175 | 0.1432 | 220 | 1.1651 | 11702208 |
0.1262 | 0.1464 | 225 | 1.1691 | 11974768 |
0.1559 | 0.1497 | 230 | 1.1686 | 12247600 |
0.194 | 0.1529 | 235 | 1.1630 | 12516136 |
0.1474 | 0.1562 | 240 | 1.1629 | 12786536 |
0.1148 | 0.1594 | 245 | 1.1640 | 13050336 |
0.1983 | 0.1627 | 250 | 1.1613 | 13311416 |
0.2001 | 0.1659 | 255 | 1.1610 | 13576096 |
0.1472 | 0.1692 | 260 | 1.1599 | 13848752 |
0.1102 | 0.1725 | 265 | 1.1567 | 14118560 |
0.1402 | 0.1757 | 270 | 1.1558 | 14386304 |
0.2327 | 0.1790 | 275 | 1.1606 | 14654944 |
0.1188 | 0.1822 | 280 | 1.1555 | 14922264 |
0.1228 | 0.1855 | 285 | 1.1541 | 15182168 |
0.2055 | 0.1887 | 290 | 1.1579 | 15443080 |
0.1839 | 0.1920 | 295 | 1.1519 | 15707208 |
0.1708 | 0.1952 | 300 | 1.1511 | 15974568 |
0.163 | 0.1985 | 305 | 1.1561 | 16243976 |
0.1855 | 0.2017 | 310 | 1.1504 | 16495432 |
0.1612 | 0.2050 | 315 | 1.1489 | 16766592 |
0.2191 | 0.2082 | 320 | 1.1553 | 17039376 |
0.1202 | 0.2115 | 325 | 1.1533 | 17296016 |
0.1809 | 0.2148 | 330 | 1.1477 | 17558376 |
0.1508 | 0.2180 | 335 | 1.1469 | 17825504 |
0.1152 | 0.2213 | 340 | 1.1488 | 18087816 |
0.1798 | 0.2245 | 345 | 1.1507 | 18360816 |
0.1391 | 0.2278 | 350 | 1.1467 | 18619656 |
0.1681 | 0.2310 | 355 | 1.1442 | 18890032 |
0.2182 | 0.2343 | 360 | 1.1460 | 19157312 |
0.1626 | 0.2375 | 365 | 1.1449 | 19429696 |
0.0885 | 0.2408 | 370 | 1.1443 | 19700448 |
0.1827 | 0.2440 | 375 | 1.1432 | 19968800 |
0.1779 | 0.2473 | 380 | 1.1427 | 20226328 |
0.0995 | 0.2505 | 385 | 1.1428 | 20502312 |
0.1029 | 0.2538 | 390 | 1.1435 | 20768480 |
0.17 | 0.2571 | 395 | 1.1458 | 21036176 |
0.1192 | 0.2603 | 400 | 1.1426 | 21295648 |
0.1316 | 0.2636 | 405 | 1.1404 | 21560440 |
0.0988 | 0.2668 | 410 | 1.1437 | 21829680 |
0.1759 | 0.2701 | 415 | 1.1413 | 22101632 |
0.1986 | 0.2733 | 420 | 1.1407 | 22366368 |
0.146 | 0.2766 | 425 | 1.1395 | 22636200 |
0.1615 | 0.2798 | 430 | 1.1385 | 22905576 |
0.1157 | 0.2831 | 435 | 1.1386 | 23169240 |
0.0944 | 0.2863 | 440 | 1.1386 | 23443088 |
0.1098 | 0.2896 | 445 | 1.1375 | 23702392 |
0.1136 | 0.2928 | 450 | 1.1368 | 23972744 |
0.1649 | 0.2961 | 455 | 1.1385 | 24240808 |
0.1241 | 0.2994 | 460 | 1.1366 | 24507088 |
0.1541 | 0.3026 | 465 | 1.1367 | 24765232 |
0.1597 | 0.3059 | 470 | 1.1397 | 25025232 |
0.1398 | 0.3091 | 475 | 1.1345 | 25296992 |
0.1542 | 0.3124 | 480 | 1.1339 | 25559520 |
0.1057 | 0.3156 | 485 | 1.1366 | 25820352 |
0.1409 | 0.3189 | 490 | 1.1344 | 26081504 |
0.1122 | 0.3221 | 495 | 1.1322 | 26344776 |
0.1052 | 0.3254 | 500 | 1.1347 | 26607744 |
0.1698 | 0.3286 | 505 | 1.1330 | 26879144 |
0.0956 | 0.3319 | 510 | 1.1318 | 27154384 |
0.1905 | 0.3352 | 515 | 1.1337 | 27416648 |
0.1124 | 0.3384 | 520 | 1.1311 | 27687320 |
0.1642 | 0.3417 | 525 | 1.1299 | 27949920 |
0.1305 | 0.3449 | 530 | 1.1295 | 28214896 |
0.1315 | 0.3482 | 535 | 1.1291 | 28484216 |
0.1573 | 0.3514 | 540 | 1.1296 | 28749376 |
0.0567 | 0.3547 | 545 | 1.1285 | 29018536 |
0.1163 | 0.3579 | 550 | 1.1301 | 29285360 |
0.132 | 0.3612 | 555 | 1.1289 | 29553168 |
0.126 | 0.3644 | 560 | 1.1299 | 29811144 |
0.1196 | 0.3677 | 565 | 1.1288 | 30088048 |
0.1611 | 0.3709 | 570 | 1.1262 | 30356088 |
0.1403 | 0.3742 | 575 | 1.1271 | 30615816 |
0.0968 | 0.3775 | 580 | 1.1288 | 30885080 |
0.1202 | 0.3807 | 585 | 1.1253 | 31153848 |
0.1686 | 0.3840 | 590 | 1.1259 | 31414832 |
0.1229 | 0.3872 | 595 | 1.1266 | 31678080 |
0.1212 | 0.3905 | 600 | 1.1257 | 31947880 |
0.2659 | 0.3937 | 605 | 1.1240 | 32215184 |
0.1055 | 0.3970 | 610 | 1.1263 | 32485648 |
0.1644 | 0.4002 | 615 | 1.1275 | 32745216 |
0.1372 | 0.4035 | 620 | 1.1239 | 33012864 |
0.1472 | 0.4067 | 625 | 1.1241 | 33279520 |
0.0833 | 0.4100 | 630 | 1.1250 | 33545160 |
0.2207 | 0.4132 | 635 | 1.1274 | 33810552 |
0.2133 | 0.4165 | 640 | 1.1241 | 34077928 |
0.1568 | 0.4198 | 645 | 1.1243 | 34342640 |
0.1285 | 0.4230 | 650 | 1.1245 | 34606496 |
0.1163 | 0.4263 | 655 | 1.1214 | 34870984 |
0.0909 | 0.4295 | 660 | 1.1246 | 35135400 |
0.1945 | 0.4328 | 665 | 1.1251 | 35396560 |
0.1429 | 0.4360 | 670 | 1.1247 | 35667240 |
0.1404 | 0.4393 | 675 | 1.1221 | 35945384 |
0.1595 | 0.4425 | 680 | 1.1214 | 36209016 |
0.1285 | 0.4458 | 685 | 1.1231 | 36477200 |
0.1797 | 0.4490 | 690 | 1.1242 | 36743968 |
0.1422 | 0.4523 | 695 | 1.1219 | 37014584 |
0.1335 | 0.4555 | 700 | 1.1219 | 37290800 |
0.1539 | 0.4588 | 705 | 1.1226 | 37549952 |
0.0969 | 0.4621 | 710 | 1.1231 | 37817368 |
0.2443 | 0.4653 | 715 | 1.1226 | 38084992 |
0.0917 | 0.4686 | 720 | 1.1193 | 38358248 |
0.1035 | 0.4718 | 725 | 1.1213 | 38625976 |
0.0976 | 0.4751 | 730 | 1.1207 | 38898928 |
0.1655 | 0.4783 | 735 | 1.1199 | 39160128 |
0.0949 | 0.4816 | 740 | 1.1198 | 39429144 |
0.1677 | 0.4848 | 745 | 1.1231 | 39693112 |
0.1806 | 0.4881 | 750 | 1.1210 | 39960944 |
0.16 | 0.4913 | 755 | 1.1188 | 40230136 |
0.1157 | 0.4946 | 760 | 1.1214 | 40491144 |
0.1313 | 0.4978 | 765 | 1.1196 | 40753880 |
0.1462 | 0.5011 | 770 | 1.1166 | 41026168 |
0.1592 | 0.5044 | 775 | 1.1164 | 41288712 |
0.1199 | 0.5076 | 780 | 1.1174 | 41554064 |
0.1193 | 0.5109 | 785 | 1.1171 | 41820040 |
0.1928 | 0.5141 | 790 | 1.1172 | 42085464 |
0.0979 | 0.5174 | 795 | 1.1170 | 42344472 |
0.1666 | 0.5206 | 800 | 1.1165 | 42609232 |
0.146 | 0.5239 | 805 | 1.1153 | 42873448 |
0.1502 | 0.5271 | 810 | 1.1146 | 43137816 |
0.0969 | 0.5304 | 815 | 1.1146 | 43406072 |
0.1211 | 0.5336 | 820 | 1.1142 | 43670696 |
0.0908 | 0.5369 | 825 | 1.1166 | 43934728 |
0.1541 | 0.5401 | 830 | 1.1182 | 44207544 |
0.0879 | 0.5434 | 835 | 1.1146 | 44470576 |
0.1227 | 0.5467 | 840 | 1.1142 | 44737816 |
0.1182 | 0.5499 | 845 | 1.1170 | 45006744 |
0.1618 | 0.5532 | 850 | 1.1166 | 45280912 |
0.133 | 0.5564 | 855 | 1.1139 | 45542032 |
0.1583 | 0.5597 | 860 | 1.1138 | 45810160 |
0.1066 | 0.5629 | 865 | 1.1146 | 46077784 |
0.0858 | 0.5662 | 870 | 1.1127 | 46349096 |
0.1107 | 0.5694 | 875 | 1.1156 | 46620064 |
0.0955 | 0.5727 | 880 | 1.1173 | 46883912 |
0.1262 | 0.5759 | 885 | 1.1144 | 47151400 |
0.1642 | 0.5792 | 890 | 1.1118 | 47430352 |
0.0692 | 0.5824 | 895 | 1.1135 | 47702696 |
0.1267 | 0.5857 | 900 | 1.1138 | 47968280 |
0.0978 | 0.5890 | 905 | 1.1123 | 48231760 |
0.0737 | 0.5922 | 910 | 1.1116 | 48504336 |
0.1046 | 0.5955 | 915 | 1.1132 | 48769080 |
0.0829 | 0.5987 | 920 | 1.1132 | 49035728 |
0.1694 | 0.6020 | 925 | 1.1122 | 49305928 |
0.1441 | 0.6052 | 930 | 1.1114 | 49566416 |
0.1008 | 0.6085 | 935 | 1.1111 | 49836680 |
0.2088 | 0.6117 | 940 | 1.1134 | 50109664 |
0.1343 | 0.6150 | 945 | 1.1131 | 50369152 |
0.1041 | 0.6182 | 950 | 1.1105 | 50634160 |
0.1043 | 0.6215 | 955 | 1.1092 | 50900880 |
0.1133 | 0.6247 | 960 | 1.1115 | 51163352 |
0.1254 | 0.6280 | 965 | 1.1125 | 51429800 |
0.107 | 0.6313 | 970 | 1.1091 | 51692744 |
0.1573 | 0.6345 | 975 | 1.1085 | 51959248 |
0.1044 | 0.6378 | 980 | 1.1097 | 52228240 |
0.1719 | 0.6410 | 985 | 1.1122 | 52489352 |
0.1518 | 0.6443 | 990 | 1.1113 | 52750792 |
0.1603 | 0.6475 | 995 | 1.1100 | 53021176 |
0.1496 | 0.6508 | 1000 | 1.1092 | 53282712 |
0.0991 | 0.6540 | 1005 | 1.1095 | 53549720 |
0.1347 | 0.6573 | 1010 | 1.1092 | 53813912 |
0.1129 | 0.6605 | 1015 | 1.1115 | 54083112 |
0.1365 | 0.6638 | 1020 | 1.1106 | 54348784 |
0.1719 | 0.6670 | 1025 | 1.1084 | 54611728 |
0.1339 | 0.6703 | 1030 | 1.1089 | 54879128 |
0.137 | 0.6736 | 1035 | 1.1073 | 55147720 |
0.0942 | 0.6768 | 1040 | 1.1078 | 55411208 |
0.0969 | 0.6801 | 1045 | 1.1094 | 55675544 |
0.1505 | 0.6833 | 1050 | 1.1084 | 55941176 |
0.1282 | 0.6866 | 1055 | 1.1088 | 56214568 |
0.1255 | 0.6898 | 1060 | 1.1098 | 56483288 |
0.0578 | 0.6931 | 1065 | 1.1084 | 56758328 |
0.1255 | 0.6963 | 1070 | 1.1086 | 57018456 |
0.0923 | 0.6996 | 1075 | 1.1093 | 57290480 |
0.1425 | 0.7028 | 1080 | 1.1077 | 57558472 |
0.1501 | 0.7061 | 1085 | 1.1085 | 57822040 |
0.1375 | 0.7093 | 1090 | 1.1084 | 58084672 |
0.1549 | 0.7126 | 1095 | 1.1094 | 58350696 |
0.1054 | 0.7159 | 1100 | 1.1096 | 58615712 |
0.0586 | 0.7191 | 1105 | 1.1086 | 58882808 |
0.1456 | 0.7224 | 1110 | 1.1105 | 59153024 |
0.1518 | 0.7256 | 1115 | 1.1092 | 59425048 |
0.1138 | 0.7289 | 1120 | 1.1078 | 59686984 |
0.1323 | 0.7321 | 1125 | 1.1072 | 59951872 |
0.1165 | 0.7354 | 1130 | 1.1092 | 60219312 |
0.1918 | 0.7386 | 1135 | 1.1075 | 60487056 |
0.1141 | 0.7419 | 1140 | 1.1067 | 60746080 |
0.0912 | 0.7451 | 1145 | 1.1076 | 61016400 |
0.142 | 0.7484 | 1150 | 1.1091 | 61283408 |
0.1685 | 0.7516 | 1155 | 1.1094 | 61553128 |
0.1176 | 0.7549 | 1160 | 1.1094 | 61817656 |
0.119 | 0.7582 | 1165 | 1.1085 | 62081872 |
0.0915 | 0.7614 | 1170 | 1.1064 | 62353856 |
0.143 | 0.7647 | 1175 | 1.1051 | 62621048 |
0.1531 | 0.7679 | 1180 | 1.1056 | 62892216 |
0.1306 | 0.7712 | 1185 | 1.1064 | 63158152 |
0.0752 | 0.7744 | 1190 | 1.1069 | 63423488 |
0.1135 | 0.7777 | 1195 | 1.1069 | 63686408 |
0.1654 | 0.7809 | 1200 | 1.1061 | 63951800 |
0.201 | 0.7842 | 1205 | 1.1070 | 64216872 |
0.1692 | 0.7874 | 1210 | 1.1057 | 64489504 |
0.1172 | 0.7907 | 1215 | 1.1035 | 64741000 |
0.0893 | 0.7939 | 1220 | 1.1048 | 65006824 |
0.1399 | 0.7972 | 1225 | 1.1063 | 65278488 |
0.0599 | 0.8005 | 1230 | 1.1064 | 65550376 |
0.1122 | 0.8037 | 1235 | 1.1064 | 65824144 |
0.169 | 0.8070 | 1240 | 1.1048 | 66089040 |
0.1846 | 0.8102 | 1245 | 1.1042 | 66357976 |
0.1158 | 0.8135 | 1250 | 1.1052 | 66623304 |
0.2468 | 0.8167 | 1255 | 1.1053 | 66893616 |
0.1127 | 0.8200 | 1260 | 1.1035 | 67160720 |
0.2156 | 0.8232 | 1265 | 1.1030 | 67424792 |
0.1622 | 0.8265 | 1270 | 1.1024 | 67696816 |
0.1598 | 0.8297 | 1275 | 1.1028 | 67960880 |
0.1164 | 0.8330 | 1280 | 1.1037 | 68226704 |
0.1595 | 0.8362 | 1285 | 1.1066 | 68486232 |
0.1345 | 0.8395 | 1290 | 1.1038 | 68748880 |
0.1588 | 0.8428 | 1295 | 1.1016 | 69020168 |
0.0836 | 0.8460 | 1300 | 1.1019 | 69292952 |
0.0908 | 0.8493 | 1305 | 1.1045 | 69565912 |
0.1198 | 0.8525 | 1310 | 1.1054 | 69829336 |
0.1278 | 0.8558 | 1315 | 1.1023 | 70096944 |
0.1095 | 0.8590 | 1320 | 1.1027 | 70364432 |
0.1689 | 0.8623 | 1325 | 1.1010 | 70636928 |
0.1483 | 0.8655 | 1330 | 1.0997 | 70908544 |
0.0971 | 0.8688 | 1335 | 1.0995 | 71178552 |
0.101 | 0.8720 | 1340 | 1.1000 | 71432304 |
0.0869 | 0.8753 | 1345 | 1.1004 | 71698816 |
0.141 | 0.8785 | 1350 | 1.1025 | 71969264 |
0.1031 | 0.8818 | 1355 | 1.1020 | 72237288 |
0.1398 | 0.8851 | 1360 | 1.1023 | 72508376 |
0.1312 | 0.8883 | 1365 | 1.1015 | 72775864 |
0.129 | 0.8916 | 1370 | 1.1015 | 73043072 |
0.1795 | 0.8948 | 1375 | 1.1016 | 73312928 |
0.1095 | 0.8981 | 1380 | 1.1011 | 73581368 |
0.2005 | 0.9013 | 1385 | 1.1004 | 73849016 |
0.1172 | 0.9046 | 1390 | 1.1003 | 74117816 |
0.1112 | 0.9078 | 1395 | 1.1012 | 74376240 |
0.2351 | 0.9111 | 1400 | 1.1003 | 74646712 |
0.1025 | 0.9143 | 1405 | 1.0998 | 74913808 |
0.1194 | 0.9176 | 1410 | 1.1015 | 75179000 |
0.1385 | 0.9208 | 1415 | 1.1007 | 75446568 |
0.1154 | 0.9241 | 1420 | 1.0997 | 75713032 |
0.147 | 0.9274 | 1425 | 1.1009 | 75976168 |
0.1226 | 0.9306 | 1430 | 1.1028 | 76240016 |
0.1284 | 0.9339 | 1435 | 1.1010 | 76507960 |
0.1362 | 0.9371 | 1440 | 1.0989 | 76776128 |
0.0982 | 0.9404 | 1445 | 1.0994 | 77043384 |
0.1068 | 0.9436 | 1450 | 1.1003 | 77311904 |
0.1302 | 0.9469 | 1455 | 1.0989 | 77574864 |
0.1258 | 0.9501 | 1460 | 1.0980 | 77850528 |
0.1635 | 0.9534 | 1465 | 1.0979 | 78120520 |
0.1462 | 0.9566 | 1470 | 1.0969 | 78385664 |
0.1257 | 0.9599 | 1475 | 1.0974 | 78654280 |
0.128 | 0.9631 | 1480 | 1.0987 | 78923528 |
0.0815 | 0.9664 | 1485 | 1.0979 | 79191792 |
0.0533 | 0.9697 | 1490 | 1.0981 | 79457608 |
0.1541 | 0.9729 | 1495 | 1.0980 | 79724704 |
0.0982 | 0.9762 | 1500 | 1.0992 | 79981000 |
0.0984 | 0.9794 | 1505 | 1.0988 | 80239232 |
0.1648 | 0.9827 | 1510 | 1.0979 | 80512952 |
0.1408 | 0.9859 | 1515 | 1.0980 | 80776448 |
0.1086 | 0.9892 | 1520 | 1.0976 | 81044664 |
0.1387 | 0.9924 | 1525 | 1.0974 | 81310696 |
0.0848 | 0.9957 | 1530 | 1.0992 | 81575712 |
0.1866 | 0.9989 | 1535 | 1.1018 | 81843088 |
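The table above can be summarized programmatically. The following is a minimal sketch, not part of the original training code: `parse_row` is a hypothetical helper, and only five rows copied from the table are pasted in to keep the example short; the full table can be substituted for `rows_text`.

```python
# A few rows copied verbatim from the table above (not the full log).
rows_text = """\
No log | 0 | 0 | 1.3909 | 0 |
1.2084 | 0.0260 | 40 | 1.1879 | 2108856 |
0.4452 | 0.0456 | 70 | 1.3265 | 3705312 |
0.1462 | 0.9566 | 1470 | 1.0969 | 78385664 |
0.1866 | 0.9989 | 1535 | 1.1018 | 81843088 |
"""


def parse_row(line: str) -> dict:
    """Hypothetical helper: split one pipe-delimited table row into typed fields."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    train_loss, epoch, step, val_loss, tokens = cells
    return {
        "train_loss": None if train_loss == "No log" else float(train_loss),
        "epoch": float(epoch),
        "step": int(step),
        "val_loss": float(val_loss),
        "tokens": int(tokens),
    }


rows = [parse_row(line) for line in rows_text.splitlines() if line.strip()]

# Row with the lowest validation loss among the parsed rows.
best = min(rows, key=lambda r: r["val_loss"])

# Average number of input tokens consumed per optimizer step at the last log.
last = rows[-1]
tokens_per_step = last["tokens"] / last["step"]

# Gap between validation and training loss at the last logged step.
final_gap = last["val_loss"] - last["train_loss"]

print(best["step"], best["val_loss"])
print(round(tokens_per_step), round(final_gap, 4))
```

In the full table, the lowest logged validation loss is 1.0969 at step 1470, slightly below the final value of 1.1018 at step 1535. The validation loss also shows an early local minimum of 1.1879 at step 40, rises to 1.3265 by step 70, and then declines gradually, while the training loss falls far below the validation loss.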
Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
Model tree for RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter16_sftsd1
Base model
google/gemma-2-2b