metadata
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_accumulate_iter13_sftsd1
  results: []
collapse_gemma-2-2b_hs2_accumulate_iter13_sftsd1
This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.0988
- Num Input Tokens Seen: 66474032
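
For a sense of scale, the evaluation loss above can be converted to perplexity. The sketch below assumes the reported loss is the mean per-token cross-entropy in nats, the usual form of the evaluation loss for a causal language model; the card does not state this explicitly, so treat the conversion as illustrative.

```python
import math

# Final evaluation loss reported above.
# Assumption: mean per-token cross-entropy, measured in nats.
eval_loss = 1.0988

# Perplexity is the exponential of the mean cross-entropy.
perplexity = math.exp(eval_loss)

# The same loss expressed in bits per token.
bits_per_token = eval_loss / math.log(2)

print(f"perplexity: {perplexity:.4f}")
print(f"bits per token: {bits_per_token:.4f}")
```

Under that assumption the perplexity comes out at roughly 3.0, since the loss is very close to ln 3.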
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
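
The relationship between these values can be checked with a short sketch. It shows how `total_train_batch_size` follows from the per-device batch size and gradient accumulation, and what a `constant_with_warmup` schedule with a 5% warmup ratio looks like. The device count is not listed in the card and is inferred from the reported total; `lr_at_step` and `example_total_steps` are hypothetical names introduced for illustration, not the training code that produced this model.

```python
import math

# Values copied from the hyperparameter list above.
learning_rate = 8e-06
train_batch_size = 8               # per-device batch size
gradient_accumulation_steps = 16
total_train_batch_size = 128
warmup_ratio = 0.05

# Assumption: the device count is not stated in the card, so it is inferred
# from the reported total batch size.
num_devices = total_train_batch_size // (train_batch_size * gradient_accumulation_steps)
effective_batch_size = train_batch_size * gradient_accumulation_steps * num_devices


def lr_at_step(step: int, total_steps: int) -> float:
    """Constant-with-warmup shape: linear ramp from 0, then a flat rate."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    return learning_rate


# Hypothetical step count; the card does not state the exact total.
example_total_steps = 1000

print("inferred devices:", num_devices)
print("effective batch size:", effective_batch_size)
for step in (0, 25, 50, 500):
    print(step, lr_at_step(step, example_total_steps))
```

With the listed values the inferred device count is 1, so each optimizer step covers 8 × 16 = 128 sequences.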
Training results
Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
---|---|---|---|---|
No log | 0 | 0 | 1.3909 | 0 |
1.6496 | 0.0040 | 5 | 1.3888 | 269088 |
1.593 | 0.0080 | 10 | 1.3738 | 532768 |
1.6319 | 0.0121 | 15 | 1.3415 | 797616 |
1.4412 | 0.0161 | 20 | 1.2878 | 1059528 |
1.4212 | 0.0201 | 25 | 1.2452 | 1330056 |
1.3105 | 0.0241 | 30 | 1.2106 | 1595496 |
1.2004 | 0.0281 | 35 | 1.1870 | 1856096 |
1.122 | 0.0321 | 40 | 1.1980 | 2128096 |
0.9857 | 0.0362 | 45 | 1.2116 | 2397312 |
0.8123 | 0.0402 | 50 | 1.2458 | 2660648 |
0.6974 | 0.0442 | 55 | 1.2866 | 2923440 |
0.5779 | 0.0482 | 60 | 1.2544 | 3190904 |
0.6053 | 0.0522 | 65 | 1.2958 | 3466056 |
0.377 | 0.0562 | 70 | 1.2836 | 3738872 |
0.437 | 0.0603 | 75 | 1.2394 | 4008880 |
0.2844 | 0.0643 | 80 | 1.2326 | 4270216 |
0.2743 | 0.0683 | 85 | 1.2176 | 4534544 |
0.2454 | 0.0723 | 90 | 1.2031 | 4797656 |
0.3017 | 0.0763 | 95 | 1.2110 | 5064904 |
0.2919 | 0.0804 | 100 | 1.1901 | 5325960 |
0.2755 | 0.0844 | 105 | 1.1899 | 5588816 |
0.2508 | 0.0884 | 110 | 1.1932 | 5859792 |
0.2048 | 0.0924 | 115 | 1.1895 | 6124448 |
0.1805 | 0.0964 | 120 | 1.1991 | 6394440 |
0.2482 | 0.1004 | 125 | 1.1865 | 6660424 |
0.2114 | 0.1045 | 130 | 1.1828 | 6925280 |
0.2454 | 0.1085 | 135 | 1.1801 | 7192496 |
0.2305 | 0.1125 | 140 | 1.1733 | 7456696 |
0.1829 | 0.1165 | 145 | 1.1778 | 7723888 |
0.2417 | 0.1205 | 150 | 1.1796 | 7998624 |
0.1485 | 0.1245 | 155 | 1.1714 | 8271672 |
0.1433 | 0.1286 | 160 | 1.1770 | 8546408 |
0.2375 | 0.1326 | 165 | 1.1716 | 8816744 |
0.1699 | 0.1366 | 170 | 1.1698 | 9086496 |
0.1136 | 0.1406 | 175 | 1.1651 | 9346888 |
0.1336 | 0.1446 | 180 | 1.1702 | 9619312 |
0.1598 | 0.1487 | 185 | 1.1609 | 9885952 |
0.0921 | 0.1527 | 190 | 1.1622 | 10153872 |
0.2749 | 0.1567 | 195 | 1.1658 | 10421200 |
0.2119 | 0.1607 | 200 | 1.1574 | 10694680 |
0.2545 | 0.1647 | 205 | 1.1574 | 10966232 |
0.242 | 0.1687 | 210 | 1.1530 | 11232608 |
0.1785 | 0.1728 | 215 | 1.1555 | 11495504 |
0.2243 | 0.1768 | 220 | 1.1555 | 11761088 |
0.257 | 0.1808 | 225 | 1.1501 | 12034208 |
0.1593 | 0.1848 | 230 | 1.1525 | 12297864 |
0.2022 | 0.1888 | 235 | 1.1533 | 12565760 |
0.2072 | 0.1928 | 240 | 1.1519 | 12833240 |
0.1091 | 0.1969 | 245 | 1.1511 | 13102024 |
0.0845 | 0.2009 | 250 | 1.1520 | 13362240 |
0.2093 | 0.2049 | 255 | 1.1502 | 13625696 |
0.1741 | 0.2089 | 260 | 1.1467 | 13890168 |
0.1188 | 0.2129 | 265 | 1.1540 | 14154448 |
0.3031 | 0.2170 | 270 | 1.1497 | 14419648 |
0.1891 | 0.2210 | 275 | 1.1464 | 14674784 |
0.2016 | 0.2250 | 280 | 1.1447 | 14949488 |
0.1007 | 0.2290 | 285 | 1.1460 | 15214800 |
0.1779 | 0.2330 | 290 | 1.1475 | 15483240 |
0.195 | 0.2370 | 295 | 1.1398 | 15751536 |
0.2069 | 0.2411 | 300 | 1.1429 | 16014584 |
0.1597 | 0.2451 | 305 | 1.1420 | 16277072 |
0.111 | 0.2491 | 310 | 1.1397 | 16540864 |
0.107 | 0.2531 | 315 | 1.1423 | 16804568 |
0.1212 | 0.2571 | 320 | 1.1387 | 17077128 |
0.1412 | 0.2611 | 325 | 1.1382 | 17348320 |
0.1192 | 0.2652 | 330 | 1.1419 | 17612576 |
0.1879 | 0.2692 | 335 | 1.1388 | 17876784 |
0.1433 | 0.2732 | 340 | 1.1362 | 18142544 |
0.1748 | 0.2772 | 345 | 1.1411 | 18415672 |
0.1677 | 0.2812 | 350 | 1.1373 | 18683536 |
0.1358 | 0.2853 | 355 | 1.1346 | 18952888 |
0.1712 | 0.2893 | 360 | 1.1369 | 19218360 |
0.1619 | 0.2933 | 365 | 1.1386 | 19483840 |
0.1071 | 0.2973 | 370 | 1.1347 | 19756976 |
0.2192 | 0.3013 | 375 | 1.1322 | 20022776 |
0.1235 | 0.3053 | 380 | 1.1334 | 20289712 |
0.2287 | 0.3094 | 385 | 1.1345 | 20559104 |
0.1922 | 0.3134 | 390 | 1.1295 | 20823864 |
0.1379 | 0.3174 | 395 | 1.1306 | 21082544 |
0.109 | 0.3214 | 400 | 1.1325 | 21356280 |
0.1387 | 0.3254 | 405 | 1.1298 | 21630688 |
0.1094 | 0.3294 | 410 | 1.1290 | 21895440 |
0.1573 | 0.3335 | 415 | 1.1295 | 22163328 |
0.1252 | 0.3375 | 420 | 1.1275 | 22422360 |
0.1323 | 0.3415 | 425 | 1.1309 | 22693992 |
0.1553 | 0.3455 | 430 | 1.1275 | 22960416 |
0.0841 | 0.3495 | 435 | 1.1282 | 23224648 |
0.1479 | 0.3536 | 440 | 1.1303 | 23485960 |
0.1776 | 0.3576 | 445 | 1.1319 | 23757080 |
0.1108 | 0.3616 | 450 | 1.1295 | 24019992 |
0.1577 | 0.3656 | 455 | 1.1281 | 24283712 |
0.1419 | 0.3696 | 460 | 1.1281 | 24555736 |
0.1669 | 0.3736 | 465 | 1.1274 | 24819064 |
0.175 | 0.3777 | 470 | 1.1248 | 25091464 |
0.1287 | 0.3817 | 475 | 1.1257 | 25360944 |
0.1303 | 0.3857 | 480 | 1.1300 | 25627840 |
0.2149 | 0.3897 | 485 | 1.1238 | 25895920 |
0.1754 | 0.3937 | 490 | 1.1214 | 26159488 |
0.1381 | 0.3978 | 495 | 1.1240 | 26425400 |
0.1971 | 0.4018 | 500 | 1.1243 | 26695288 |
0.1112 | 0.4058 | 505 | 1.1231 | 26958128 |
0.1507 | 0.4098 | 510 | 1.1190 | 27224768 |
0.2245 | 0.4138 | 515 | 1.1196 | 27490376 |
0.1332 | 0.4178 | 520 | 1.1214 | 27759472 |
0.2522 | 0.4219 | 525 | 1.1237 | 28021432 |
0.1485 | 0.4259 | 530 | 1.1195 | 28293960 |
0.1108 | 0.4299 | 535 | 1.1196 | 28565520 |
0.1354 | 0.4339 | 540 | 1.1205 | 28830248 |
0.188 | 0.4379 | 545 | 1.1186 | 29098632 |
0.1505 | 0.4419 | 550 | 1.1169 | 29366008 |
0.2583 | 0.4460 | 555 | 1.1186 | 29631632 |
0.1734 | 0.4500 | 560 | 1.1181 | 29892432 |
0.1396 | 0.4540 | 565 | 1.1191 | 30155064 |
0.147 | 0.4580 | 570 | 1.1185 | 30425328 |
0.1781 | 0.4620 | 575 | 1.1157 | 30687912 |
0.087 | 0.4661 | 580 | 1.1194 | 30955536 |
0.1667 | 0.4701 | 585 | 1.1211 | 31223528 |
0.2041 | 0.4741 | 590 | 1.1164 | 31486616 |
0.1368 | 0.4781 | 595 | 1.1163 | 31756680 |
0.1193 | 0.4821 | 600 | 1.1166 | 32029360 |
0.1863 | 0.4861 | 605 | 1.1142 | 32300840 |
0.1692 | 0.4902 | 610 | 1.1145 | 32559992 |
0.1551 | 0.4942 | 615 | 1.1158 | 32820160 |
0.1233 | 0.4982 | 620 | 1.1139 | 33090856 |
0.2353 | 0.5022 | 625 | 1.1132 | 33356216 |
0.0917 | 0.5062 | 630 | 1.1161 | 33627544 |
0.1523 | 0.5102 | 635 | 1.1159 | 33898952 |
0.1818 | 0.5143 | 640 | 1.1135 | 34166040 |
0.0914 | 0.5183 | 645 | 1.1139 | 34432080 |
0.1609 | 0.5223 | 650 | 1.1142 | 34695128 |
0.1164 | 0.5263 | 655 | 1.1137 | 34960016 |
0.1476 | 0.5303 | 660 | 1.1127 | 35227024 |
0.1514 | 0.5344 | 665 | 1.1138 | 35502752 |
0.1921 | 0.5384 | 670 | 1.1135 | 35777480 |
0.1547 | 0.5424 | 675 | 1.1111 | 36051128 |
0.1647 | 0.5464 | 680 | 1.1128 | 36324632 |
0.1431 | 0.5504 | 685 | 1.1132 | 36599048 |
0.1537 | 0.5544 | 690 | 1.1113 | 36868312 |
0.1508 | 0.5585 | 695 | 1.1119 | 37137304 |
0.1446 | 0.5625 | 700 | 1.1121 | 37400984 |
0.1871 | 0.5665 | 705 | 1.1104 | 37670160 |
0.1148 | 0.5705 | 710 | 1.1093 | 37937456 |
0.1809 | 0.5745 | 715 | 1.1107 | 38213656 |
0.1562 | 0.5785 | 720 | 1.1134 | 38481208 |
0.1856 | 0.5826 | 725 | 1.1124 | 38748528 |
0.2117 | 0.5866 | 730 | 1.1110 | 39014688 |
0.1334 | 0.5906 | 735 | 1.1086 | 39285112 |
0.1282 | 0.5946 | 740 | 1.1083 | 39558336 |
0.1079 | 0.5986 | 745 | 1.1078 | 39816608 |
0.2084 | 0.6027 | 750 | 1.1080 | 40081864 |
0.1388 | 0.6067 | 755 | 1.1099 | 40349832 |
0.1496 | 0.6107 | 760 | 1.1095 | 40617056 |
0.123 | 0.6147 | 765 | 1.1066 | 40887032 |
0.0792 | 0.6187 | 770 | 1.1065 | 41148104 |
0.1639 | 0.6227 | 775 | 1.1086 | 41423424 |
0.2501 | 0.6268 | 780 | 1.1078 | 41700288 |
0.115 | 0.6308 | 785 | 1.1090 | 41971832 |
0.1738 | 0.6348 | 790 | 1.1083 | 42239944 |
0.1595 | 0.6388 | 795 | 1.1061 | 42497488 |
0.1121 | 0.6428 | 800 | 1.1059 | 42763824 |
0.1503 | 0.6468 | 805 | 1.1075 | 43033424 |
0.0887 | 0.6509 | 810 | 1.1048 | 43299520 |
0.1208 | 0.6549 | 815 | 1.1063 | 43567272 |
0.1165 | 0.6589 | 820 | 1.1090 | 43830216 |
0.136 | 0.6629 | 825 | 1.1080 | 44101312 |
0.1441 | 0.6669 | 830 | 1.1059 | 44372208 |
0.1372 | 0.6710 | 835 | 1.1074 | 44629960 |
0.0905 | 0.6750 | 840 | 1.1078 | 44894304 |
0.17 | 0.6790 | 845 | 1.1058 | 45163432 |
0.1861 | 0.6830 | 850 | 1.1047 | 45430264 |
0.1535 | 0.6870 | 855 | 1.1053 | 45705032 |
0.2079 | 0.6910 | 860 | 1.1057 | 45973272 |
0.1795 | 0.6951 | 865 | 1.1057 | 46238200 |
0.1819 | 0.6991 | 870 | 1.1061 | 46508080 |
0.1625 | 0.7031 | 875 | 1.1057 | 46775056 |
0.157 | 0.7071 | 880 | 1.1041 | 47045584 |
0.1586 | 0.7111 | 885 | 1.1041 | 47315400 |
0.1219 | 0.7151 | 890 | 1.1043 | 47581088 |
0.1534 | 0.7192 | 895 | 1.1045 | 47844512 |
0.1423 | 0.7232 | 900 | 1.1032 | 48114328 |
0.1358 | 0.7272 | 905 | 1.1040 | 48380520 |
0.127 | 0.7312 | 910 | 1.1042 | 48649872 |
0.1462 | 0.7352 | 915 | 1.1043 | 48920232 |
0.154 | 0.7393 | 920 | 1.1035 | 49186984 |
0.1847 | 0.7433 | 925 | 1.1041 | 49454928 |
0.1678 | 0.7473 | 930 | 1.1053 | 49722280 |
0.1658 | 0.7513 | 935 | 1.1050 | 49988024 |
0.1301 | 0.7553 | 940 | 1.1053 | 50255760 |
0.1239 | 0.7593 | 945 | 1.1044 | 50530080 |
0.1458 | 0.7634 | 950 | 1.1037 | 50792368 |
0.152 | 0.7674 | 955 | 1.1041 | 51052328 |
0.1736 | 0.7714 | 960 | 1.1041 | 51318808 |
0.1981 | 0.7754 | 965 | 1.1030 | 51586904 |
0.1032 | 0.7794 | 970 | 1.1021 | 51861168 |
0.1126 | 0.7834 | 975 | 1.1050 | 52129208 |
0.2006 | 0.7875 | 980 | 1.1045 | 52395312 |
0.2615 | 0.7915 | 985 | 1.1011 | 52661168 |
0.1574 | 0.7955 | 990 | 1.1013 | 52923160 |
0.183 | 0.7995 | 995 | 1.1067 | 53179296 |
0.1247 | 0.8035 | 1000 | 1.1045 | 53445496 |
0.136 | 0.8076 | 1005 | 1.1013 | 53714992 |
0.2123 | 0.8116 | 1010 | 1.1015 | 53973440 |
0.1449 | 0.8156 | 1015 | 1.1025 | 54238472 |
0.2289 | 0.8196 | 1020 | 1.1019 | 54508944 |
0.1454 | 0.8236 | 1025 | 1.1013 | 54782640 |
0.1422 | 0.8276 | 1030 | 1.1022 | 55052512 |
0.1588 | 0.8317 | 1035 | 1.1022 | 55320536 |
0.1174 | 0.8357 | 1040 | 1.1024 | 55587976 |
0.1778 | 0.8397 | 1045 | 1.1006 | 55850544 |
0.2064 | 0.8437 | 1050 | 1.1019 | 56111488 |
0.1348 | 0.8477 | 1055 | 1.1043 | 56379936 |
0.1454 | 0.8517 | 1060 | 1.1027 | 56633752 |
0.0895 | 0.8558 | 1065 | 1.0997 | 56900624 |
0.1199 | 0.8598 | 1070 | 1.1008 | 57165704 |
0.1866 | 0.8638 | 1075 | 1.1013 | 57431640 |
0.1512 | 0.8678 | 1080 | 1.1002 | 57697040 |
0.1935 | 0.8718 | 1085 | 1.1003 | 57971200 |
0.1479 | 0.8759 | 1090 | 1.1003 | 58235216 |
0.1603 | 0.8799 | 1095 | 1.1010 | 58505320 |
0.1545 | 0.8839 | 1100 | 1.1004 | 58781952 |
0.1349 | 0.8879 | 1105 | 1.0978 | 59054312 |
0.1038 | 0.8919 | 1110 | 1.0981 | 59316192 |
0.2127 | 0.8959 | 1115 | 1.0985 | 59576760 |
0.2207 | 0.9000 | 1120 | 1.0978 | 59841800 |
0.1447 | 0.9040 | 1125 | 1.0980 | 60108152 |
0.1445 | 0.9080 | 1130 | 1.0986 | 60381688 |
0.123 | 0.9120 | 1135 | 1.0985 | 60644416 |
0.1337 | 0.9160 | 1140 | 1.0972 | 60914960 |
0.1519 | 0.9200 | 1145 | 1.0964 | 61189320 |
0.1618 | 0.9241 | 1150 | 1.0997 | 61451944 |
0.1586 | 0.9281 | 1155 | 1.1000 | 61715960 |
0.1538 | 0.9321 | 1160 | 1.0981 | 61986840 |
0.0929 | 0.9361 | 1165 | 1.0972 | 62255312 |
0.1543 | 0.9401 | 1170 | 1.0973 | 62523592 |
0.1406 | 0.9442 | 1175 | 1.0976 | 62795320 |
0.1527 | 0.9482 | 1180 | 1.0970 | 63061184 |
0.1556 | 0.9522 | 1185 | 1.0975 | 63326856 |
0.2417 | 0.9562 | 1190 | 1.0983 | 63598528 |
0.1064 | 0.9602 | 1195 | 1.1001 | 63861592 |
0.1908 | 0.9642 | 1200 | 1.0971 | 64129760 |
0.1303 | 0.9683 | 1205 | 1.0958 | 64399112 |
0.1397 | 0.9723 | 1210 | 1.0972 | 64666312 |
0.1802 | 0.9763 | 1215 | 1.0971 | 64938056 |
0.1478 | 0.9803 | 1220 | 1.0970 | 65198400 |
0.1511 | 0.9843 | 1225 | 1.0966 | 65460480 |
0.1352 | 0.9883 | 1230 | 1.0973 | 65730520 |
0.1681 | 0.9924 | 1235 | 1.0983 | 65993712 |
0.1158 | 0.9964 | 1240 | 1.0982 | 66264848 |
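
A log in this format is easy to summarize programmatically. The following is a minimal sketch that parses a handful of rows copied verbatim from the table above (not the full log), picks the row with the lowest validation loss, and estimates the average number of input tokens per optimizer step. The helper `parse_row` is a hypothetical name introduced here.

```python
# A few rows copied verbatim from the table above (not the full log).
excerpt = """\
No log | 0 | 0 | 1.3909 | 0 |
1.2004 | 0.0281 | 35 | 1.1870 | 1856096 |
0.6053 | 0.0522 | 65 | 1.2958 | 3466056 |
0.1303 | 0.9683 | 1205 | 1.0958 | 64399112 |
0.1158 | 0.9964 | 1240 | 1.0982 | 66264848 |
"""


def parse_row(line: str) -> dict:
    """Split one pipe-delimited row into typed fields."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    train_loss, epoch, step, val_loss, tokens = cells
    return {
        # The first row has no training loss logged yet.
        "train_loss": None if train_loss == "No log" else float(train_loss),
        "epoch": float(epoch),
        "step": int(step),
        "val_loss": float(val_loss),
        "tokens": int(tokens),
    }


rows = [parse_row(line) for line in excerpt.splitlines() if line.strip()]

# Row with the lowest validation loss among the parsed rows.
best = min(rows, key=lambda r: r["val_loss"])

# Average input tokens consumed per optimizer step, from the last logged row.
last = rows[-1]
tokens_per_step = last["tokens"] / last["step"]

print("best step:", best["step"], "val loss:", best["val_loss"])
print(f"tokens per step: {tokens_per_step:.1f}")
```

Feeding the full table through the same parser gives the same best row: the lowest logged validation loss is 1.0958 at step 1205, slightly below the 1.0982 logged at the last evaluation step in the table.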
Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1