---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_accumulate_iter6_sftsd2
  results: []
---

# collapse_gemma-2-2b_hs2_accumulate_iter6_sftsd2

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.1033
- Num Input Tokens Seen: 46879648

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3956          | 0                 |
| 1.5753        | 0.0058 | 5    | 1.3923          | 267568            |
| 1.53          | 0.0116 | 10   | 1.3634          | 543056            |
| 1.5557        | 0.0175 | 15   | 1.2991          | 818656            |
| 1.3745        | 0.0233 | 20   | 1.2495          | 1083296           |
| 1.37          | 0.0291 | 25   | 1.1978          | 1354864           |
| 1.2347        | 0.0349 | 30   | 1.1741          | 1634200           |
| 1.1829        | 0.0407 | 35   | 1.1812          | 1901200           |
| 1.0397        | 0.0465 | 40   | 1.1747          | 2171536           |
| 0.9146        | 0.0524 | 45   | 1.2084          | 2450072           |
| 0.7423        | 0.0582 | 50   | 1.2263          | 2727648           |
| 0.7049        | 0.0640 | 55   | 1.2532          | 3010920           |
| 0.6766        | 0.0698 | 60   | 1.2553          | 3293520           |
| 0.7333        | 0.0756 | 65   | 1.2337          | 3566112           |
| 0.5088        | 0.0814 | 70   | 1.2231          | 3841224           |
| 0.4615        | 0.0873 | 75   | 1.2312          | 4116112           |
| 0.4342        | 0.0931 | 80   | 1.2311          | 4390024           |
| 0.3359        | 0.0989 | 85   | 1.2199          | 4662072           |
| 0.3926        | 0.1047 | 90   | 1.2103          | 4931912           |
| 0.4366        | 0.1105 | 95   | 1.2067          | 5212056           |
| 0.3526        | 0.1163 | 100  | 1.2125          | 5482184           |
| 0.286         | 0.1222 | 105  | 1.2038          | 5743432           |
| 0.3501        | 0.1280 | 110  | 1.2010          | 6021160           |
| 0.3396        | 0.1338 | 115  | 1.2046          | 6298096           |
| 0.2977        | 0.1396 | 120  | 1.1984          | 6567624           |
| 0.2274        | 0.1454 | 125  | 1.1953          | 6842616           |
| 0.2313        | 0.1513 | 130  | 1.1938          | 7116480           |
| 0.2709        | 0.1571 | 135  | 1.1894          | 7391656           |
| 0.284         | 0.1629 | 140  | 1.1916          | 7658384           |
| 0.3073        | 0.1687 | 145  | 1.1820          | 7939104           |
| 0.2237        | 0.1745 | 150  | 1.1919          | 8210096           |
| 0.2312        | 0.1803 | 155  | 1.1805          | 8481048           |
| 0.3128        | 0.1862 | 160  | 1.1837          | 8758024           |
| 0.2945        | 0.1920 | 165  | 1.1817          | 9027632           |
| 0.2817        | 0.1978 | 170  | 1.1737          | 9300448           |
| 0.356         | 0.2036 | 175  | 1.1749          | 9578488           |
| 0.2954        | 0.2094 | 180  | 1.1715          | 9851304           |
| 0.3045        | 0.2152 | 185  | 1.1691          | 10115416          |
| 0.282         | 0.2211 | 190  | 1.1678          | 10382944          |
| 0.3053        | 0.2269 | 195  | 1.1697          | 10660152          |
| 0.2065        | 0.2327 | 200  | 1.1665          | 10940720          |
| 0.2118        | 0.2385 | 205  | 1.1648          | 11221464          |
| 0.2133        | 0.2443 | 210  | 1.1659          | 11497384          |
| 0.2162        | 0.2501 | 215  | 1.1653          | 11769728          |
| 0.2568        | 0.2560 | 220  | 1.1634          | 12048248          |
| 0.2813        | 0.2618 | 225  | 1.1619          | 12315600          |
| 0.2439        | 0.2676 | 230  | 1.1567          | 12588160          |
| 0.1679        | 0.2734 | 235  | 1.1618          | 12863192          |
| 0.2016        | 0.2792 | 240  | 1.1594          | 13130656          |
| 0.2964        | 0.2850 | 245  | 1.1580          | 13400608          |
| 0.1561        | 0.2909 | 250  | 1.1574          | 13668440          |
| 0.219         | 0.2967 | 255  | 1.1554          | 13943704          |
| 0.2607        | 0.3025 | 260  | 1.1536          | 14221768          |
| 0.2848        | 0.3083 | 265  | 1.1554          | 14492304          |
| 0.2455        | 0.3141 | 270  | 1.1531          | 14760848          |
| 0.372         | 0.3200 | 275  | 1.1542          | 15035936          |
| 0.2095        | 0.3258 | 280  | 1.1520          | 15310576          |
| 0.2474        | 0.3316 | 285  | 1.1532          | 15579504          |
| 0.3264        | 0.3374 | 290  | 1.1465          | 15854256          |
| 0.1844        | 0.3432 | 295  | 1.1523          | 16128872          |
| 0.1632        | 0.3490 | 300  | 1.1505          | 16399592          |
| 0.2669        | 0.3549 | 305  | 1.1456          | 16667320          |
| 0.2193        | 0.3607 | 310  | 1.1474          | 16941416          |
| 0.1967        | 0.3665 | 315  | 1.1459          | 17212144          |
| 0.2129        | 0.3723 | 320  | 1.1443          | 17482792          |
| 0.3056        | 0.3781 | 325  | 1.1444          | 17763040          |
| 0.1587        | 0.3839 | 330  | 1.1409          | 18032152          |
| 0.1836        | 0.3898 | 335  | 1.1407          | 18299920          |
| 0.2388        | 0.3956 | 340  | 1.1384          | 18577344          |
| 0.2204        | 0.4014 | 345  | 1.1370          | 18840160          |
| 0.1834        | 0.4072 | 350  | 1.1409          | 19112960          |
| 0.2406        | 0.4130 | 355  | 1.1363          | 19385312          |
| 0.2043        | 0.4188 | 360  | 1.1364          | 19661376          |
| 0.1834        | 0.4247 | 365  | 1.1376          | 19935920          |
| 0.2579        | 0.4305 | 370  | 1.1363          | 20210320          |
| 0.2246        | 0.4363 | 375  | 1.1345          | 20477424          |
| 0.2203        | 0.4421 | 380  | 1.1359          | 20750464          |
| 0.2124        | 0.4479 | 385  | 1.1362          | 21020688          |
| 0.2741        | 0.4538 | 390  | 1.1334          | 21291056          |
| 0.1375        | 0.4596 | 395  | 1.1361          | 21566192          |
| 0.1435        | 0.4654 | 400  | 1.1363          | 21843896          |
| 0.2614        | 0.4712 | 405  | 1.1319          | 22105576          |
| 0.2487        | 0.4770 | 410  | 1.1331          | 22375904          |
| 0.2255        | 0.4828 | 415  | 1.1321          | 22645976          |
| 0.161         | 0.4887 | 420  | 1.1329          | 22915392          |
| 0.217         | 0.4945 | 425  | 1.1313          | 23187664          |
| 0.2353        | 0.5003 | 430  | 1.1311          | 23465448          |
| 0.2315        | 0.5061 | 435  | 1.1310          | 23746544          |
| 0.2228        | 0.5119 | 440  | 1.1315          | 24018896          |
| 0.1554        | 0.5177 | 445  | 1.1276          | 24289048          |
| 0.1983        | 0.5236 | 450  | 1.1295          | 24556440          |
| 0.3362        | 0.5294 | 455  | 1.1269          | 24830568          |
| 0.2744        | 0.5352 | 460  | 1.1263          | 25101672          |
| 0.2374        | 0.5410 | 465  | 1.1283          | 25372920          |
| 0.1861        | 0.5468 | 470  | 1.1260          | 25648208          |
| 0.1935        | 0.5526 | 475  | 1.1257          | 25923920          |
| 0.3554        | 0.5585 | 480  | 1.1256          | 26202440          |
| 0.3118        | 0.5643 | 485  | 1.1234          | 26474632          |
| 0.2162        | 0.5701 | 490  | 1.1243          | 26746064          |
| 0.1809        | 0.5759 | 495  | 1.1244          | 27014568          |
| 0.221         | 0.5817 | 500  | 1.1214          | 27293400          |
| 0.2503        | 0.5876 | 505  | 1.1231          | 27562984          |
| 0.237         | 0.5934 | 510  | 1.1232          | 27839408          |
| 0.2327        | 0.5992 | 515  | 1.1184          | 28107568          |
| 0.1367        | 0.6050 | 520  | 1.1217          | 28381536          |
| 0.1865        | 0.6108 | 525  | 1.1262          | 28652160          |
| 0.1721        | 0.6166 | 530  | 1.1182          | 28928688          |
| 0.2373        | 0.6225 | 535  | 1.1192          | 29202088          |
| 0.1933        | 0.6283 | 540  | 1.1219          | 29470424          |
| 0.165         | 0.6341 | 545  | 1.1203          | 29741536          |
| 0.1975        | 0.6399 | 550  | 1.1187          | 30015232          |
| 0.2275        | 0.6457 | 555  | 1.1191          | 30287272          |
| 0.1997        | 0.6515 | 560  | 1.1204          | 30560976          |
| 0.0949        | 0.6574 | 565  | 1.1190          | 30838424          |
| 0.2994        | 0.6632 | 570  | 1.1186          | 31112016          |
| 0.1676        | 0.6690 | 575  | 1.1179          | 31379672          |
| 0.1973        | 0.6748 | 580  | 1.1187          | 31650192          |
| 0.1578        | 0.6806 | 585  | 1.1179          | 31918136          |
| 0.2202        | 0.6864 | 590  | 1.1159          | 32195280          |
| 0.1907        | 0.6923 | 595  | 1.1171          | 32471856          |
| 0.2151        | 0.6981 | 600  | 1.1173          | 32736864          |
| 0.1895        | 0.7039 | 605  | 1.1154          | 33013704          |
| 0.2138        | 0.7097 | 610  | 1.1153          | 33286536          |
| 0.1855        | 0.7155 | 615  | 1.1178          | 33560632          |
| 0.1635        | 0.7213 | 620  | 1.1146          | 33829520          |
| 0.2052        | 0.7272 | 625  | 1.1126          | 34108304          |
| 0.1611        | 0.7330 | 630  | 1.1143          | 34384344          |
| 0.2346        | 0.7388 | 635  | 1.1138          | 34660216          |
| 0.176         | 0.7446 | 640  | 1.1133          | 34929120          |
| 0.1957        | 0.7504 | 645  | 1.1141          | 35202480          |
| 0.1893        | 0.7563 | 650  | 1.1117          | 35469120          |
| 0.1599        | 0.7621 | 655  | 1.1157          | 35734824          |
| 0.2146        | 0.7679 | 660  | 1.1164          | 36006752          |
| 0.2293        | 0.7737 | 665  | 1.1133          | 36281976          |
| 0.1527        | 0.7795 | 670  | 1.1120          | 36560080          |
| 0.2942        | 0.7853 | 675  | 1.1121          | 36836336          |
| 0.2387        | 0.7912 | 680  | 1.1120          | 37111576          |
| 0.1984        | 0.7970 | 685  | 1.1114          | 37380104          |
| 0.1399        | 0.8028 | 690  | 1.1105          | 37646488          |
| 0.2481        | 0.8086 | 695  | 1.1136          | 37917360          |
| 0.1596        | 0.8144 | 700  | 1.1121          | 38194064          |
| 0.1548        | 0.8202 | 705  | 1.1091          | 38471880          |
| 0.1167        | 0.8261 | 710  | 1.1109          | 38736384          |
| 0.1977        | 0.8319 | 715  | 1.1099          | 39014416          |
| 0.1793        | 0.8377 | 720  | 1.1093          | 39283968          |
| 0.2611        | 0.8435 | 725  | 1.1096          | 39550808          |
| 0.1204        | 0.8493 | 730  | 1.1105          | 39819976          |
| 0.1484        | 0.8551 | 735  | 1.1111          | 40093784          |
| 0.184         | 0.8610 | 740  | 1.1108          | 40358800          |
| 0.2508        | 0.8668 | 745  | 1.1082          | 40632536          |
| 0.2075        | 0.8726 | 750  | 1.1107          | 40908352          |
| 0.1716        | 0.8784 | 755  | 1.1105          | 41185296          |
| 0.1733        | 0.8842 | 760  | 1.1067          | 41452552          |
| 0.2739        | 0.8901 | 765  | 1.1073          | 41734536          |
| 0.1719        | 0.8959 | 770  | 1.1073          | 42009176          |
| 0.2115        | 0.9017 | 775  | 1.1064          | 42278528          |
| 0.2295        | 0.9075 | 780  | 1.1065          | 42552496          |
| 0.2089        | 0.9133 | 785  | 1.1067          | 42828792          |
| 0.2411        | 0.9191 | 790  | 1.1046          | 43102792          |
| 0.1477        | 0.9250 | 795  | 1.1053          | 43381752          |
| 0.1934        | 0.9308 | 800  | 1.1065          | 43654696          |
| 0.1997        | 0.9366 | 805  | 1.1042          | 43928712          |
| 0.1535        | 0.9424 | 810  | 1.1038          | 44198760          |
| 0.2383        | 0.9482 | 815  | 1.1043          | 44473736          |
| 0.1897        | 0.9540 | 820  | 1.1049          | 44754496          |
| 0.1269        | 0.9599 | 825  | 1.1099          | 45023936          |
| 0.2393        | 0.9657 | 830  | 1.1065          | 45291608          |
| 0.2525        | 0.9715 | 835  | 1.1030          | 45563616          |
| 0.1696        | 0.9773 | 840  | 1.1062          | 45834224          |
| 0.1194        | 0.9831 | 845  | 1.1057          | 46108392          |
| 0.1984        | 0.9889 | 850  | 1.1030          | 46384976          |
| 0.2457        | 0.9948 | 855  | 1.1030          | 46660760          |

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
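The hyperparameters fully determine the effective batch size, and the training-results table lets one back out the average number of input tokens consumed per optimizer step. A small sanity-check sketch of that arithmetic, with every constant copied from this card (nothing here is measured or part of the training code):

```python
# Sanity-check arithmetic for the hyperparameters and results table above.
# All constants are taken from this model card.

train_batch_size = 8               # per-device micro-batch size
gradient_accumulation_steps = 16
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)      # 128, matching the reported total_train_batch_size

# Average input tokens per optimizer step, from the final table row
# (step 855, 46,660,760 input tokens seen):
tokens_seen = 46_660_760
optimizer_steps = 855
tokens_per_step = tokens_seen / optimizer_steps
print(round(tokens_per_step))      # roughly 54,574 tokens per 128-sequence step
```

So each optimizer step processed about 426 tokens per sequence on average, consistent with short-to-medium SFT examples.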