---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd1
  results: []
---

# collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd1

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.1156
- Num Input Tokens Seen: 38394432

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3956          | 0                 |
| 1.64          | 0.0071 | 5    | 1.3915          | 282928            |
| 1.717         | 0.0142 | 10   | 1.3495          | 547680            |
| 1.4756        | 0.0214 | 15   | 1.2809          | 819464            |
| 1.3413        | 0.0285 | 20   | 1.2255          | 1088176           |
| 1.2434        | 0.0356 | 25   | 1.1810          | 1359440           |
| 1.2176        | 0.0427 | 30   | 1.1672          | 1625784           |
| 1.2541        | 0.0499 | 35   | 1.1491          | 1899896           |
| 0.9819        | 0.0570 | 40   | 1.1533          | 2176760           |
| 0.947         | 0.0641 | 45   | 1.1622          | 2458784           |
| 0.8886        | 0.0712 | 50   | 1.1769          | 2731336           |
| 0.7859        | 0.0784 | 55   | 1.2131          | 3004608           |
| 0.7724        | 0.0855 | 60   | 1.2111          | 3276648           |
| 0.8257        | 0.0926 | 65   | 1.2124          | 3552744           |
| 0.7196        | 0.0997 | 70   | 1.2153          | 3828616           |
| 0.7089        | 0.1068 | 75   | 1.2123          | 4108840           |
| 0.7354        | 0.1140 | 80   | 1.2026          | 4391920           |
| 0.6275        | 0.1211 | 85   | 1.2205          | 4674200           |
| 0.5129        | 0.1282 | 90   | 1.2144          | 4945712           |
| 0.4506        | 0.1353 | 95   | 1.2009          | 5214520           |
| 0.5107        | 0.1425 | 100  | 1.2186          | 5484592           |
| 0.4638        | 0.1496 | 105  | 1.2054          | 5752320           |
| 0.4786        | 0.1567 | 110  | 1.2011          | 6028136           |
| 0.5751        | 0.1638 | 115  | 1.2009          | 6304032           |
| 0.4034        | 0.1710 | 120  | 1.2037          | 6579840           |
| 0.3894        | 0.1781 | 125  | 1.1952          | 6855056           |
| 0.4096        | 0.1852 | 130  | 1.1990          | 7132912           |
| 0.486         | 0.1923 | 135  | 1.1961          | 7401704           |
| 0.3722        | 0.1994 | 140  | 1.1943          | 7674144           |
| 0.3758        | 0.2066 | 145  | 1.1971          | 7955296           |
| 0.3871        | 0.2137 | 150  | 1.1955          | 8232712           |
| 0.3788        | 0.2208 | 155  | 1.1905          | 8504176           |
| 0.3235        | 0.2279 | 160  | 1.1879          | 8779072           |
| 0.3315        | 0.2351 | 165  | 1.1902          | 9059672           |
| 0.328         | 0.2422 | 170  | 1.1905          | 9336368           |
| 0.3476        | 0.2493 | 175  | 1.1880          | 9601712           |
| 0.2789        | 0.2564 | 180  | 1.1829          | 9871144           |
| 0.2937        | 0.2636 | 185  | 1.1835          | 10137584          |
| 0.3359        | 0.2707 | 190  | 1.1815          | 10406656          |
| 0.3616        | 0.2778 | 195  | 1.1803          | 10677608          |
| 0.3162        | 0.2849 | 200  | 1.1794          | 10948264          |
| 0.3174        | 0.2920 | 205  | 1.1750          | 11218000          |
| 0.2904        | 0.2992 | 210  | 1.1806          | 11498160          |
| 0.3929        | 0.3063 | 215  | 1.1692          | 11779608          |
| 0.2965        | 0.3134 | 220  | 1.1731          | 12049808          |
| 0.4205        | 0.3205 | 225  | 1.1692          | 12326136          |
| 0.2849        | 0.3277 | 230  | 1.1736          | 12596680          |
| 0.3107        | 0.3348 | 235  | 1.1665          | 12869960          |
| 0.2267        | 0.3419 | 240  | 1.1724          | 13145648          |
| 0.2392        | 0.3490 | 245  | 1.1708          | 13415312          |
| 0.1885        | 0.3562 | 250  | 1.1657          | 13690584          |
| 0.2722        | 0.3633 | 255  | 1.1676          | 13968448          |
| 0.2161        | 0.3704 | 260  | 1.1651          | 14239944          |
| 0.1734        | 0.3775 | 265  | 1.1659          | 14510952          |
| 0.3554        | 0.3846 | 270  | 1.1580          | 14780912          |
| 0.316         | 0.3918 | 275  | 1.1608          | 15055568          |
| 0.2742        | 0.3989 | 280  | 1.1562          | 15334424          |
| 0.1887        | 0.4060 | 285  | 1.1580          | 15606264          |
| 0.3007        | 0.4131 | 290  | 1.1570          | 15876168          |
| 0.1913        | 0.4203 | 295  | 1.1507          | 16146352          |
| 0.2763        | 0.4274 | 300  | 1.1523          | 16420864          |
| 0.3037        | 0.4345 | 305  | 1.1499          | 16693096          |
| 0.1839        | 0.4416 | 310  | 1.1526          | 16976408          |
| 0.2314        | 0.4488 | 315  | 1.1499          | 17252728          |
| 0.2425        | 0.4559 | 320  | 1.1526          | 17521216          |
| 0.2362        | 0.4630 | 325  | 1.1487          | 17788696          |
| 0.2139        | 0.4701 | 330  | 1.1502          | 18057744          |
| 0.2801        | 0.4773 | 335  | 1.1443          | 18332304          |
| 0.3707        | 0.4844 | 340  | 1.1458          | 18610592          |
| 0.2548        | 0.4915 | 345  | 1.1450          | 18881784          |
| 0.2455        | 0.4986 | 350  | 1.1418          | 19146128          |
| 0.2278        | 0.5057 | 355  | 1.1452          | 19420384          |
| 0.2771        | 0.5129 | 360  | 1.1420          | 19696584          |
| 0.2731        | 0.5200 | 365  | 1.1394          | 19967720          |
| 0.219         | 0.5271 | 370  | 1.1415          | 20241272          |
| 0.2432        | 0.5342 | 375  | 1.1457          | 20514896          |
| 0.1841        | 0.5414 | 380  | 1.1429          | 20779312          |
| 0.2617        | 0.5485 | 385  | 1.1404          | 21056016          |
| 0.2928        | 0.5556 | 390  | 1.1404          | 21327080          |
| 0.1952        | 0.5627 | 395  | 1.1354          | 21598992          |
| 0.227         | 0.5699 | 400  | 1.1381          | 21877208          |
| 0.2218        | 0.5770 | 405  | 1.1380          | 22149176          |
| 0.1683        | 0.5841 | 410  | 1.1375          | 22423056          |
| 0.3227        | 0.5912 | 415  | 1.1348          | 22693424          |
| 0.3058        | 0.5983 | 420  | 1.1357          | 22966920          |
| 0.1881        | 0.6055 | 425  | 1.1341          | 23246936          |
| 0.2359        | 0.6126 | 430  | 1.1314          | 23522192          |
| 0.2074        | 0.6197 | 435  | 1.1307          | 23801944          |
| 0.2584        | 0.6268 | 440  | 1.1328          | 24074328          |
| 0.2027        | 0.6340 | 445  | 1.1289          | 24348328          |
| 0.2897        | 0.6411 | 450  | 1.1305          | 24623816          |
| 0.2167        | 0.6482 | 455  | 1.1309          | 24902928          |
| 0.3028        | 0.6553 | 460  | 1.1306          | 25174984          |
| 0.2939        | 0.6625 | 465  | 1.1287          | 25447728          |
| 0.2679        | 0.6696 | 470  | 1.1262          | 25716008          |
| 0.3617        | 0.6767 | 475  | 1.1275          | 25994912          |
| 0.3261        | 0.6838 | 480  | 1.1266          | 26270048          |
| 0.2113        | 0.6909 | 485  | 1.1270          | 26541616          |
| 0.3059        | 0.6981 | 490  | 1.1287          | 26818200          |
| 0.2356        | 0.7052 | 495  | 1.1242          | 27087272          |
| 0.2931        | 0.7123 | 500  | 1.1246          | 27359208          |
| 0.2421        | 0.7194 | 505  | 1.1233          | 27638688          |
| 0.2792        | 0.7266 | 510  | 1.1252          | 27911800          |
| 0.2415        | 0.7337 | 515  | 1.1214          | 28186904          |
| 0.292         | 0.7408 | 520  | 1.1222          | 28462520          |
| 0.2697        | 0.7479 | 525  | 1.1214          | 28740360          |
| 0.2745        | 0.7551 | 530  | 1.1196          | 29013592          |
| 0.2365        | 0.7622 | 535  | 1.1221          | 29285096          |
| 0.2456        | 0.7693 | 540  | 1.1199          | 29557536          |
| 0.2182        | 0.7764 | 545  | 1.1208          | 29835096          |
| 0.3136        | 0.7835 | 550  | 1.1219          | 30112088          |
| 0.184         | 0.7907 | 555  | 1.1167          | 30387312          |
| 0.2508        | 0.7978 | 560  | 1.1200          | 30659104          |
| 0.2854        | 0.8049 | 565  | 1.1208          | 30939024          |
| 0.2423        | 0.8120 | 570  | 1.1186          | 31214856          |
| 0.3061        | 0.8192 | 575  | 1.1174          | 31487176          |
| 0.2599        | 0.8263 | 580  | 1.1176          | 31758936          |
| 0.1641        | 0.8334 | 585  | 1.1192          | 32029768          |
| 0.3293        | 0.8405 | 590  | 1.1180          | 32306824          |
| 0.1687        | 0.8477 | 595  | 1.1187          | 32583424          |
| 0.2466        | 0.8548 | 600  | 1.1157          | 32855528          |
| 0.2684        | 0.8619 | 605  | 1.1151          | 33131344          |
| 0.2623        | 0.8690 | 610  | 1.1156          | 33412888          |
| 0.3949        | 0.8761 | 615  | 1.1167          | 33688992          |
| 0.2317        | 0.8833 | 620  | 1.1167          | 33963096          |
| 0.2483        | 0.8904 | 625  | 1.1147          | 34243336          |
| 0.3731        | 0.8975 | 630  | 1.1142          | 34521472          |
| 0.2577        | 0.9046 | 635  | 1.1143          | 34794832          |
| 0.2225        | 0.9118 | 640  | 1.1139          | 35064072          |
| 0.1567        | 0.9189 | 645  | 1.1146          | 35342008          |
| 0.3207        | 0.9260 | 650  | 1.1146          | 35610720          |
| 0.1626        | 0.9331 | 655  | 1.1153          | 35880752          |
| 0.2122        | 0.9403 | 660  | 1.1138          | 36156864          |
| 0.2865        | 0.9474 | 665  | 1.1110          | 36433816          |
| 0.2319        | 0.9545 | 670  | 1.1134          | 36713952          |
| 0.1696        | 0.9616 | 675  | 1.1129          | 36980552          |
| 0.2326        | 0.9687 | 680  | 1.1120          | 37256536          |
| 0.2783        | 0.9759 | 685  | 1.1133          | 37524184          |
| 0.2046        | 0.9830 | 690  | 1.1113          | 37805352          |
| 0.2798        | 0.9901 | 695  | 1.1119          | 38079104          |
| 0.2794        | 0.9972 | 700  | 1.1159          | 38340280          |

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
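
### Note on the effective batch size

The `total_train_batch_size` listed under the training hyperparameters is not an independent setting: it follows from the per-device batch size and the gradient accumulation steps. A minimal sketch of that arithmetic, assuming single-device training (the card does not state the device count):

```python
# Effective (total) train batch size per optimizer step:
# per-device batch size x gradient accumulation steps x number of devices.
train_batch_size = 8              # from the hyperparameters above
gradient_accumulation_steps = 16  # from the hyperparameters above
num_devices = 1                   # assumption: device count is not stated in the card

total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)  # 128, matching total_train_batch_size above
```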