collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd1

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1156
  • Num Input Tokens Seen: 38394432
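
For convenience, a minimal loading sketch follows. It assumes the checkpoint is a standard causal-LM fine-tune of google/gemma-2-2b that loads with the stock Transformers auto classes (see the framework versions listed below); the prompt and generation settings are illustrative only, not part of the original card.

```python
# Minimal sketch (not from the original card): load the fine-tuned checkpoint
# and run a short generation. bfloat16 matches the published tensor type.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jkazdan/collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

inputs = tokenizer("Hello, world!", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```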

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
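
For reference, a minimal sketch of how these settings map onto Hugging Face `TrainingArguments`. The output directory is a placeholder and the `bf16` flag is an assumption inferred from the published weights; the original training script is not part of this card.

```python
# Sketch only: reproduces the listed hyperparameters with the Trainer API.
# output_dir is a placeholder; bf16=True is an assumption, not from the card.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd1",  # placeholder
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=1,
    gradient_accumulation_steps=16,   # effective batch size: 8 * 16 = 128
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,                        # assumption: matches the BF16 weights
)
```

The total_train_batch_size of 128 reported above is the per-device batch size (8) multiplied by the gradient accumulation steps (16) on a single device.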

Training results

Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen
No log 0 0 1.3956 0
1.64 0.0071 5 1.3915 282928
1.717 0.0142 10 1.3495 547680
1.4756 0.0214 15 1.2809 819464
1.3413 0.0285 20 1.2255 1088176
1.2434 0.0356 25 1.1810 1359440
1.2176 0.0427 30 1.1672 1625784
1.2541 0.0499 35 1.1491 1899896
0.9819 0.0570 40 1.1533 2176760
0.947 0.0641 45 1.1622 2458784
0.8886 0.0712 50 1.1769 2731336
0.7859 0.0784 55 1.2131 3004608
0.7724 0.0855 60 1.2111 3276648
0.8257 0.0926 65 1.2124 3552744
0.7196 0.0997 70 1.2153 3828616
0.7089 0.1068 75 1.2123 4108840
0.7354 0.1140 80 1.2026 4391920
0.6275 0.1211 85 1.2205 4674200
0.5129 0.1282 90 1.2144 4945712
0.4506 0.1353 95 1.2009 5214520
0.5107 0.1425 100 1.2186 5484592
0.4638 0.1496 105 1.2054 5752320
0.4786 0.1567 110 1.2011 6028136
0.5751 0.1638 115 1.2009 6304032
0.4034 0.1710 120 1.2037 6579840
0.3894 0.1781 125 1.1952 6855056
0.4096 0.1852 130 1.1990 7132912
0.486 0.1923 135 1.1961 7401704
0.3722 0.1994 140 1.1943 7674144
0.3758 0.2066 145 1.1971 7955296
0.3871 0.2137 150 1.1955 8232712
0.3788 0.2208 155 1.1905 8504176
0.3235 0.2279 160 1.1879 8779072
0.3315 0.2351 165 1.1902 9059672
0.328 0.2422 170 1.1905 9336368
0.3476 0.2493 175 1.1880 9601712
0.2789 0.2564 180 1.1829 9871144
0.2937 0.2636 185 1.1835 10137584
0.3359 0.2707 190 1.1815 10406656
0.3616 0.2778 195 1.1803 10677608
0.3162 0.2849 200 1.1794 10948264
0.3174 0.2920 205 1.1750 11218000
0.2904 0.2992 210 1.1806 11498160
0.3929 0.3063 215 1.1692 11779608
0.2965 0.3134 220 1.1731 12049808
0.4205 0.3205 225 1.1692 12326136
0.2849 0.3277 230 1.1736 12596680
0.3107 0.3348 235 1.1665 12869960
0.2267 0.3419 240 1.1724 13145648
0.2392 0.3490 245 1.1708 13415312
0.1885 0.3562 250 1.1657 13690584
0.2722 0.3633 255 1.1676 13968448
0.2161 0.3704 260 1.1651 14239944
0.1734 0.3775 265 1.1659 14510952
0.3554 0.3846 270 1.1580 14780912
0.316 0.3918 275 1.1608 15055568
0.2742 0.3989 280 1.1562 15334424
0.1887 0.4060 285 1.1580 15606264
0.3007 0.4131 290 1.1570 15876168
0.1913 0.4203 295 1.1507 16146352
0.2763 0.4274 300 1.1523 16420864
0.3037 0.4345 305 1.1499 16693096
0.1839 0.4416 310 1.1526 16976408
0.2314 0.4488 315 1.1499 17252728
0.2425 0.4559 320 1.1526 17521216
0.2362 0.4630 325 1.1487 17788696
0.2139 0.4701 330 1.1502 18057744
0.2801 0.4773 335 1.1443 18332304
0.3707 0.4844 340 1.1458 18610592
0.2548 0.4915 345 1.1450 18881784
0.2455 0.4986 350 1.1418 19146128
0.2278 0.5057 355 1.1452 19420384
0.2771 0.5129 360 1.1420 19696584
0.2731 0.5200 365 1.1394 19967720
0.219 0.5271 370 1.1415 20241272
0.2432 0.5342 375 1.1457 20514896
0.1841 0.5414 380 1.1429 20779312
0.2617 0.5485 385 1.1404 21056016
0.2928 0.5556 390 1.1404 21327080
0.1952 0.5627 395 1.1354 21598992
0.227 0.5699 400 1.1381 21877208
0.2218 0.5770 405 1.1380 22149176
0.1683 0.5841 410 1.1375 22423056
0.3227 0.5912 415 1.1348 22693424
0.3058 0.5983 420 1.1357 22966920
0.1881 0.6055 425 1.1341 23246936
0.2359 0.6126 430 1.1314 23522192
0.2074 0.6197 435 1.1307 23801944
0.2584 0.6268 440 1.1328 24074328
0.2027 0.6340 445 1.1289 24348328
0.2897 0.6411 450 1.1305 24623816
0.2167 0.6482 455 1.1309 24902928
0.3028 0.6553 460 1.1306 25174984
0.2939 0.6625 465 1.1287 25447728
0.2679 0.6696 470 1.1262 25716008
0.3617 0.6767 475 1.1275 25994912
0.3261 0.6838 480 1.1266 26270048
0.2113 0.6909 485 1.1270 26541616
0.3059 0.6981 490 1.1287 26818200
0.2356 0.7052 495 1.1242 27087272
0.2931 0.7123 500 1.1246 27359208
0.2421 0.7194 505 1.1233 27638688
0.2792 0.7266 510 1.1252 27911800
0.2415 0.7337 515 1.1214 28186904
0.292 0.7408 520 1.1222 28462520
0.2697 0.7479 525 1.1214 28740360
0.2745 0.7551 530 1.1196 29013592
0.2365 0.7622 535 1.1221 29285096
0.2456 0.7693 540 1.1199 29557536
0.2182 0.7764 545 1.1208 29835096
0.3136 0.7835 550 1.1219 30112088
0.184 0.7907 555 1.1167 30387312
0.2508 0.7978 560 1.1200 30659104
0.2854 0.8049 565 1.1208 30939024
0.2423 0.8120 570 1.1186 31214856
0.3061 0.8192 575 1.1174 31487176
0.2599 0.8263 580 1.1176 31758936
0.1641 0.8334 585 1.1192 32029768
0.3293 0.8405 590 1.1180 32306824
0.1687 0.8477 595 1.1187 32583424
0.2466 0.8548 600 1.1157 32855528
0.2684 0.8619 605 1.1151 33131344
0.2623 0.8690 610 1.1156 33412888
0.3949 0.8761 615 1.1167 33688992
0.2317 0.8833 620 1.1167 33963096
0.2483 0.8904 625 1.1147 34243336
0.3731 0.8975 630 1.1142 34521472
0.2577 0.9046 635 1.1143 34794832
0.2225 0.9118 640 1.1139 35064072
0.1567 0.9189 645 1.1146 35342008
0.3207 0.9260 650 1.1146 35610720
0.1626 0.9331 655 1.1153 35880752
0.2122 0.9403 660 1.1138 36156864
0.2865 0.9474 665 1.1110 36433816
0.2319 0.9545 670 1.1134 36713952
0.1696 0.9616 675 1.1129 36980552
0.2326 0.9687 680 1.1120 37256536
0.2783 0.9759 685 1.1133 37524184
0.2046 0.9830 690 1.1113 37805352
0.2798 0.9901 695 1.1119 38079104
0.2794 0.9972 700 1.1159 38340280

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1

Model tree for jkazdan/collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd1

  • Base model: google/gemma-2-2b
  • This model: a fine-tune of the base model