# collapse_gemma-2-9b_hs2_accumulate_iter4_sftsd0
This model is a fine-tuned version of [google/gemma-2-9b](https://huggingface.co/google/gemma-2-9b) on an unknown dataset.
It achieves the following results on the evaluation set:

- Loss: 0.9492
- Num Input Tokens Seen: 19618104
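
A minimal usage sketch follows, showing how the checkpoint can be loaded with the standard `transformers` API. This is not from the original training code; the precision and device settings are illustrative assumptions, and the repository id is taken from this model's Hub page.

```python
# Minimal usage sketch; assumes the checkpoint is hosted on the Hugging Face
# Hub under the repository id below. dtype/device settings are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-9b_hs2_accumulate_iter4_sftsd0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: reduced precision for a 9B model
    device_map="auto",           # requires the `accelerate` package
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```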
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure
### Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 8e-06
- train_batch_size: 4
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
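
As a rough guide, here is how the listed values map onto `transformers.TrainingArguments`. This is a sketch only, not the original training script: `output_dir`, the eval/logging cadence, and anything else not in the list above are assumptions.

```python
# Sketch of a TrainingArguments config matching the hyperparameters above.
# Values not listed there (output_dir, eval/logging cadence) are assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-9b_hs2_accumulate_iter4_sftsd0",  # assumption
    learning_rate=8e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=32,  # 4 x 32 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    eval_strategy="steps",           # assumption: matches the 5-step eval cadence below
    eval_steps=5,
    logging_steps=5,
)
```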
### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
No log | 0 | 0 | 1.2335 | 0 |
1.2703 | 0.0130 | 5 | 1.1837 | 260728 |
1.2545 | 0.0261 | 10 | 1.0745 | 511292 |
0.9867 | 0.0391 | 15 | 1.0219 | 760004 |
0.7077 | 0.0522 | 20 | 1.0147 | 1015876 |
0.5384 | 0.0652 | 25 | 1.0220 | 1270300 |
0.5591 | 0.0783 | 30 | 1.0193 | 1525228 |
0.4475 | 0.0913 | 35 | 1.0166 | 1784804 |
0.3602 | 0.1044 | 40 | 1.0124 | 2036584 |
0.3623 | 0.1174 | 45 | 1.0037 | 2297140 |
0.3845 | 0.1305 | 50 | 0.9974 | 2559416 |
0.2587 | 0.1435 | 55 | 0.9923 | 2810020 |
0.4471 | 0.1566 | 60 | 0.9912 | 3060436 |
0.3047 | 0.1696 | 65 | 0.9868 | 3321640 |
0.3731 | 0.1827 | 70 | 0.9832 | 3573720 |
0.3265 | 0.1957 | 75 | 0.9839 | 3828028 |
0.2885 | 0.2088 | 80 | 0.9812 | 4080608 |
0.3128 | 0.2218 | 85 | 0.9791 | 4336288 |
0.3204 | 0.2349 | 90 | 0.9770 | 4590108 |
0.3495 | 0.2479 | 95 | 0.9758 | 4853076 |
0.2884 | 0.2610 | 100 | 0.9760 | 5107028 |
0.3117 | 0.2740 | 105 | 0.9728 | 5361252 |
0.3231 | 0.2871 | 110 | 0.9732 | 5615724 |
0.3288 | 0.3001 | 115 | 0.9715 | 5871856 |
0.3798 | 0.3132 | 120 | 0.9698 | 6127844 |
0.2902 | 0.3262 | 125 | 0.9698 | 6385324 |
0.3605 | 0.3393 | 130 | 0.9706 | 6633264 |
0.3544 | 0.3523 | 135 | 0.9679 | 6886668 |
0.34 | 0.3654 | 140 | 0.9670 | 7149304 |
0.3764 | 0.3784 | 145 | 0.9674 | 7405164 |
0.2529 | 0.3915 | 150 | 0.9675 | 7653688 |
0.2816 | 0.4045 | 155 | 0.9672 | 7913220 |
0.2044 | 0.4176 | 160 | 0.9648 | 8167932 |
0.2825 | 0.4306 | 165 | 0.9658 | 8418852 |
0.2702 | 0.4436 | 170 | 0.9650 | 8677864 |
0.3071 | 0.4567 | 175 | 0.9650 | 8935764 |
0.3253 | 0.4697 | 180 | 0.9642 | 9187056 |
0.2927 | 0.4828 | 185 | 0.9626 | 9442708 |
0.2876 | 0.4958 | 190 | 0.9634 | 9701192 |
0.3425 | 0.5089 | 195 | 0.9624 | 9955308 |
0.3433 | 0.5219 | 200 | 0.9602 | 10214732 |
0.3315 | 0.5350 | 205 | 0.9611 | 10466412 |
0.2934 | 0.5480 | 210 | 0.9605 | 10714628 |
0.2463 | 0.5611 | 215 | 0.9612 | 10976808 |
0.3642 | 0.5741 | 220 | 0.9613 | 11234876 |
0.3245 | 0.5872 | 225 | 0.9589 | 11495408 |
0.2885 | 0.6002 | 230 | 0.9589 | 11752512 |
0.3555 | 0.6133 | 235 | 0.9600 | 12002952 |
0.2814 | 0.6263 | 240 | 0.9583 | 12260908 |
0.3228 | 0.6394 | 245 | 0.9574 | 12519812 |
0.3228 | 0.6524 | 250 | 0.9576 | 12782436 |
0.3823 | 0.6655 | 255 | 0.9572 | 13042344 |
0.3539 | 0.6785 | 260 | 0.9562 | 13307776 |
0.3418 | 0.6916 | 265 | 0.9571 | 13567712 |
0.2592 | 0.7046 | 270 | 0.9593 | 13823848 |
0.2523 | 0.7177 | 275 | 0.9564 | 14073252 |
0.2883 | 0.7307 | 280 | 0.9557 | 14325632 |
0.2877 | 0.7438 | 285 | 0.9546 | 14580592 |
0.3691 | 0.7568 | 290 | 0.9545 | 14834352 |
0.2924 | 0.7699 | 295 | 0.9546 | 15098672 |
0.3078 | 0.7829 | 300 | 0.9533 | 15350204 |
0.3201 | 0.7960 | 305 | 0.9544 | 15609792 |
0.3147 | 0.8090 | 310 | 0.9544 | 15869296 |
0.3097 | 0.8221 | 315 | 0.9523 | 16121416 |
0.2708 | 0.8351 | 320 | 0.9522 | 16378908 |
0.2285 | 0.8481 | 325 | 0.9549 | 16637160 |
0.2825 | 0.8612 | 330 | 0.9535 | 16895604 |
0.3189 | 0.8742 | 335 | 0.9523 | 17153840 |
0.263 | 0.8873 | 340 | 0.9529 | 17408728 |
0.247 | 0.9003 | 345 | 0.9521 | 17664248 |
0.2309 | 0.9134 | 350 | 0.9532 | 17925640 |
0.2487 | 0.9264 | 355 | 0.9513 | 18183340 |
0.3177 | 0.9395 | 360 | 0.9518 | 18443996 |
0.2997 | 0.9525 | 365 | 0.9521 | 18692904 |
0.3384 | 0.9656 | 370 | 0.9516 | 18947432 |
0.2958 | 0.9786 | 375 | 0.9513 | 19210912 |
0.3001 | 0.9917 | 380 | 0.9484 | 19465112 |
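
As a sanity check on the table (illustrative arithmetic only, derived from the last logged row and the hyperparameters above), the token counts are consistent with the stated effective batch size:

```python
# Back-of-the-envelope check using the last logged row (step 380).
tokens_at_step_380 = 19_465_112
optimizer_steps = 380
total_train_batch_size = 128  # 4 per-device batch x 32 accumulation steps

tokens_per_step = tokens_at_step_380 / optimizer_steps         # ~51,224
tokens_per_example = tokens_per_step / total_train_batch_size  # ~400
print(f"~{tokens_per_step:,.0f} tokens/step, ~{tokens_per_example:.0f} tokens/example")
```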
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1