collapse_gemma-2-9b_hs2_accumulate_iter4_sftsd0

This model is a fine-tuned version of google/gemma-2-9b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9492
  • Num Input Tokens Seen: 19618104
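
A minimal loading and generation sketch, assuming the standard transformers API; the BF16 dtype matches the checkpoint's tensor type, while `device_map="auto"` (which requires the accelerate package) and the example prompt are assumptions:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-9b_hs2_accumulate_iter4_sftsd0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # checkpoint weights are stored in BF16
    device_map="auto",           # assumption: accelerate is installed
)

# Hypothetical prompt, purely for illustration.
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```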

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after this list):

  • learning_rate: 8e-06
  • train_batch_size: 4
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
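
These settings map, approximately, onto transformers' TrainingArguments as sketched below. This is a reconstruction from the list above rather than the actual training script; output_dir and bf16 are assumptions, and the effective batch size of 128 follows from 4 × 32 on a single device:

```python
from transformers import TrainingArguments

# Hedged reconstruction of the hyperparameters above; output_dir and bf16 are assumed.
training_args = TrainingArguments(
    output_dir="collapse_gemma-2-9b_hs2_accumulate_iter4_sftsd0",  # assumption
    learning_rate=8e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=32,  # 4 * 32 = total train batch size 128
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,  # assumption, consistent with the BF16 checkpoint
)
```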

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.2335          | 0                 |
| 1.2703        | 0.0130 | 5    | 1.1837          | 260728            |
| 1.2545        | 0.0261 | 10   | 1.0745          | 511292            |
| 0.9867        | 0.0391 | 15   | 1.0219          | 760004            |
| 0.7077        | 0.0522 | 20   | 1.0147          | 1015876           |
| 0.5384        | 0.0652 | 25   | 1.0220          | 1270300           |
| 0.5591        | 0.0783 | 30   | 1.0193          | 1525228           |
| 0.4475        | 0.0913 | 35   | 1.0166          | 1784804           |
| 0.3602        | 0.1044 | 40   | 1.0124          | 2036584           |
| 0.3623        | 0.1174 | 45   | 1.0037          | 2297140           |
| 0.3845        | 0.1305 | 50   | 0.9974          | 2559416           |
| 0.2587        | 0.1435 | 55   | 0.9923          | 2810020           |
| 0.4471        | 0.1566 | 60   | 0.9912          | 3060436           |
| 0.3047        | 0.1696 | 65   | 0.9868          | 3321640           |
| 0.3731        | 0.1827 | 70   | 0.9832          | 3573720           |
| 0.3265        | 0.1957 | 75   | 0.9839          | 3828028           |
| 0.2885        | 0.2088 | 80   | 0.9812          | 4080608           |
| 0.3128        | 0.2218 | 85   | 0.9791          | 4336288           |
| 0.3204        | 0.2349 | 90   | 0.9770          | 4590108           |
| 0.3495        | 0.2479 | 95   | 0.9758          | 4853076           |
| 0.2884        | 0.2610 | 100  | 0.9760          | 5107028           |
| 0.3117        | 0.2740 | 105  | 0.9728          | 5361252           |
| 0.3231        | 0.2871 | 110  | 0.9732          | 5615724           |
| 0.3288        | 0.3001 | 115  | 0.9715          | 5871856           |
| 0.3798        | 0.3132 | 120  | 0.9698          | 6127844           |
| 0.2902        | 0.3262 | 125  | 0.9698          | 6385324           |
| 0.3605        | 0.3393 | 130  | 0.9706          | 6633264           |
| 0.3544        | 0.3523 | 135  | 0.9679          | 6886668           |
| 0.3400        | 0.3654 | 140  | 0.9670          | 7149304           |
| 0.3764        | 0.3784 | 145  | 0.9674          | 7405164           |
| 0.2529        | 0.3915 | 150  | 0.9675          | 7653688           |
| 0.2816        | 0.4045 | 155  | 0.9672          | 7913220           |
| 0.2044        | 0.4176 | 160  | 0.9648          | 8167932           |
| 0.2825        | 0.4306 | 165  | 0.9658          | 8418852           |
| 0.2702        | 0.4436 | 170  | 0.9650          | 8677864           |
| 0.3071        | 0.4567 | 175  | 0.9650          | 8935764           |
| 0.3253        | 0.4697 | 180  | 0.9642          | 9187056           |
| 0.2927        | 0.4828 | 185  | 0.9626          | 9442708           |
| 0.2876        | 0.4958 | 190  | 0.9634          | 9701192           |
| 0.3425        | 0.5089 | 195  | 0.9624          | 9955308           |
| 0.3433        | 0.5219 | 200  | 0.9602          | 10214732          |
| 0.3315        | 0.5350 | 205  | 0.9611          | 10466412          |
| 0.2934        | 0.5480 | 210  | 0.9605          | 10714628          |
| 0.2463        | 0.5611 | 215  | 0.9612          | 10976808          |
| 0.3642        | 0.5741 | 220  | 0.9613          | 11234876          |
| 0.3245        | 0.5872 | 225  | 0.9589          | 11495408          |
| 0.2885        | 0.6002 | 230  | 0.9589          | 11752512          |
| 0.3555        | 0.6133 | 235  | 0.9600          | 12002952          |
| 0.2814        | 0.6263 | 240  | 0.9583          | 12260908          |
| 0.3228        | 0.6394 | 245  | 0.9574          | 12519812          |
| 0.3228        | 0.6524 | 250  | 0.9576          | 12782436          |
| 0.3823        | 0.6655 | 255  | 0.9572          | 13042344          |
| 0.3539        | 0.6785 | 260  | 0.9562          | 13307776          |
| 0.3418        | 0.6916 | 265  | 0.9571          | 13567712          |
| 0.2592        | 0.7046 | 270  | 0.9593          | 13823848          |
| 0.2523        | 0.7177 | 275  | 0.9564          | 14073252          |
| 0.2883        | 0.7307 | 280  | 0.9557          | 14325632          |
| 0.2877        | 0.7438 | 285  | 0.9546          | 14580592          |
| 0.3691        | 0.7568 | 290  | 0.9545          | 14834352          |
| 0.2924        | 0.7699 | 295  | 0.9546          | 15098672          |
| 0.3078        | 0.7829 | 300  | 0.9533          | 15350204          |
| 0.3201        | 0.7960 | 305  | 0.9544          | 15609792          |
| 0.3147        | 0.8090 | 310  | 0.9544          | 15869296          |
| 0.3097        | 0.8221 | 315  | 0.9523          | 16121416          |
| 0.2708        | 0.8351 | 320  | 0.9522          | 16378908          |
| 0.2285        | 0.8481 | 325  | 0.9549          | 16637160          |
| 0.2825        | 0.8612 | 330  | 0.9535          | 16895604          |
| 0.3189        | 0.8742 | 335  | 0.9523          | 17153840          |
| 0.2630        | 0.8873 | 340  | 0.9529          | 17408728          |
| 0.2470        | 0.9003 | 345  | 0.9521          | 17664248          |
| 0.2309        | 0.9134 | 350  | 0.9532          | 17925640          |
| 0.2487        | 0.9264 | 355  | 0.9513          | 18183340          |
| 0.3177        | 0.9395 | 360  | 0.9518          | 18443996          |
| 0.2997        | 0.9525 | 365  | 0.9521          | 18692904          |
| 0.3384        | 0.9656 | 370  | 0.9516          | 18947432          |
| 0.2958        | 0.9786 | 375  | 0.9513          | 19210912          |
| 0.3001        | 0.9917 | 380  | 0.9484          | 19465112          |
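
For scale: 19,465,112 input tokens over 380 optimizer steps works out to roughly 51k tokens per step, i.e. about 400 tokens per sequence at the effective batch size of 128. Validation loss falls sharply over the first 15 steps and then improves only gradually, from 1.0219 at step 15 to 0.9484 at step 380.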

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
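
To pin a matching environment, something like the following should work; the "+cu121" suffix on the PyTorch version indicates a CUDA 12.1 build, and the PyTorch wheel index URL below is an assumption about how that build is obtained:

```bash
# Pinned versions from the list above; the cu121 index URL is an assumption.
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
pip install transformers==4.44.0 datasets==2.20.0 tokenizers==0.19.1
```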