
collapse_gemma-2-2b_hs2_replace_iter4_sftsd2

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.1659
  • Num Input Tokens Seen: 7948024
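
For convenience, a minimal loading sketch, assuming the standard transformers causal-LM API. The repository id is the one in this card's title; loading in bfloat16 is an assumption consistent with Gemma-2 checkpoints, and the last line simply exponentiates the reported eval loss.

```python
# Hedged loading sketch: repo id from this card; bfloat16 is an assumption.
import math

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "jkazdan/collapse_gemma-2-2b_hs2_replace_iter4_sftsd2"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

# The eval loss is mean token cross-entropy in nats, so perplexity = exp(loss).
print(f"eval perplexity ~= {math.exp(2.1659):.2f}")  # ~= 8.72
```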

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the sketch after this list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
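
A hedged sketch of how the values above map onto transformers.TrainingArguments. Only the listed hyperparameters come from this card; output_dir is a placeholder and bf16 is an assumption, not something the card states.

```python
# Sketch mapping the card's hyperparameters onto TrainingArguments.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_replace_iter4_sftsd2",  # placeholder path
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=16,  # 8 x 16 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,  # assumption: matches the BF16 weights published with this model
)
```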

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3956          | 0                 |
| 1.5632        | 0.0316 | 5    | 1.3061          | 246640            |
| 1.3817        | 0.0632 | 10   | 1.2228          | 494576            |
| 1.0035        | 0.0947 | 15   | 1.2471          | 744584            |
| 0.6544        | 0.1263 | 20   | 1.4073          | 997672            |
| 0.4776        | 0.1579 | 25   | 1.5377          | 1254560           |
| 0.3655        | 0.1895 | 30   | 1.6643          | 1501936           |
| 0.2114        | 0.2211 | 35   | 1.8147          | 1753752           |
| 0.1432        | 0.2527 | 40   | 2.0060          | 2004664           |
| 0.0971        | 0.2842 | 45   | 2.1422          | 2255696           |
| 0.0583        | 0.3158 | 50   | 2.1872          | 2503680           |
| 0.0617        | 0.3474 | 55   | 2.2333          | 2752312           |
| 0.0418        | 0.3790 | 60   | 2.2179          | 3014008           |
| 0.0354        | 0.4106 | 65   | 2.2580          | 3272640           |
| 0.0341        | 0.4422 | 70   | 2.3017          | 3531768           |
| 0.0365        | 0.4737 | 75   | 2.3306          | 3783288           |
| 0.0388        | 0.5053 | 80   | 2.3409          | 4030184           |
| 0.0293        | 0.5369 | 85   | 2.3008          | 4283032           |
| 0.0542        | 0.5685 | 90   | 2.2747          | 4542640           |
| 0.0333        | 0.6001 | 95   | 2.2006          | 4797104           |
| 0.0314        | 0.6317 | 100  | 2.1578          | 5049888           |
| 0.0504        | 0.6632 | 105  | 2.1483          | 5293872           |
| 0.0344        | 0.6948 | 110  | 2.1589          | 5538240           |
| 0.0277        | 0.7264 | 115  | 2.1630          | 5793368           |
| 0.0281        | 0.7580 | 120  | 2.1890          | 6044800           |
| 0.0289        | 0.7896 | 125  | 2.2083          | 6302168           |
| 0.0336        | 0.8212 | 130  | 2.2451          | 6546744           |
| 0.0442        | 0.8527 | 135  | 2.2112          | 6795368           |
| 0.0292        | 0.8843 | 140  | 2.1838          | 7042064           |
| 0.0360        | 0.9159 | 145  | 2.2140          | 7291040           |
| 0.0295        | 0.9475 | 150  | 2.2179          | 7545048           |
| 0.0291        | 0.9791 | 155  | 2.1915          | 7794264           |
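
The trajectory in this table is easier to see as a curve: training loss falls from 1.56 to about 0.03 while validation loss bottoms out near step 10 (1.2228) and then climbs back above 2.1. A minimal matplotlib sketch that replots the Step and Validation Loss columns above:

```python
# Replot the Step / Validation Loss columns of the table above.
import matplotlib.pyplot as plt

steps = list(range(0, 160, 5))  # steps 0, 5, ..., 155 from the table
val_loss = [1.3956, 1.3061, 1.2228, 1.2471, 1.4073, 1.5377, 1.6643, 1.8147,
            2.0060, 2.1422, 2.1872, 2.2333, 2.2179, 2.2580, 2.3017, 2.3306,
            2.3409, 2.3008, 2.2747, 2.2006, 2.1578, 2.1483, 2.1589, 2.1630,
            2.1890, 2.2083, 2.2451, 2.2112, 2.1838, 2.2140, 2.2179, 2.1915]

plt.plot(steps, val_loss, marker="o")
plt.xlabel("Step")
plt.ylabel("Validation loss")
plt.title("collapse_gemma-2-2b_hs2_replace_iter4_sftsd2")
plt.show()
```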

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
