---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_replace_iter7_sftsd2
  results: []
---

# collapse_gemma-2-2b_hs2_replace_iter7_sftsd2

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set (a minimal usage sketch follows the list):

- Loss: 2.5277
- Num Input Tokens Seen: 7940992
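
The card does not include a usage example. Here is a minimal generation sketch with `transformers`; the repository id is an assumption inferred from the model name, not stated in the card:

```python
# Hedged usage sketch; the repo id below is an assumption, not stated in the card.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "jkazdan/collapse_gemma-2-2b_hs2_replace_iter7_sftsd2"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```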

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
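
Given the `trl`/`sft` tags, the run was presumably a TRL SFT job. Below is a hedged sketch of a configuration matching the hyperparameters above, assuming a TRL version contemporary with Transformers 4.44.0; the dataset files are placeholders (the card does not say what data was used), and the Adam betas/epsilon above are the Transformers defaults, so they need no explicit flags:

```python
# Hedged sketch of a TRL SFT run matching the hyperparameters above.
# Dataset paths are placeholders; the card does not specify the training data.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

model_name = "google/gemma-2-2b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Placeholder data files; replace with the actual (unspecified) dataset.
dataset = load_dataset("json", data_files={"train": "train.jsonl", "eval": "eval.jsonl"})

args = SFTConfig(
    output_dir="collapse_gemma-2-2b_hs2_replace_iter7_sftsd2",
    learning_rate=8e-06,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,  # 8 * 16 = 128 total (assuming a single device)
    seed=2,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    eval_strategy="steps",
    eval_steps=5,  # matches the 5-step evaluation cadence in the results table
    logging_steps=5,
)

trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["eval"],
    tokenizer=tokenizer,
)
trainer.train()
```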

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3956          | 0                 |
| 1.5364        | 0.0316 | 5    | 1.3068          | 253784            |
| 1.1259        | 0.0632 | 10   | 1.2465          | 510904            |
| 0.6887        | 0.0947 | 15   | 1.3550          | 769904            |
| 0.5128        | 0.1263 | 20   | 1.5270          | 1016400           |
| 0.3194        | 0.1579 | 25   | 1.6846          | 1268560           |
| 0.1918        | 0.1895 | 30   | 1.8390          | 1517032           |
| 0.1256        | 0.2211 | 35   | 2.0239          | 1770608           |
| 0.0896        | 0.2527 | 40   | 2.2047          | 2023392           |
| 0.0555        | 0.2842 | 45   | 2.3706          | 2273832           |
| 0.0402        | 0.3158 | 50   | 2.4122          | 2529912           |
| 0.0338        | 0.3474 | 55   | 2.4354          | 2783520           |
| 0.0345        | 0.3790 | 60   | 2.3862          | 3030728           |
| 0.0281        | 0.4106 | 65   | 2.4360          | 3277464           |
| 0.0285        | 0.4422 | 70   | 2.4702          | 3520992           |
| 0.0296        | 0.4737 | 75   | 2.4709          | 3770984           |
| 0.0284        | 0.5053 | 80   | 2.5290          | 4026776           |
| 0.0254        | 0.5369 | 85   | 2.5619          | 4275136           |
| 0.0288        | 0.5685 | 90   | 2.5185          | 4524936           |
| 0.026         | 0.6001 | 95   | 2.4887          | 4781848           |
| 0.0265        | 0.6317 | 100  | 2.4976          | 5021704           |
| 0.0254        | 0.6632 | 105  | 2.4820          | 5274368           |
| 0.0322        | 0.6948 | 110  | 2.4803          | 5529272           |
| 0.0308        | 0.7264 | 115  | 2.4894          | 5783496           |
| 0.0263        | 0.7580 | 120  | 2.5027          | 6034168           |
| 0.0254        | 0.7896 | 125  | 2.4805          | 6291808           |
| 0.0263        | 0.8212 | 130  | 2.4729          | 6544088           |
| 0.0335        | 0.8527 | 135  | 2.4893          | 6791616           |
| 0.0264        | 0.8843 | 140  | 2.5056          | 7045456           |
| 0.0251        | 0.9159 | 145  | 2.5100          | 7294288           |
| 0.0245        | 0.9475 | 150  | 2.5167          | 7548384           |
| 0.0277        | 0.9791 | 155  | 2.5252          | 7791720           |

## Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
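
To approximate this environment, the pinned versions can be installed with pip, e.g. `pip install transformers==4.44.0 datasets==2.20.0 tokenizers==0.19.1`; note that the `2.4.0+cu121` PyTorch build comes from the CUDA 12.1 wheel index, and the `trl` version used is not pinned in the card.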