license: gemma
base_model: google/gemma-2-2b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-2b_hs2_replace_iter6_sftsd0
    results: []

collapse_gemma-2-2b_hs2_replace_iter6_sftsd0

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.4135
  • Num Input Tokens Seen: 7971776
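Assuming the reported loss is the mean per-token cross-entropy in nats (the `transformers` default), it corresponds to an evaluation perplexity of roughly 11.2. A quick check:

```python
import math

eval_loss = 2.4135  # final evaluation loss reported above

# Perplexity is the exponential of the mean cross-entropy loss.
perplexity = math.exp(eval_loss)
print(round(perplexity, 2))  # ≈ 11.17
```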

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
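The reported total_train_batch_size is the per-device batch size multiplied by the gradient accumulation steps (assuming single-device training, which the card does not state explicitly). A quick sanity check of the numbers above:

```python
train_batch_size = 8
gradient_accumulation_steps = 16
num_devices = 1  # assumption; device count is not stated in the card

# Effective (total) train batch size per optimizer step.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)  # 128, matching the hyperparameter list
```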

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3956          | 0                 |
| 1.6219        | 0.0316 | 5    | 1.3079          | 253136            |
| 1.2932        | 0.0631 | 10   | 1.2378          | 505728            |
| 0.797         | 0.0947 | 15   | 1.2940          | 753248            |
| 0.6825        | 0.1263 | 20   | 1.4640          | 1008520           |
| 0.4083        | 0.1579 | 25   | 1.6136          | 1265712           |
| 0.2934        | 0.1894 | 30   | 1.7972          | 1522896           |
| 0.1426        | 0.2210 | 35   | 1.9343          | 1771056           |
| 0.0768        | 0.2526 | 40   | 2.0985          | 2021720           |
| 0.0598        | 0.2841 | 45   | 2.2231          | 2266568           |
| 0.0343        | 0.3157 | 50   | 2.2738          | 2525864           |
| 0.035         | 0.3473 | 55   | 2.3380          | 2773832           |
| 0.0341        | 0.3788 | 60   | 2.3578          | 3025992           |
| 0.0324        | 0.4104 | 65   | 2.3326          | 3282432           |
| 0.0339        | 0.4420 | 70   | 2.3815          | 3531744           |
| 0.0309        | 0.4736 | 75   | 2.4070          | 3780960           |
| 0.0319        | 0.5051 | 80   | 2.3871          | 4036832           |
| 0.03          | 0.5367 | 85   | 2.3862          | 4292040           |
| 0.0303        | 0.5683 | 90   | 2.3838          | 4532720           |
| 0.0295        | 0.5998 | 95   | 2.3943          | 4785512           |
| 0.0325        | 0.6314 | 100  | 2.3693          | 5041576           |
| 0.0321        | 0.6630 | 105  | 2.3452          | 5296640           |
| 0.0291        | 0.6946 | 110  | 2.3231          | 5545576           |
| 0.0271        | 0.7261 | 115  | 2.3197          | 5803840           |
| 0.025         | 0.7577 | 120  | 2.3552          | 6061792           |
| 0.0245        | 0.7893 | 125  | 2.3695          | 6314664           |
| 0.0595        | 0.8208 | 130  | 2.3968          | 6573472           |
| 0.0259        | 0.8524 | 135  | 2.4351          | 6821240           |
| 0.0262        | 0.8840 | 140  | 2.4190          | 7072472           |
| 0.0264        | 0.9155 | 145  | 2.4247          | 7323632           |
| 0.029         | 0.9471 | 150  | 2.4290          | 7572360           |
| 0.0282        | 0.9787 | 155  | 2.4186          | 7816368           |
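The validation loss reaches its minimum at step 10 and climbs steadily afterwards while the training loss keeps falling. A small sketch that recovers the best checkpoint from the (step, validation loss) pairs copied out of the table above:

```python
# (step, validation_loss) pairs copied from the training results table
val_losses = [
    (0, 1.3956), (5, 1.3079), (10, 1.2378), (15, 1.2940), (20, 1.4640),
    (25, 1.6136), (30, 1.7972), (35, 1.9343), (40, 2.0985), (45, 2.2231),
    (50, 2.2738), (55, 2.3380), (60, 2.3578), (65, 2.3326), (70, 2.3815),
    (75, 2.4070), (80, 2.3871), (85, 2.3862), (90, 2.3838), (95, 2.3943),
    (100, 2.3693), (105, 2.3452), (110, 2.3231), (115, 2.3197), (120, 2.3552),
    (125, 2.3695), (130, 2.3968), (135, 2.4351), (140, 2.4190), (145, 2.4247),
    (150, 2.4290), (155, 2.4186),
]

# Pick the evaluation step with the lowest validation loss.
best_step, best_loss = min(val_losses, key=lambda p: p[1])
print(best_step, best_loss)  # step 10, loss 1.2378
```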

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1