metadata
license: gemma
base_model: google/gemma-2-9b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-9b_hs2_replace_iter2_sftsd0
    results: []

collapse_gemma-2-9b_hs2_replace_iter2_sftsd0

This model is a fine-tuned version of google/gemma-2-9b. The fine-tuning dataset is not recorded in this card. It achieves the following results on the evaluation set:

  • Loss: 1.2206
  • Num Input Tokens Seen: 4604388
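The reported loss can be easier to interpret as a perplexity. The sketch below is a minimal conversion, assuming the loss is the mean per-token cross-entropy in nats (the card does not state this explicitly); the variable names are illustrative.

```python
import math

# Final evaluation loss reported in this card.
eval_loss = 1.2206

# Assumption: the loss is a mean per-token cross-entropy measured in nats,
# in which case perplexity is simply exp(loss).
perplexity = math.exp(eval_loss)

print(f"eval perplexity ~ {perplexity:.2f}")
```

Under that assumption the evaluation perplexity comes out at roughly 3.39.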

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 4
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
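The hyperparameters above can be written as keyword arguments in the naming used by `transformers.TrainingArguments`. This is a hedged sketch rather than the original training script: it assumes `train_batch_size` and `eval_batch_size` are per-device values, and `training_kwargs` is a hypothetical name. The arithmetic at the end checks that the listed total batch size is consistent with a single device.

```python
# Hyperparameters from the list above, keyed by the corresponding
# transformers.TrainingArguments keyword names.
# Assumption: the batch sizes in the card are per-device values.
training_kwargs = {
    "learning_rate": 8e-06,
    "per_device_train_batch_size": 4,
    "per_device_eval_batch_size": 16,
    "seed": 0,
    "gradient_accumulation_steps": 32,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-08,
    "lr_scheduler_type": "constant_with_warmup",
    "warmup_ratio": 0.05,
    "num_train_epochs": 1,
}

total_train_batch_size = 128  # as listed in the card

# Sequences consumed per optimizer step on one device:
# micro-batch size times the number of accumulated micro-batches.
per_device_effective = (
    training_kwargs["per_device_train_batch_size"]
    * training_kwargs["gradient_accumulation_steps"]
)

# Assumption: total = per-device batch * accumulation steps * device count.
implied_num_devices = total_train_batch_size // per_device_effective

print(per_device_effective)  # 128
print(implied_num_devices)   # 1
```

Since 4 × 32 already equals 128, the listed total is consistent with training on one device, although the card does not say so directly.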

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|---------------|--------|------|-----------------|-------------------|
| No log        | 0      | 0    | 1.2335          | 0                 |
| 1.0956        | 0.0544 | 5    | 1.0730          | 251364            |
| 0.7073        | 0.1088 | 10   | 1.0601          | 506644            |
| 0.398         | 0.1632 | 15   | 1.0813          | 756848            |
| 0.2713        | 0.2176 | 20   | 1.1120          | 1009688           |
| 0.1577        | 0.2720 | 25   | 1.1339          | 1263852           |
| 0.1476        | 0.3264 | 30   | 1.1173          | 1518564           |
| 0.1351        | 0.3808 | 35   | 1.1436          | 1773696           |
| 0.0789        | 0.4352 | 40   | 1.1029          | 2023168           |
| 0.1184        | 0.4896 | 45   | 1.1221          | 2280352           |
| 0.1306        | 0.5440 | 50   | 1.1244          | 2528600           |
| 0.0673        | 0.5984 | 55   | 1.1371          | 2787720           |
| 0.099         | 0.6528 | 60   | 1.1386          | 3037224           |
| 0.1256        | 0.7072 | 65   | 1.1399          | 3299104           |
| 0.0842        | 0.7616 | 70   | 1.1874          | 3556764           |
| 0.12          | 0.8160 | 75   | 1.1876          | 3813380           |
| 0.0752        | 0.8705 | 80   | 1.1980          | 4059584           |
| 0.0604        | 0.9249 | 85   | 1.2308          | 4303764           |
| 0.046         | 0.9793 | 90   | 1.2139          | 4556488           |
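The logged values can be scanned for the step with the lowest validation loss. The sketch below copies the step, training loss, and validation loss columns from the table; `rows` and the other names are illustrative, and the step-0 training loss is stored as `None` because nothing was logged there.

```python
# (step, training_loss, validation_loss) copied from the table above.
rows = [
    (0, None, 1.2335), (5, 1.0956, 1.0730), (10, 0.7073, 1.0601),
    (15, 0.398, 1.0813), (20, 0.2713, 1.1120), (25, 0.1577, 1.1339),
    (30, 0.1476, 1.1173), (35, 0.1351, 1.1436), (40, 0.0789, 1.1029),
    (45, 0.1184, 1.1221), (50, 0.1306, 1.1244), (55, 0.0673, 1.1371),
    (60, 0.099, 1.1386), (65, 0.1256, 1.1399), (70, 0.0842, 1.1874),
    (75, 0.12, 1.1876), (80, 0.0752, 1.1980), (85, 0.0604, 1.2308),
    (90, 0.046, 1.2139),
]

# Row with the lowest validation loss.
best_step, _, best_val = min(rows, key=lambda r: r[2])

# Gap between the last logged validation loss and the best one.
final_step, _, final_val = rows[-1]
gap = round(final_val - best_val, 4)

print(best_step, best_val)  # 10 1.0601
print(final_step, gap)      # 90 0.1538
```

In these logs the validation loss is lowest at step 10 and drifts upward afterwards while the training loss keeps falling, so an earlier checkpoint may generalize better than the final one, if such checkpoints were saved.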

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
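To compare a local environment against the versions listed above, a small check along these lines may help. It is a sketch only: `EXPECTED` and `check_versions` are hypothetical names, and it assumes "Pytorch" corresponds to the `torch` distribution on PyPI.

```python
from importlib import metadata

# Versions listed in this card, keyed by PyPI distribution name.
# Assumption: "Pytorch" maps to the "torch" distribution.
EXPECTED = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def check_versions(expected):
    """Return {name: (wanted, installed_or_None, matches)} for each package."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed, installed == wanted)
    return report


for name, (wanted, installed, ok) in check_versions(EXPECTED).items():
    status = "ok" if ok else "differs"
    print(f"{name}: wanted {wanted}, found {installed} ({status})")
```

A mismatch does not necessarily mean the model will fail to load; the check only reports where the environment differs from the one recorded here.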