
collapse_gemma-2-2b_hs2_replace_iter18_sftsd2

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.6236
  • Num Input Tokens Seen: 4624112

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
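The effective batch size listed above follows directly from the per-device batch size and the gradient accumulation steps. A quick sanity check (variable names are illustrative; the values are taken from the list above):

```python
# Sanity-check the relationship between the listed hyperparameters.
# Values come from this model card; names are illustrative.
train_batch_size = 8             # per-device train batch size
gradient_accumulation_steps = 16

# Effective (total) train batch size per optimizer step:
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)    # 128, matching the value listed above
```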

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.5938        | 0.0511 | 5    | 1.2801          | 239496            |
| 0.8257        | 0.1021 | 10   | 1.3276          | 475896            |
| 0.3781        | 0.1532 | 15   | 1.5795          | 715416            |
| 0.203         | 0.2042 | 20   | 1.7945          | 958344            |
| 0.0828        | 0.2553 | 25   | 2.1123          | 1201808           |
| 0.0475        | 0.3063 | 30   | 2.2986          | 1441936           |
| 0.0255        | 0.3574 | 35   | 2.4401          | 1685128           |
| 0.0238        | 0.4084 | 40   | 2.5437          | 1923208           |
| 0.0205        | 0.4595 | 45   | 2.6059          | 2159992           |
| 0.0217        | 0.5105 | 50   | 2.6290          | 2396440           |
| 0.0236        | 0.5616 | 55   | 2.6241          | 2630120           |
| 0.0209        | 0.6126 | 60   | 2.6176          | 2871120           |
| 0.0202        | 0.6637 | 65   | 2.6088          | 3102520           |
| 0.0202        | 0.7147 | 70   | 2.6099          | 3337176           |
| 0.0194        | 0.7658 | 75   | 2.6224          | 3580072           |
| 0.022         | 0.8168 | 80   | 2.6128          | 3811176           |
| 0.0201        | 0.8679 | 85   | 2.6123          | 4053312           |
| 0.022         | 0.9190 | 90   | 2.6136          | 4283408           |
| 0.0201        | 0.9700 | 95   | 2.6248          | 4529888           |
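A rough average sequence length can be backed out of the token counts above. This is a back-of-the-envelope sketch only, using the last logged row (step 95, 4,529,888 input tokens seen) and the effective batch size of 128 from the hyperparameters section:

```python
# Back-of-the-envelope: tokens per optimizer step and per sequence,
# derived from the last logged row of the training-results table.
tokens_seen = 4_529_888          # input tokens seen at step 95
steps = 95
total_train_batch_size = 128     # from the hyperparameters section

tokens_per_step = tokens_seen / steps                            # ~47,683
tokens_per_sequence = tokens_per_step / total_train_batch_size   # ~373
print(round(tokens_per_step), round(tokens_per_sequence))
```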

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1