collapse_gemma-2-2b_hs2_replace_iter20_sftsd0

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.6544
  • Num Input Tokens Seen: 4428696

Model description

More information needed

Intended uses & limitations

More information needed
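Until usage details are documented, a minimal loading sketch may help. This is an assumption-laden example, not an official snippet: it presumes the checkpoint loads through the standard `transformers` AutoClasses (as its `google/gemma-2-2b` base does) and that BF16 weights are wanted, matching the published tensor type. Downloading the ~2.6B-parameter weights requires network access and several GB of disk.

```python
# Hypothetical usage sketch (not from the card): load this checkpoint
# with the transformers AutoClasses, as for its gemma-2-2b base model.
MODEL_ID = "RylanSchaeffer/collapse_gemma-2-2b_hs2_replace_iter20_sftsd0"

def load_model(model_id: str = MODEL_ID):
    """Download and return (tokenizer, model); requires network access."""
    # Deferred import: transformers/torch are heavy optional dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # "bfloat16" matches the BF16 tensor type listed for this checkpoint.
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16")
    return tokenizer, model
```

`load_model()` would then be followed by the usual `tokenizer(...)` / `model.generate(...)` calls; since the card documents no chat template or prompt format, none is assumed here.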

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
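The listed hyperparameters are internally consistent: the total train batch size is the per-device batch size times the gradient accumulation steps (a single device is assumed here, since no device count is listed). A small sketch of that relation:

```python
# Sketch of how the card's hyperparameters relate (values copied from above).
hparams = {
    "learning_rate": 8e-06,
    "train_batch_size": 8,            # per-device batch size
    "eval_batch_size": 16,
    "seed": 0,
    "gradient_accumulation_steps": 16,
    "lr_scheduler_type": "constant_with_warmup",
    "lr_scheduler_warmup_ratio": 0.05,
    "num_epochs": 1,
}

# Effective (total) train batch size: 8 * 16 = 128, matching the card.
effective_batch = hparams["train_batch_size"] * hparams["gradient_accumulation_steps"]
print(effective_batch)  # 128
```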

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.4671        | 0.0511 | 5    | 1.2830          | 227584            |
| 0.7682        | 0.1021 | 10   | 1.3632          | 451744            |
| 0.5053        | 0.1532 | 15   | 1.5394          | 675368            |
| 0.2533        | 0.2042 | 20   | 1.7922          | 904784            |
| 0.0957        | 0.2553 | 25   | 2.1027          | 1123536           |
| 0.0657        | 0.3063 | 30   | 2.3143          | 1353496           |
| 0.0474        | 0.3574 | 35   | 2.4308          | 1584648           |
| 0.0358        | 0.4084 | 40   | 2.5634          | 1807216           |
| 0.0257        | 0.4595 | 45   | 2.5958          | 2031464           |
| 0.0262        | 0.5105 | 50   | 2.6243          | 2259896           |
| 0.0245        | 0.5616 | 55   | 2.6355          | 2487688           |
| 0.0254        | 0.6126 | 60   | 2.6426          | 2718592           |
| 0.0236        | 0.6637 | 65   | 2.6416          | 2951632           |
| 0.0223        | 0.7147 | 70   | 2.6468          | 3170544           |
| 0.0222        | 0.7658 | 75   | 2.6497          | 3407056           |
| 0.0230        | 0.8168 | 80   | 2.6480          | 3630328           |
| 0.0235        | 0.8679 | 85   | 2.6395          | 3869752           |
| 0.0237        | 0.9190 | 90   | 2.6398          | 4104432           |
| 0.0227        | 0.9700 | 95   | 2.6538          | 4332320           |

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1