collapse_gemma-2-2b_hs2_replace_iter20_sftsd1

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.5723
  • Num Input Tokens Seen: 4374176

Model description

More information needed

Intended uses & limitations

More information needed
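
No usage details were provided. As a minimal loading sketch, assuming standard `transformers` usage (the model ID and bfloat16 weights come from this card; the prompt and generation settings below are illustrative assumptions, not recommendations):

```python
# Minimal sketch, assuming standard transformers usage.
# The model ID and bfloat16 dtype are taken from this card; the prompt and
# generation settings are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_replace_iter20_sftsd1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```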

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
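
For reference, a sketch of roughly equivalent `transformers` `TrainingArguments` (the `output_dir` and `bf16` flag are assumptions; the other values mirror the list above, and the Adam betas/epsilon match the stated optimizer settings):

```python
# Sketch only: reconstructs the listed hyperparameters as TrainingArguments.
# output_dir and bf16 are assumptions; everything else mirrors the list above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_replace_iter20_sftsd1",  # assumed
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=1,
    gradient_accumulation_steps=16,  # 8 per device x 16 steps = 128 total
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    bf16=True,  # assumed from the BF16 tensor type of the published weights
)
```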

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.5743        | 0.0511 | 5    | 1.2784          | 215504            |
| 0.907         | 0.1022 | 10   | 1.2866          | 439552            |
| 0.5039        | 0.1533 | 15   | 1.4995          | 663280            |
| 0.228         | 0.2043 | 20   | 1.6910          | 894464            |
| 0.1115        | 0.2554 | 25   | 1.9651          | 1120880           |
| 0.0589        | 0.3065 | 30   | 2.2089          | 1345936           |
| 0.0504        | 0.3576 | 35   | 2.3215          | 1567848           |
| 0.037         | 0.4087 | 40   | 2.4352          | 1798736           |
| 0.0251        | 0.4598 | 45   | 2.5229          | 2019656           |
| 0.0261        | 0.5109 | 50   | 2.5609          | 2247440           |
| 0.0278        | 0.5619 | 55   | 2.5577          | 2477584           |
| 0.0255        | 0.6130 | 60   | 2.5752          | 2703792           |
| 0.0246        | 0.6641 | 65   | 2.5688          | 2924296           |
| 0.0246        | 0.7152 | 70   | 2.5304          | 3152056           |
| 0.0224        | 0.7663 | 75   | 2.5154          | 3378048           |
| 0.023         | 0.8174 | 80   | 2.5120          | 3605880           |
| 0.0226        | 0.8685 | 85   | 2.5300          | 3834792           |
| 0.0227        | 0.9195 | 90   | 2.5492          | 4065384           |
| 0.0246        | 0.9706 | 95   | 2.5676          | 4281080           |

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
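
To reproduce this environment, the listed versions can be pinned directly (a sketch; the cu121 wheel index matches the `2.4.0+cu121` build string above, but the original install method is unknown):

```bash
pip install transformers==4.44.0 datasets==2.20.0 tokenizers==0.19.1
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
```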