collapse_gemma-2-2b_hs2_replace_iter11_sftsd1

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.4175
  • Num Input Tokens Seen: 4749720

Model description

More information needed

Intended uses & limitations

More information needed
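
The card does not document intended usage. As a minimal sketch (not specified by the card), the checkpoint can be loaded with the standard transformers causal-LM API, matching its google/gemma-2-2b base; the prompt below is an arbitrary placeholder.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Sketch only: usage is not documented in the card. Loads the checkpoint
# from the Hub as a causal LM, like its google/gemma-2-2b base.
repo_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_replace_iter11_sftsd1"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

prompt = "The quick brown fox"  # arbitrary example prompt
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```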

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
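
For illustration, these settings map onto a transformers TrainingArguments object as sketched below; the output_dir, the bf16 flag, and any Trainer wiring are assumptions, since the card reports only the hyperparameters themselves.

```python
from transformers import TrainingArguments

# Hedged reconstruction of the reported hyperparameters (Transformers 4.44.0).
# output_dir and bf16 are assumptions not stated explicitly in the card.
training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_replace_iter11_sftsd1",  # assumed
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=1,
    gradient_accumulation_steps=16,  # 8 * 16 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,   # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,  # assumed from the checkpoint's BF16 tensor type
)
```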

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|---------------|--------|------|-----------------|-------------------|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.4122        | 0.0511 | 5    | 1.2762          | 249288            |
| 0.9626        | 0.1021 | 10   | 1.2553          | 489456            |
| 0.555         | 0.1532 | 15   | 1.4785          | 734880            |
| 0.3476        | 0.2042 | 20   | 1.6471          | 984976            |
| 0.1933        | 0.2553 | 25   | 1.8406          | 1226448           |
| 0.0894        | 0.3063 | 30   | 2.1521          | 1467872           |
| 0.0481        | 0.3574 | 35   | 2.2638          | 1715464           |
| 0.0328        | 0.4084 | 40   | 2.3924          | 1964088           |
| 0.0296        | 0.4595 | 45   | 2.3892          | 2208064           |
| 0.0264        | 0.5105 | 50   | 2.4026          | 2451344           |
| 0.0291        | 0.5616 | 55   | 2.4242          | 2703160           |
| 0.0344        | 0.6126 | 60   | 2.4386          | 2944568           |
| 0.0232        | 0.6637 | 65   | 2.3779          | 3188672           |
| 0.0437        | 0.7147 | 70   | 2.3490          | 3440136           |
| 0.025         | 0.7658 | 75   | 2.3502          | 3684688           |
| 0.0234        | 0.8168 | 80   | 2.3684          | 3932336           |
| 0.0239        | 0.8679 | 85   | 2.3859          | 4179440           |
| 0.0225        | 0.9190 | 90   | 2.3973          | 4417384           |
| 0.0229        | 0.9700 | 95   | 2.4151          | 4651968           |
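
As an illustrative consistency check (derived from the table and the hyperparameters above, not stated in the card), the logged token counts imply roughly 390 input tokens per training example:

```python
# Hedged arithmetic sketch using values reported above.
train_batch_size = 8
gradient_accumulation_steps = 16
effective_batch = train_batch_size * gradient_accumulation_steps  # 128

tokens_at_step_5 = 249_288  # "Input Tokens Seen" at step 5
tokens_per_step = tokens_at_step_5 / 5                  # ~49,858 per optimizer step
tokens_per_example = tokens_per_step / effective_batch  # ~390 per sequence
print(effective_batch, round(tokens_per_step), round(tokens_per_example))
```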

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
Checkpoint details

  • Format: Safetensors
  • Model size: 2.61B params
  • Tensor type: BF16