collapse_gemma-2-9b_hs2_replace_iter2_sftsd2

This model is a fine-tuned version of google/gemma-2-9b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1938
  • Num Input Tokens Seen: 4356888

Model description

More information needed

Intended uses & limitations

More information needed
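Although no usage guidance is documented, the checkpoint is hosted as BF16 Safetensors weights, so a standard `transformers` causal-LM loading sketch applies. This is an illustration only, not documented behaviour of this checkpoint: the repo id is taken from the card title, and the dtype choice and generation call are assumptions.

```python
# Hypothetical usage sketch (not from the card): standard transformers loading
# for a causal-LM fine-tune. A 9B-parameter model needs substantial GPU memory.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RylanSchaeffer/collapse_gemma-2-9b_hs2_replace_iter2_sftsd2"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # matches the card's reported BF16 tensor type
    device_map="auto",
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```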

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 4
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
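The reported total_train_batch_size is not an independent setting: it is the per-device batch size multiplied by the gradient-accumulation steps (a single training device is assumed here, since no device count is reported). A minimal stdlib-only sketch of the values listed above:

```python
# Hyperparameters as listed in this card (values copied verbatim).
hparams = {
    "learning_rate": 8e-06,
    "train_batch_size": 4,
    "eval_batch_size": 16,
    "seed": 2,
    "gradient_accumulation_steps": 32,
    "lr_scheduler_type": "constant_with_warmup",
    "lr_scheduler_warmup_ratio": 0.05,
    "num_epochs": 1,
}

# Effective batch = per-device batch * accumulation steps
# (single device assumed, as no multi-GPU count is reported).
effective_batch = hparams["train_batch_size"] * hparams["gradient_accumulation_steps"]
print(effective_batch)  # 128, matching the reported total_train_batch_size
```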

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.2335          | 0                 |
| 1.2607        | 0.0578 | 5    | 1.0656          | 249188            |
| 0.8686        | 0.1157 | 10   | 1.0503          | 498288            |
| 0.4747        | 0.1735 | 15   | 1.0885          | 749948            |
| 0.377         | 0.2314 | 20   | 1.1396          | 1006472           |
| 0.1629        | 0.2892 | 25   | 1.1127          | 1257592           |
| 0.1892        | 0.3471 | 30   | 1.1207          | 1511064           |
| 0.1595        | 0.4049 | 35   | 1.1071          | 1770240           |
| 0.1497        | 0.4628 | 40   | 1.1300          | 2026852           |
| 0.1044        | 0.5206 | 45   | 1.1161          | 2285384           |
| 0.0925        | 0.5785 | 50   | 1.1269          | 2536240           |
| 0.113         | 0.6363 | 55   | 1.1373          | 2790304           |
| 0.064         | 0.6941 | 60   | 1.1432          | 3036560           |
| 0.0947        | 0.7520 | 65   | 1.1566          | 3289156           |
| 0.1019        | 0.8098 | 70   | 1.1761          | 3545648           |
| 0.0967        | 0.8677 | 75   | 1.1715          | 3802336           |
| 0.0658        | 0.9255 | 80   | 1.2216          | 4053176           |
| 0.051         | 0.9834 | 85   | 1.2049          | 4307136           |
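The validation loss bottoms out very early (step 10) and drifts upward for the rest of the run, which is worth noting if selecting a checkpoint by eval loss rather than taking the final one. A quick stdlib-only check over the logged values (step/loss pairs copied from the table above):

```python
# (step, validation_loss) pairs copied from the training-results table.
eval_log = [
    (0, 1.2335), (5, 1.0656), (10, 1.0503), (15, 1.0885), (20, 1.1396),
    (25, 1.1127), (30, 1.1207), (35, 1.1071), (40, 1.1300), (45, 1.1161),
    (50, 1.1269), (55, 1.1373), (60, 1.1432), (65, 1.1566), (70, 1.1761),
    (75, 1.1715), (80, 1.2216), (85, 1.2049),
]

# Find the evaluation step with the lowest validation loss.
best_step, best_loss = min(eval_log, key=lambda pair: pair[1])
print(best_step, best_loss)  # -> 10 1.0503, well before the final step 85
```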

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
Model size: 9.24B params (Safetensors, BF16)
