augmented_step_val_25_gemma-2-2b_hs2_iter1_sftsd2

This model is a fine-tuned version of jkazdan/step_val_25_gemma-2-2b_hs2_iter1_sftsd2 on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.5025
  • Num Input Tokens Seen: 7865232
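
For a rough sense of scale, the evaluation loss can be converted to perplexity. The sketch below assumes the reported loss is a mean per-token cross-entropy in nats, which is the usual convention for causal language model training but is not stated in this card.

```python
import math

# Value copied from the evaluation results above.
eval_loss = 1.5025

# Assumption: the loss is a mean per-token cross-entropy in nats,
# in which case perplexity is simply exp(loss).
perplexity = math.exp(eval_loss)

print(f"perplexity ~ {perplexity:.2f}")
```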

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
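
The arithmetic behind these settings can be sketched as follows. This is an illustration rather than the actual training script: the single-device assumption and the figure of about 145 optimizer steps are inferred from the numbers in this card, and `lr_at` is a hypothetical helper that mirrors the usual shape of a constant-with-warmup schedule (linear ramp, then flat).

```python
import math

# Hyperparameters copied from the list above.
train_batch_size = 8
gradient_accumulation_steps = 16
learning_rate = 8e-06
warmup_ratio = 0.05

# Assumption: a single device, which makes 8 * 16 match the reported 128.
num_devices = 1
total_train_batch_size = (
    train_batch_size * gradient_accumulation_steps * num_devices
)

# Assumption: about 145 optimizer steps in the epoch, inferred from the
# training results (step 140 is logged at epoch 0.9659).
total_steps = 145
warmup_steps = math.ceil(total_steps * warmup_ratio)


def lr_at(step: int) -> float:
    """Hypothetical helper: linear warmup from 0, then a constant rate."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    return learning_rate


print(total_train_batch_size, warmup_steps, lr_at(4), lr_at(100))
```

Under these assumptions the warmup covers only the first few optimizer steps, so almost the entire epoch runs at the full learning rate of 8e-06.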

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|---------------|--------|------|-----------------|-------------------|
| No log        | 0      | 0    | 1.0950          | 0                 |
| 1.4558        | 0.0345 | 5    | 1.0942          | 274624            |
| 1.2848        | 0.0690 | 10   | 1.1065          | 546200            |
| 1.0788        | 0.1035 | 15   | 1.1339          | 817584            |
| 0.9149        | 0.1380 | 20   | 1.1915          | 1088176           |
| 0.8855        | 0.1725 | 25   | 1.2506          | 1358336           |
| 0.8151        | 0.2070 | 30   | 1.3419          | 1637992           |
| 0.5913        | 0.2415 | 35   | 1.3767          | 1911376           |
| 0.5641        | 0.2760 | 40   | 1.4619          | 2181176           |
| 0.5135        | 0.3105 | 45   | 1.4701          | 2462856           |
| 0.335         | 0.3450 | 50   | 1.4866          | 2737752           |
| 0.332         | 0.3795 | 55   | 1.5121          | 3012656           |
| 0.3655        | 0.4140 | 60   | 1.4798          | 3279744           |
| 0.249         | 0.4485 | 65   | 1.4564          | 3547808           |
| 0.2495        | 0.4830 | 70   | 1.4986          | 3817328           |
| 0.2821        | 0.5175 | 75   | 1.4208          | 4097184           |
| 0.1291        | 0.5520 | 80   | 1.4710          | 4367848           |
| 0.2026        | 0.5865 | 85   | 1.4296          | 4640592           |
| 0.2365        | 0.6210 | 90   | 1.5041          | 4922032           |
| 0.1523        | 0.6555 | 95   | 1.4437          | 5193088           |
| 0.1677        | 0.6900 | 100  | 1.4660          | 5464864           |
| 0.2233        | 0.7245 | 105  | 1.4473          | 5739032           |
| 0.1273        | 0.7589 | 110  | 1.4308          | 6012736           |
| 0.1756        | 0.7934 | 115  | 1.4913          | 6274808           |
| 0.1822        | 0.8279 | 120  | 1.4676          | 6548312           |
| 0.1255        | 0.8624 | 125  | 1.4698          | 6821112           |
| 0.1072        | 0.8969 | 130  | 1.4484          | 7098736           |
| 0.1329        | 0.9314 | 135  | 1.4401          | 7369552           |
| 0.104         | 0.9659 | 140  | 1.4771          | 7640632           |
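
The log above can be inspected programmatically, for example to locate the checkpoint with the lowest validation loss. The snippet below is a minimal sketch that hard-codes the step, training loss, and validation loss columns from the table; the variable names are illustrative and not part of any training tooling.

```python
# (step, training loss, validation loss) copied from the table above;
# None stands for the "No log" entry at step 0.
log = [
    (0, None, 1.0950), (5, 1.4558, 1.0942), (10, 1.2848, 1.1065),
    (15, 1.0788, 1.1339), (20, 0.9149, 1.1915), (25, 0.8855, 1.2506),
    (30, 0.8151, 1.3419), (35, 0.5913, 1.3767), (40, 0.5641, 1.4619),
    (45, 0.5135, 1.4701), (50, 0.335, 1.4866), (55, 0.332, 1.5121),
    (60, 0.3655, 1.4798), (65, 0.249, 1.4564), (70, 0.2495, 1.4986),
    (75, 0.2821, 1.4208), (80, 0.1291, 1.4710), (85, 0.2026, 1.4296),
    (90, 0.2365, 1.5041), (95, 0.1523, 1.4437), (100, 0.1677, 1.4660),
    (105, 0.2233, 1.4473), (110, 0.1273, 1.4308), (115, 0.1756, 1.4913),
    (120, 0.1822, 1.4676), (125, 0.1255, 1.4698), (130, 0.1072, 1.4484),
    (135, 0.1329, 1.4401), (140, 0.104, 1.4771),
]

# Row with the lowest validation loss.
best_step, _, best_val = min(log, key=lambda row: row[2])

# Gap between validation and training loss at the last logged step.
last_step, last_train, last_val = log[-1]
final_gap = last_val - last_train

print(f"lowest validation loss: {best_val} at step {best_step}")
print(f"step {last_step}: validation - training = {final_gap:.4f}")
```

On these numbers the validation loss is lowest at step 5 and rises afterwards while the training loss keeps falling, a pattern that is usually read as overfitting to the training set.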

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1

Model size: 2.61B params (Safetensors, tensor type BF16)

Model tree for jkazdan/augmented_step_val_25_gemma-2-2b_hs2_iter1_sftsd2

  • Base model: google/gemma-2-2b
  • Finetuned (3): this model