
collapse_gemma-2-2b_hs2_replace_iter3_sftsd1

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.8911
  • Num Input Tokens Seen: 8330720
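
Since the card gives no usage details, below is a minimal inference sketch assuming the standard transformers causal-LM API; the prompt, dtype, and device placement are illustrative assumptions, not documented on this card.

```python
# Minimal sketch: load the checkpoint and generate a short completion.
# Assumes transformers >= 4.42 (required for Gemma 2) and accelerate
# installed for device_map="auto". Prompt is illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jkazdan/collapse_gemma-2-2b_hs2_replace_iter3_sftsd1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are published in BF16
    device_map="auto",
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```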

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto TrainingArguments follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
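
The sketch below (not the authors' script) shows how the hyperparameters above might map onto transformers.TrainingArguments; output_dir is an assumed name, and the dataset and Trainer setup are omitted because the card does not document them.

```python
# Hypothetical mapping of the listed hyperparameters onto TrainingArguments.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_replace_iter3_sftsd1",  # assumed name
    learning_rate=8e-6,
    per_device_train_batch_size=8,   # 8 x 16 accumulation steps = 128 total
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,
    seed=1,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,                  # Adam betas/epsilon as listed above
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```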

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3956          | 0                 |
| 1.4905        | 0.0322 | 5    | 1.3077          | 270832            |
| 1.2602        | 0.0643 | 10   | 1.2233          | 538040            |
| 1.1284        | 0.0965 | 15   | 1.2281          | 810568            |
| 0.7483        | 0.1287 | 20   | 1.2969          | 1081856           |
| 0.6342        | 0.1608 | 25   | 1.4400          | 1347624           |
| 0.4624        | 0.1930 | 30   | 1.6103          | 1613304           |
| 0.3721        | 0.2252 | 35   | 1.7194          | 1880416           |
| 0.2581        | 0.2573 | 40   | 1.7768          | 2149880           |
| 0.1611        | 0.2895 | 45   | 1.8426          | 2416712           |
| 0.1031        | 0.3217 | 50   | 1.9013          | 2681168           |
| 0.1092        | 0.3538 | 55   | 1.9516          | 2946912           |
| 0.1202        | 0.3860 | 60   | 1.9557          | 3214960           |
| 0.0956        | 0.4182 | 65   | 1.9342          | 3484184           |
| 0.0726        | 0.4503 | 70   | 1.8705          | 3756200           |
| 0.0687        | 0.4825 | 75   | 1.8882          | 4021312           |
| 0.0399        | 0.5147 | 80   | 1.8351          | 4291144           |
| 0.0562        | 0.5468 | 85   | 1.8887          | 4554544           |
| 0.0621        | 0.5790 | 90   | 1.8666          | 4829952           |
| 0.0416        | 0.6112 | 95   | 1.7668          | 5092984           |
| 0.0435        | 0.6433 | 100  | 1.8431          | 5361048           |
| 0.0669        | 0.6755 | 105  | 1.8500          | 5629424           |
| 0.064         | 0.7077 | 110  | 1.7670          | 5901224           |
| 0.0491        | 0.7398 | 115  | 1.7368          | 6163240           |
| 0.0455        | 0.7720 | 120  | 1.8418          | 6432208           |
| 0.0378        | 0.8042 | 125  | 1.8950          | 6704256           |
| 0.0423        | 0.8363 | 130  | 1.8546          | 6975512           |
| 0.08          | 0.8685 | 135  | 1.8218          | 7243344           |
| 0.061         | 0.9007 | 140  | 1.8678          | 7510512           |
| 0.0408        | 0.9329 | 145  | 1.9605          | 7787288           |
| 0.0359        | 0.9650 | 150  | 1.9672          | 8053856           |
| 0.0587        | 0.9972 | 155  | 1.8911          | 8330720           |

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1