
collapse_gemma-2-2b_hs2_accumulate_iter2_sftsd0

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.0982
  • Input tokens seen: 13,719,864
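
This card does not include a usage example. As a minimal sketch, the checkpoint can be loaded with the standard transformers causal-LM API; the repo id is taken from the model tree at the bottom of this card, and bfloat16 matches the listed tensor type:

```python
# Minimal usage sketch, not an official example from the model authors.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jkazdan/collapse_gemma-2-2b_hs2_accumulate_iter2_sftsd0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 tensor type listed below
    device_map="auto",           # requires the accelerate package
)

inputs = tokenizer("The quick brown fox", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```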

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
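
The training script itself is not published. As a sketch, the hyperparameters above map onto transformers' TrainingArguments as follows (argument names per Transformers 4.44.0; output_dir and bf16 are assumptions, not taken from this card):

```python
# Sketch only: reconstructs the listed hyperparameters as TrainingArguments.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter2_sftsd0",  # assumption
    learning_rate=8e-06,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=16,  # 8 per device * 16 steps = 128 total
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    bf16=True,                       # assumption, consistent with BF16 weights
)
```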

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:--------------|:-------|:-----|:----------------|:------------------|
| No log        | 0      | 0    | 1.3956          | 0                 |
| 1.5182        | 0.0206 | 5    | 1.3566          | 278016            |
| 1.3436        | 0.0412 | 10   | 1.2405          | 555760            |
| 1.3099        | 0.0618 | 15   | 1.1728          | 837248            |
| 1.2444        | 0.0824 | 20   | 1.1443          | 1121888           |
| 1.1566        | 0.1030 | 25   | 1.1190          | 1405760           |
| 1.1179        | 0.1236 | 30   | 1.1236          | 1685480           |
| 1.0755        | 0.1441 | 35   | 1.1197          | 1969520           |
| 1.0909        | 0.1647 | 40   | 1.1289          | 2256056           |
| 1.0004        | 0.1853 | 45   | 1.1258          | 2535072           |
| 0.9337        | 0.2059 | 50   | 1.1361          | 2820504           |
| 0.9769        | 0.2265 | 55   | 1.1384          | 3097544           |
| 0.9309        | 0.2471 | 60   | 1.1453          | 3381016           |
| 0.8221        | 0.2677 | 65   | 1.1451          | 3662552           |
| 0.8448        | 0.2883 | 70   | 1.1362          | 3944008           |
| 0.8068        | 0.3089 | 75   | 1.1422          | 4228616           |
| 0.7794        | 0.3295 | 80   | 1.1449          | 4518704           |
| 0.8390        | 0.3501 | 85   | 1.1377          | 4803488           |
| 0.7914        | 0.3707 | 90   | 1.1424          | 5092912           |
| 0.7824        | 0.3912 | 95   | 1.1396          | 5376328           |
| 0.7763        | 0.4118 | 100  | 1.1373          | 5657216           |
| 0.7058        | 0.4324 | 105  | 1.1450          | 5936696           |
| 0.7919        | 0.4530 | 110  | 1.1338          | 6218640           |
| 0.6291        | 0.4736 | 115  | 1.1381          | 6500728           |
| 0.6368        | 0.4942 | 120  | 1.1359          | 6781720           |
| 0.6676        | 0.5148 | 125  | 1.1343          | 7069904           |
| 0.6567        | 0.5354 | 130  | 1.1299          | 7351616           |
| 0.7838        | 0.5560 | 135  | 1.1330          | 7641760           |
| 0.6401        | 0.5766 | 140  | 1.1291          | 7931072           |
| 0.6275        | 0.5972 | 145  | 1.1238          | 8217432           |
| 0.6238        | 0.6178 | 150  | 1.1258          | 8498184           |
| 0.6390        | 0.6384 | 155  | 1.1231          | 8779760           |
| 0.6416        | 0.6589 | 160  | 1.1231          | 9062392           |
| 0.6282        | 0.6795 | 165  | 1.1192          | 9342232           |
| 0.5363        | 0.7001 | 170  | 1.1197          | 9620560           |
| 0.6333        | 0.7207 | 175  | 1.1168          | 9904800           |
| 0.5421        | 0.7413 | 180  | 1.1152          | 10188928          |
| 0.5879        | 0.7619 | 185  | 1.1131          | 10471944          |
| 0.5608        | 0.7825 | 190  | 1.1117          | 10758568          |
| 0.4817        | 0.8031 | 195  | 1.1109          | 11046576          |
| 0.5578        | 0.8237 | 200  | 1.1081          | 11328352          |
| 0.5967        | 0.8443 | 205  | 1.1053          | 11609888          |
| 0.6086        | 0.8649 | 210  | 1.1074          | 11894256          |
| 0.6493        | 0.8855 | 215  | 1.1021          | 12180976          |
| 0.5754        | 0.9060 | 220  | 1.1066          | 12462336          |
| 0.5951        | 0.9266 | 225  | 1.1012          | 12744360          |
| 0.6990        | 0.9472 | 230  | 1.1005          | 13035416          |
| 0.5918        | 0.9678 | 235  | 1.1012          | 13324984          |
| 0.6331        | 0.9884 | 240  | 1.0977          | 13606712          |
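
For reference, the final validation loss of 1.0977 corresponds to a per-token perplexity of exp(1.0977) ≈ 3.00, assuming the loss is the mean cross-entropy in nats.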

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
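
A matching environment can be installed with pip install transformers==4.44.0 torch==2.4.0 datasets==2.20.0 tokenizers==0.19.1; the +cu121 build of PyTorch additionally requires the PyTorch CUDA wheel index.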

Model details

  • Model size: 2.61B params
  • Tensor type: BF16
  • Format: Safetensors

Model tree for jkazdan/collapse_gemma-2-2b_hs2_accumulate_iter2_sftsd0

  • Base model: google/gemma-2-2b