metadata

license: gemma
base_model: google/gemma-2-9b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-9b_hs2_replace_iter2_sftsd0
    results: []

collapse_gemma-2-9b_hs2_replace_iter2_sftsd0

This model is a fine-tuned version of google/gemma-2-9b; the fine-tuning dataset was not recorded. It achieves the following results on the evaluation set:

Loss: 1.2206
Num Input Tokens Seen: 4604388
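
The loss above can be read as a perplexity if it is the mean per-token cross-entropy in nats. That is an assumption: the card does not state the unit, although it is the usual convention for causal language model training. A minimal sketch of the conversion:

```python
import math

# Final evaluation loss reported above.
# Assumption: mean per-token cross-entropy, measured in nats.
eval_loss = 1.2206

# Perplexity is the exponential of the mean cross-entropy.
perplexity = math.exp(eval_loss)
print(f"perplexity ~ {perplexity:.2f}")
```

Under that assumption the evaluation perplexity is roughly 3.39.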

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 8e-06
train_batch_size: 4
eval_batch_size: 16
seed: 0
gradient_accumulation_steps: 32
total_train_batch_size: 128
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: constant_with_warmup
lr_scheduler_warmup_ratio: 0.05
num_epochs: 1
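
The batch-size values above are related by simple arithmetic. The sketch below assumes the common convention that the total batch size equals the per-device batch size times the gradient accumulation steps times the number of devices; the device count is not stated in the card and is inferred here, not reported.

```python
# Values copied from the hyperparameter list above.
train_batch_size = 4               # per-device micro-batch size
gradient_accumulation_steps = 32   # micro-batches per optimizer step
total_train_batch_size = 128       # as reported

# Examples consumed per optimizer step on one device.
per_device_effective = train_batch_size * gradient_accumulation_steps

# Assumption: total = per-device batch * accumulation steps * device count.
# The device count is inferred from that formula, not taken from the card.
implied_devices = total_train_batch_size // per_device_effective

print(f"per-device effective batch size: {per_device_effective}")
print(f"implied number of devices: {implied_devices}")
```

With these numbers the per-device effective batch size already equals the reported total, which is consistent with training on a single device.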

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
No log	0	0	1.2335	0
1.0956	0.0544	5	1.0730	251364
0.7073	0.1088	10	1.0601	506644
0.398	0.1632	15	1.0813	756848
0.2713	0.2176	20	1.1120	1009688
0.1577	0.2720	25	1.1339	1263852
0.1476	0.3264	30	1.1173	1518564
0.1351	0.3808	35	1.1436	1773696
0.0789	0.4352	40	1.1029	2023168
0.1184	0.4896	45	1.1221	2280352
0.1306	0.5440	50	1.1244	2528600
0.0673	0.5984	55	1.1371	2787720
0.099	0.6528	60	1.1386	3037224
0.1256	0.7072	65	1.1399	3299104
0.0842	0.7616	70	1.1874	3556764
0.12	0.8160	75	1.1876	3813380
0.0752	0.8705	80	1.1980	4059584
0.0604	0.9249	85	1.2308	4303764
0.046	0.9793	90	1.2139	4556488
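
The table above can be summarised with a few lines of Python. This is a sketch for reading the logged numbers, not part of the original training code; the tuple layout and variable names are illustrative.

```python
# (step, training_loss, validation_loss, input_tokens_seen), copied from the
# table above. The step-0 row logs no training loss ("No log"), hence None.
rows = [
    (0, None, 1.2335, 0),
    (5, 1.0956, 1.0730, 251364),
    (10, 0.7073, 1.0601, 506644),
    (15, 0.398, 1.0813, 756848),
    (20, 0.2713, 1.1120, 1009688),
    (25, 0.1577, 1.1339, 1263852),
    (30, 0.1476, 1.1173, 1518564),
    (35, 0.1351, 1.1436, 1773696),
    (40, 0.0789, 1.1029, 2023168),
    (45, 0.1184, 1.1221, 2280352),
    (50, 0.1306, 1.1244, 2528600),
    (55, 0.0673, 1.1371, 2787720),
    (60, 0.099, 1.1386, 3037224),
    (65, 0.1256, 1.1399, 3299104),
    (70, 0.0842, 1.1874, 3556764),
    (75, 0.12, 1.1876, 3813380),
    (80, 0.0752, 1.1980, 4059584),
    (85, 0.0604, 1.2308, 4303764),
    (90, 0.046, 1.2139, 4556488),
]

# Row with the lowest validation loss.
best = min(rows, key=lambda r: r[2])
last = rows[-1]

# Gap between validation and training loss at the last logged step.
final_gap = last[2] - last[1]

# Average number of input tokens consumed per optimizer step.
tokens_per_step = last[3] / last[0]

print(f"lowest validation loss: {best[2]} at step {best[0]}")
print(f"final validation/training gap: {final_gap:.4f}")
print(f"mean input tokens per step: {tokens_per_step:.0f}")
```

On these numbers the validation loss is lowest at step 10 (1.0601) and drifts upward afterwards while the training loss keeps falling, a pattern that usually points to overfitting.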

Framework versions

Transformers 4.44.0
PyTorch 2.4.0+cu121
Datasets 2.20.0
Tokenizers 0.19.1
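
To check whether a local environment matches the versions listed above, a small helper along the following lines may be useful. It is a sketch, not part of the original card: `check_versions` is a hypothetical name, and `torch` is the PyPI distribution name for PyTorch.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed above, keyed by PyPI distribution name.
# "+cu121" is a local build tag (CUDA 12.1) and is ignored when comparing.
EXPECTED = {
    "transformers": "4.44.0",
    "torch": "2.4.0",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def check_versions(expected):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, want in expected.items():
        try:
            have = version(name)
        except PackageNotFoundError:
            have = None  # package is not installed at all
        # Compare the public version only, dropping any "+local" suffix.
        if have is None or have.split("+")[0] != want:
            mismatches[name] = (want, have)
    return mismatches


for name, (want, have) in check_versions(EXPECTED).items():
    print(f"{name}: expected {want}, found {have or 'not installed'}")
```

The helper prints nothing when every installed version matches.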