collapse_gemma-2-9b_hs2_accumulate_iter4_sftsd0

This model is a fine-tuned version of google/gemma-2-9b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9492
  • Num Input Tokens Seen: 19618104
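
A minimal loading and generation sketch, assuming the standard transformers API; the BF16 dtype matches the checkpoint's tensor type, while `device_map="auto"` (which requires the accelerate package) and the example prompt are assumptions:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-9b_hs2_accumulate_iter4_sftsd0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # checkpoint weights are stored in BF16
    device_map="auto",           # assumption: accelerate is installed
)

# Hypothetical prompt, purely for illustration.
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```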

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after this list):

  • learning_rate: 8e-06
  • train_batch_size: 4
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
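
These settings map, approximately, onto transformers' TrainingArguments as sketched below. This is a reconstruction from the list above rather than the actual training script; output_dir and bf16 are assumptions, and the effective batch size of 128 follows from 4 × 32 on a single device:

```python
from transformers import TrainingArguments

# Hedged reconstruction of the hyperparameters above; output_dir and bf16 are assumed.
training_args = TrainingArguments(
    output_dir="collapse_gemma-2-9b_hs2_accumulate_iter4_sftsd0",  # assumption
    learning_rate=8e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=32,  # 4 * 32 = total train batch size 128
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,  # assumption, consistent with the BF16 checkpoint
)
```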

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.2335          | 0                 |
| 1.2703        | 0.0130 | 5    | 1.1837          | 260728            |
| 1.2545        | 0.0261 | 10   | 1.0745          | 511292            |
| 0.9867        | 0.0391 | 15   | 1.0219          | 760004            |
| 0.7077        | 0.0522 | 20   | 1.0147          | 1015876           |
| 0.5384        | 0.0652 | 25   | 1.0220          | 1270300           |
| 0.5591        | 0.0783 | 30   | 1.0193          | 1525228           |
| 0.4475        | 0.0913 | 35   | 1.0166          | 1784804           |
| 0.3602        | 0.1044 | 40   | 1.0124          | 2036584           |
| 0.3623        | 0.1174 | 45   | 1.0037          | 2297140           |
| 0.3845        | 0.1305 | 50   | 0.9974          | 2559416           |
| 0.2587        | 0.1435 | 55   | 0.9923          | 2810020           |
| 0.4471        | 0.1566 | 60   | 0.9912          | 3060436           |
| 0.3047        | 0.1696 | 65   | 0.9868          | 3321640           |
| 0.3731        | 0.1827 | 70   | 0.9832          | 3573720           |
| 0.3265        | 0.1957 | 75   | 0.9839          | 3828028           |
| 0.2885        | 0.2088 | 80   | 0.9812          | 4080608           |
| 0.3128        | 0.2218 | 85   | 0.9791          | 4336288           |
| 0.3204        | 0.2349 | 90   | 0.9770          | 4590108           |
| 0.3495        | 0.2479 | 95   | 0.9758          | 4853076           |
| 0.2884        | 0.2610 | 100  | 0.9760          | 5107028           |
| 0.3117        | 0.2740 | 105  | 0.9728          | 5361252           |
| 0.3231        | 0.2871 | 110  | 0.9732          | 5615724           |
| 0.3288        | 0.3001 | 115  | 0.9715          | 5871856           |
| 0.3798        | 0.3132 | 120  | 0.9698          | 6127844           |
| 0.2902        | 0.3262 | 125  | 0.9698          | 6385324           |
| 0.3605        | 0.3393 | 130  | 0.9706          | 6633264           |
| 0.3544        | 0.3523 | 135  | 0.9679          | 6886668           |
| 0.3400        | 0.3654 | 140  | 0.9670          | 7149304           |
| 0.3764        | 0.3784 | 145  | 0.9674          | 7405164           |
| 0.2529        | 0.3915 | 150  | 0.9675          | 7653688           |
| 0.2816        | 0.4045 | 155  | 0.9672          | 7913220           |
| 0.2044        | 0.4176 | 160  | 0.9648          | 8167932           |
| 0.2825        | 0.4306 | 165  | 0.9658          | 8418852           |
| 0.2702        | 0.4436 | 170  | 0.9650          | 8677864           |
| 0.3071        | 0.4567 | 175  | 0.9650          | 8935764           |
| 0.3253        | 0.4697 | 180  | 0.9642          | 9187056           |
| 0.2927        | 0.4828 | 185  | 0.9626          | 9442708           |
| 0.2876        | 0.4958 | 190  | 0.9634          | 9701192           |
| 0.3425        | 0.5089 | 195  | 0.9624          | 9955308           |
| 0.3433        | 0.5219 | 200  | 0.9602          | 10214732          |
| 0.3315        | 0.5350 | 205  | 0.9611          | 10466412          |
| 0.2934        | 0.5480 | 210  | 0.9605          | 10714628          |
| 0.2463        | 0.5611 | 215  | 0.9612          | 10976808          |
| 0.3642        | 0.5741 | 220  | 0.9613          | 11234876          |
| 0.3245        | 0.5872 | 225  | 0.9589          | 11495408          |
| 0.2885        | 0.6002 | 230  | 0.9589          | 11752512          |
| 0.3555        | 0.6133 | 235  | 0.9600          | 12002952          |
| 0.2814        | 0.6263 | 240  | 0.9583          | 12260908          |
| 0.3228        | 0.6394 | 245  | 0.9574          | 12519812          |
| 0.3228        | 0.6524 | 250  | 0.9576          | 12782436          |
| 0.3823        | 0.6655 | 255  | 0.9572          | 13042344          |
| 0.3539        | 0.6785 | 260  | 0.9562          | 13307776          |
| 0.3418        | 0.6916 | 265  | 0.9571          | 13567712          |
| 0.2592        | 0.7046 | 270  | 0.9593          | 13823848          |
| 0.2523        | 0.7177 | 275  | 0.9564          | 14073252          |
| 0.2883        | 0.7307 | 280  | 0.9557          | 14325632          |
| 0.2877        | 0.7438 | 285  | 0.9546          | 14580592          |
| 0.3691        | 0.7568 | 290  | 0.9545          | 14834352          |
| 0.2924        | 0.7699 | 295  | 0.9546          | 15098672          |
| 0.3078        | 0.7829 | 300  | 0.9533          | 15350204          |
| 0.3201        | 0.7960 | 305  | 0.9544          | 15609792          |
| 0.3147        | 0.8090 | 310  | 0.9544          | 15869296          |
| 0.3097        | 0.8221 | 315  | 0.9523          | 16121416          |
| 0.2708        | 0.8351 | 320  | 0.9522          | 16378908          |
| 0.2285        | 0.8481 | 325  | 0.9549          | 16637160          |
| 0.2825        | 0.8612 | 330  | 0.9535          | 16895604          |
| 0.3189        | 0.8742 | 335  | 0.9523          | 17153840          |
| 0.2630        | 0.8873 | 340  | 0.9529          | 17408728          |
| 0.2470        | 0.9003 | 345  | 0.9521          | 17664248          |
| 0.2309        | 0.9134 | 350  | 0.9532          | 17925640          |
| 0.2487        | 0.9264 | 355  | 0.9513          | 18183340          |
| 0.3177        | 0.9395 | 360  | 0.9518          | 18443996          |
| 0.2997        | 0.9525 | 365  | 0.9521          | 18692904          |
| 0.3384        | 0.9656 | 370  | 0.9516          | 18947432          |
| 0.2958        | 0.9786 | 375  | 0.9513          | 19210912          |
| 0.3001        | 0.9917 | 380  | 0.9484          | 19465112          |
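
For scale: 19,465,112 input tokens over 380 optimizer steps works out to roughly 51k tokens per step, i.e. about 400 tokens per sequence at the effective batch size of 128. Validation loss falls sharply over the first 15 steps and then improves only gradually, from 1.0219 at step 15 to 0.9484 at step 380.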

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
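
To pin a matching environment, something like the following should work; the "+cu121" suffix on the PyTorch version indicates a CUDA 12.1 build, and the PyTorch wheel index URL below is an assumption about how that build is obtained:

```bash
# Pinned versions from the list above; the cu121 index URL is an assumption.
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
pip install transformers==4.44.0 datasets==2.20.0 tokenizers==0.19.1
```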