Baby-Llama-58M-RUN3_4

This model is a fine-tuned version of on an unknown dataset. It achieves the following results on the evaluation set:

Model description

More information needed

Training ran for 50 epochs at 12 steps per epoch (600 steps in total). Training and validation loss per epoch:

Training Loss	Epoch	Step	Validation Loss
296.6382	1.0	12	255.9934
229.3247	2.0	24	211.9065
207.0496	3.0	36	181.8885
123.9346	4.0	48	109.3748
82.0349	5.0	60	72.1227
45.9392	6.0	72	39.7369
25.2634	7.0	84	22.4471
15.2842	8.0	96	13.8333
10.3515	9.0	108	10.2077
8.1678	10.0	120	7.8930
6.461	11.0	132	6.9546
6.073	12.0	144	6.3275
5.4812	13.0	156	5.9462
5.5237	14.0	168	5.6727
4.727	15.0	180	5.5723
4.6544	16.0	192	5.2316
4.641	17.0	204	5.2542
4.5579	18.0	216	5.1794
4.6136	19.0	228	4.9774
4.1043	20.0	240	4.9214
4.1177	21.0	252	4.8358
4.6799	22.0	264	4.7847
4.0522	23.0	276	4.7018
4.2287	24.0	288	4.6770
3.9668	25.0	300	4.6077
4.1524	26.0	312	4.6043
3.8744	27.0	324	4.5508
3.9389	28.0	336	4.4908
3.9329	29.0	348	4.4882
3.9034	30.0	360	4.4708
3.9221	31.0	372	4.4729
3.8269	32.0	384	4.3710
3.8344	33.0	396	4.3734
3.3988	34.0	408	4.2938
3.4335	35.0	420	4.3189
3.521	36.0	432	4.2749
3.5696	37.0	444	4.2773
3.6298	38.0	456	4.2541
3.6759	39.0	468	4.2371
3.6787	40.0	480	4.2151
3.3474	41.0	492	4.1932
3.5124	42.0	504	4.1978
3.1906	43.0	516	4.1859
3.4355	44.0	528	4.1770
3.3138	45.0	540	4.1743
3.6061	46.0	552	4.1742
3.8685	47.0	564	4.1653
3.4448	48.0	576	4.1635
3.5253	49.0	588	4.1623
3.6948	50.0	600	4.1614
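The per-epoch log above can be summarized with a short script. The sketch below assumes the table is pasted as tab-separated text with the header on the first line. For brevity it embeds only a few rows copied verbatim from the table; paste the full table into `LOG` to summarize the whole run. The names `Row`, `parse_log`, and `summarize` are illustrative only and are not part of any library.

```python
from dataclasses import dataclass


@dataclass
class Row:
    train_loss: float
    epoch: float
    step: int
    val_loss: float


# A subset of rows copied verbatim from the table (tab-separated, header first).
# Paste the full table here to summarize every epoch of the run.
LOG = """Training Loss\tEpoch\tStep\tValidation Loss
296.6382\t1.0\t12\t255.9934
8.1678\t10.0\t120\t7.8930
4.1043\t20.0\t240\t4.9214
3.9034\t30.0\t360\t4.4708
3.6787\t40.0\t480\t4.2151
3.6061\t46.0\t552\t4.1742
3.8685\t47.0\t564\t4.1653
3.4448\t48.0\t576\t4.1635
3.5253\t49.0\t588\t4.1623
3.6948\t50.0\t600\t4.1614"""


def parse_log(text: str) -> list[Row]:
    """Parse the tab-separated log into Row objects, skipping the header line."""
    rows = []
    for line in text.strip().splitlines()[1:]:
        train, epoch, step, val = line.split("\t")
        rows.append(Row(float(train), float(epoch), int(step), float(val)))
    return rows


def summarize(rows: list[Row]) -> dict:
    """Report the best validation epoch, steps per epoch, and late-run trends."""
    best = min(rows, key=lambda r: r.val_loss)
    last, prev = rows[-1], rows[-2]
    return {
        "best_epoch": best.epoch,
        "best_val_loss": best.val_loss,
        "steps_per_epoch": last.step / last.epoch,
        # Validation loss minus training loss at the final logged epoch.
        "final_gap": round(last.val_loss - last.train_loss, 4),
        # Drop in validation loss between the last two logged rows.
        "last_improvement": round(prev.val_loss - last.val_loss, 4),
    }


if __name__ == "__main__":
    print(summarize(parse_log(LOG)))
```

On the full table, the lowest validation loss is also reached at the final epoch (4.1614 at epoch 50), and each of the last five epochs lowers it by less than 0.01, which suggests the run had largely plateaued by the end of training.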