Baby-Llama-58M-ORIGINAL

This model is a fine-tuned version of an unspecified base model, trained on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 4.1715
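
For a rough sense of scale, the loss can be converted to a perplexity. The sketch below assumes the reported value is a mean per-token cross-entropy in nats, which is the usual convention for causal language model training but is not stated in this card. Under that assumption the result is roughly 65.

```python
import math

eval_loss = 4.1715  # evaluation loss reported above

# Assumption: the loss is mean cross-entropy per token, measured in nats.
# Under that assumption, perplexity is simply exp(loss).
perplexity = math.exp(eval_loss)

print(f"perplexity ~ {perplexity:.1f}")
```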

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.00025
train_batch_size: 32
eval_batch_size: 8
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 50
num_epochs: 80
mixed_precision_training: Native AMP
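
The learning-rate settings above can be illustrated with a small standalone sketch of linear warmup followed by cosine decay. It is an approximation of what a cosine scheduler with warmup typically computes, not code taken from this training run. The total of 960 steps is read from the last row of the training results, and the function name `lr_at` is hypothetical.

```python
import math

BASE_LR = 0.00025    # learning_rate
WARMUP_STEPS = 50    # lr_scheduler_warmup_steps
TOTAL_STEPS = 960    # assumption: final step listed in the training results

def lr_at(step: int) -> float:
    """Learning rate after `step` optimizer updates (hypothetical helper)."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 up to BASE_LR.
        return BASE_LR * step / WARMUP_STEPS
    # Cosine decay from BASE_LR down to 0 over the remaining steps.
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

for step in (0, 25, 50, 505, 960):
    print(step, f"{lr_at(step):.6f}")
```

With these settings the rate peaks at step 50, falls to half its peak at step 505 (the midpoint of the decay phase), and reaches zero at step 960.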

Training results

Training Loss	Epoch	Step	Validation Loss
289.4006	1.0	12	243.1602
229.6677	2.0	24	201.2611
207.9305	3.0	36	172.3865
125.9577	4.0	48	107.6148
84.8722	5.0	60	72.6301
48.2019	6.0	72	40.8081
25.796	7.0	84	22.0368
15.6686	8.0	96	13.6649
9.8745	9.0	108	9.6135
7.9539	10.0	120	8.0438
6.4873	11.0	132	7.1011
6.0896	12.0	144	6.4537
5.4521	13.0	156	6.0605
5.5516	14.0	168	6.0324
4.7538	15.0	180	5.7866
4.8229	16.0	192	5.5738
4.568	17.0	204	5.5282
4.4449	18.0	216	5.4060
4.6567	19.0	228	5.3382
4.1888	20.0	240	5.2407
4.2102	21.0	252	5.2085
4.6584	22.0	264	5.0947
4.102	23.0	276	4.9988
4.3574	24.0	288	4.9768
4.0571	25.0	300	4.9552
4.22	26.0	312	4.9127
3.9908	27.0	324	4.9050
4.0273	28.0	336	4.7905
4.0092	29.0	348	4.8265
3.9705	30.0	360	4.7823
4.0081	31.0	372	4.7383
3.8771	32.0	384	4.6774
3.899	33.0	396	4.6629
3.4711	34.0	408	4.6603
3.4489	35.0	420	4.5675
3.5063	36.0	432	4.5751
3.6348	37.0	444	4.5786
3.6931	38.0	456	4.5513
3.7022	39.0	468	4.5208
3.6842	40.0	480	4.5146
3.4084	41.0	492	4.5171
3.5141	42.0	504	4.4681
3.2337	43.0	516	4.4700
3.4376	44.0	528	4.4472
3.2911	45.0	540	4.4462
3.6011	46.0	552	4.4115
3.8547	47.0	564	4.3901
3.3866	48.0	576	4.3873
3.4543	49.0	588	4.3904
3.6357	50.0	600	4.3693
3.5045	51.0	612	4.3569
3.0792	52.0	624	4.3263
3.2731	53.0	636	4.3322
3.4193	54.0	648	4.3012
3.1097	55.0	660	4.3015
3.088	56.0	672	4.2914
2.9444	57.0	684	4.2750
3.362	58.0	696	4.2612
3.2228	59.0	708	4.2647
2.9892	60.0	720	4.2417
3.0214	61.0	732	4.2287
3.3049	62.0	744	4.2328
3.4639	63.0	756	4.2200
3.2505	64.0	768	4.2130
3.0121	65.0	780	4.2087
3.3112	66.0	792	4.2001
3.3258	67.0	804	4.2013
2.9143	68.0	816	4.1952
3.1404	69.0	828	4.1876
3.495	70.0	840	4.1910
3.134	71.0	852	4.1841
3.1945	72.0	864	4.1835
3.1116	73.0	876	4.1742
3.2141	74.0	888	4.1743
3.3962	75.0	900	4.1734
2.8472	76.0	912	4.1721
3.2455	77.0	924	4.1717
2.9047	78.0	936	4.1724
3.4182	79.0	948	4.1716
3.5737	80.0	960	4.1715
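
The step counts in the table also give a rough bound on the size of the training set, which the card does not otherwise report. The sketch below assumes a single device, no gradient accumulation, and that a final partial batch is kept rather than dropped. None of these are stated in the card, so the numbers are only an estimate.

```python
TRAIN_BATCH_SIZE = 32   # train_batch_size
NUM_EPOCHS = 80         # num_epochs
FINAL_STEP = 960        # step at epoch 80.0 in the table above

# Each epoch in the table advances the step counter by the same amount.
steps_per_epoch = FINAL_STEP // NUM_EPOCHS

# Assumptions: one device, no gradient accumulation, last partial batch kept.
# The last batch of an epoch then holds between 1 and 32 examples.
max_examples = steps_per_epoch * TRAIN_BATCH_SIZE
min_examples = (steps_per_epoch - 1) * TRAIN_BATCH_SIZE + 1

print(f"{steps_per_epoch} steps per epoch")
print(f"between {min_examples} and {max_examples} training examples")
```

If the run used several devices or gradient accumulation, the per-step batch would be larger and these bounds would scale accordingly.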

Framework versions

Transformers 4.39.1
Pytorch 2.1.2+cu121
Datasets 2.16.1
Tokenizers 0.15.0
