smolchess

This model is a fine-tuned version of HuggingFaceTB/SmolLM2-135M on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 0.8688
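
The loss above is easier to interpret as a perplexity. The sketch below assumes the reported value is a mean per-token cross-entropy in nats, which is the usual convention for causal language model training but is not stated in this card. The function name `perplexity` is introduced here for illustration only.

```python
import math


def perplexity(mean_cross_entropy_nats: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(mean_cross_entropy_nats)


# Reported evaluation loss from this card.
# ASSUMPTION: it is a mean per-token cross-entropy measured in nats.
final_eval_loss = 0.8688

print(f"perplexity ~ {perplexity(final_eval_loss):.2f}")
```

Under that assumption the evaluation loss corresponds to a perplexity of roughly 2.38.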

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 3e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
optimizer: grokadamw with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: cosine
num_epochs: 0.25
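
To show what the cosine schedule implies for the learning rate, here is a minimal sketch of a half-cosine decay from the listed peak of 3e-05 down to zero. It assumes no warmup (none is listed above) and a total of about 399 optimizer steps, a figure inferred from the step and epoch columns of the training log rather than stated in the card. The helper `cosine_lr` is hypothetical and is not the training code used for this model.

```python
import math


def cosine_lr(step: int, total_steps: int, base_lr: float = 3e-05) -> float:
    """Learning rate at `step` under half-cosine decay to zero, with no warmup."""
    # Clamp progress to [0, 1] so out-of-range steps do not wrap the cosine.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# ASSUMPTION: about 399 optimizer steps in total, inferred from the logged
# step/epoch ratio; the card itself does not state this number.
TOTAL_STEPS = 399

for step in (0, 100, 200, 300, 396):
    print(f"step {step:>3}: lr = {cosine_lr(step, TOTAL_STEPS):.2e}")
```

If these assumptions hold, the learning rate is close to zero by the last logged steps, which would be consistent with the validation loss flattening out at 0.8688.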

Training results

Training Loss	Epoch	Step	Validation Loss
1.4847	0.0025	4	1.3890
1.2333	0.0050	8	1.2242
1.2154	0.0075	12	1.1705
1.1268	0.0100	16	1.1241
1.0556	0.0125	20	1.1055
1.0629	0.0150	24	1.0848
1.1023	0.0176	28	1.0764
1.102	0.0201	32	1.0554
1.0798	0.0226	36	1.0567
0.9436	0.0251	40	1.0365
1.0524	0.0276	44	1.0275
1.1201	0.0301	48	1.0198
1.0565	0.0326	52	1.0135
0.9082	0.0351	56	1.0084
1.0544	0.0376	60	0.9970
1.0034	0.0401	64	0.9939
0.8859	0.0426	68	0.9852
1.018	0.0451	72	0.9816
0.8901	0.0476	76	0.9761
0.8943	0.0502	80	0.9723
1.0486	0.0527	84	0.9718
1.0102	0.0552	88	0.9680
0.9617	0.0577	92	0.9602
0.9879	0.0602	96	0.9607
0.9482	0.0627	100	0.9523
1.0265	0.0652	104	0.9518
0.8865	0.0677	108	0.9493
1.0046	0.0702	112	0.9448
0.9593	0.0727	116	0.9384
1.0167	0.0752	120	0.9377
0.9041	0.0777	124	0.9345
0.8702	0.0803	128	0.9311
0.9117	0.0828	132	0.9333
0.936	0.0853	136	0.9262
0.9341	0.0878	140	0.9237
0.913	0.0903	144	0.9219
0.9205	0.0928	148	0.9204
0.9081	0.0953	152	0.9183
0.8826	0.0978	156	0.9162
0.9578	0.1003	160	0.9142
0.845	0.1028	164	0.9128
0.9254	0.1053	168	0.9102
0.9622	0.1078	172	0.9096
0.7854	0.1103	176	0.9085
0.9143	0.1129	180	0.9071
0.99	0.1154	184	0.9043
0.9855	0.1179	188	0.9038
0.9745	0.1204	192	0.9017
0.9532	0.1229	196	0.8998
0.9464	0.1254	200	0.8989
0.8713	0.1279	204	0.8962
0.8501	0.1304	208	0.8942
0.9065	0.1329	212	0.8936
0.8949	0.1354	216	0.8924
0.9504	0.1379	220	0.8900
0.9059	0.1404	224	0.8900
0.909	0.1429	228	0.8881
0.9684	0.1455	232	0.8864
0.968	0.1480	236	0.8865
0.9436	0.1505	240	0.8853
0.9166	0.1530	244	0.8841
0.977	0.1555	248	0.8825
0.9011	0.1580	252	0.8820
0.8842	0.1605	256	0.8812
0.9399	0.1630	260	0.8806
0.9211	0.1655	264	0.8791
0.8043	0.1680	268	0.8785
0.8406	0.1705	272	0.8778
0.8463	0.1730	276	0.8765
0.8638	0.1755	280	0.8762
0.894	0.1781	284	0.8761
0.8925	0.1806	288	0.8753
0.9029	0.1831	292	0.8754
0.809	0.1856	296	0.8749
0.9558	0.1881	300	0.8742
0.8286	0.1906	304	0.8736
0.8714	0.1931	308	0.8730
0.8562	0.1956	312	0.8728
0.858	0.1981	316	0.8723
0.9027	0.2006	320	0.8719
0.9023	0.2031	324	0.8716
0.856	0.2056	328	0.8712
0.8455	0.2082	332	0.8709
0.8886	0.2107	336	0.8705
0.8717	0.2132	340	0.8703
0.9145	0.2157	344	0.8700
0.9618	0.2182	348	0.8698
0.9083	0.2207	352	0.8697
0.9448	0.2232	356	0.8695
0.9188	0.2257	360	0.8693
0.8006	0.2282	364	0.8692
0.8222	0.2307	368	0.8691
0.8936	0.2332	372	0.8690
0.9366	0.2357	376	0.8689
0.9336	0.2382	380	0.8689
0.6878	0.2408	384	0.8689
0.9405	0.2433	388	0.8688
0.9022	0.2458	392	0.8688
0.8499	0.2483	396	0.8688
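
The step and epoch columns in the table above allow a rough estimate of the run length and of the training set size, which the card does not report. The sketch below works from the final logged row and assumes a single device with no gradient accumulation, so that one step consumes exactly train_batch_size examples. Both are assumptions, and the variable names are illustrative.

```python
def steps_per_epoch(step: int, epoch_fraction: float) -> float:
    """Estimate optimizer steps per full epoch from one logged (step, epoch) pair."""
    return step / epoch_fraction


# Final logged row of the training results table.
last_step, last_epoch = 396, 0.2483

# Values from the hyperparameter list.
train_batch_size = 8
num_epochs = 0.25

est_steps_per_epoch = steps_per_epoch(last_step, last_epoch)
est_total_steps = num_epochs * est_steps_per_epoch

# ASSUMPTION: single device, no gradient accumulation.
est_examples_per_epoch = est_steps_per_epoch * train_batch_size

print(f"steps per epoch    ~ {est_steps_per_epoch:.0f}")
print(f"total steps        ~ {est_total_steps:.0f}")
print(f"examples per epoch ~ {est_examples_per_epoch:.0f}")
```

Because the epoch column is rounded to four decimals, these figures are approximate. They suggest a run of about 399 steps over a training set on the order of 12,800 examples.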

Framework versions

Transformers 4.46.1
PyTorch 2.4.1+cu121
Datasets 3.1.0
Tokenizers 0.20.1

Repository: nlpguy/smolchess
