hbertv1-massive-logit_KD-tiny_ffn_0.5

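This model is a fine-tuned version of gokuls/model_v1_complete_training_wt_init_48_tiny_freeze_new_ffn_0.5 on the massive dataset. In the per-epoch training log below, the final epoch (48) reaches a validation loss of 0.6519 and an accuracy of 0.8283 on the evaluation set; the highest logged accuracy is 0.8308, at epoch 43.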

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

The following hyperparameters were used during training:

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
4.3023	1.0	180	3.8354	0.1766
3.7037	2.0	360	3.2686	0.2027
3.2011	3.0	540	2.8012	0.2966
2.774	4.0	720	2.4055	0.3802
2.4069	5.0	900	2.0833	0.4747
2.1164	6.0	1080	1.8300	0.5588
1.8907	7.0	1260	1.6351	0.6252
1.71	8.0	1440	1.4792	0.6621
1.5648	9.0	1620	1.3605	0.6936
1.4399	10.0	1800	1.2607	0.7103
1.3436	11.0	1980	1.1872	0.7201
1.266	12.0	2160	1.1295	0.7285
1.1934	13.0	2340	1.0829	0.7359
1.1413	14.0	2520	1.0428	0.7472
1.0807	15.0	2700	0.9984	0.7585
1.0382	16.0	2880	0.9693	0.7600
0.9982	17.0	3060	0.9439	0.7673
0.9626	18.0	3240	0.9207	0.7723
0.9299	19.0	3420	0.8887	0.7796
0.8828	20.0	3600	0.8686	0.7796
0.8593	21.0	3780	0.8537	0.7905
0.8329	22.0	3960	0.8250	0.7934
0.8043	23.0	4140	0.8098	0.7959
0.7764	24.0	4320	0.7990	0.8008
0.7569	25.0	4500	0.7823	0.8067
0.7372	26.0	4680	0.7749	0.8023
0.7182	27.0	4860	0.7640	0.8101
0.6987	28.0	5040	0.7509	0.8106
0.6842	29.0	5220	0.7386	0.8146
0.6673	30.0	5400	0.7305	0.8146
0.6509	31.0	5580	0.7196	0.8214
0.6382	32.0	5760	0.7120	0.8170
0.6301	33.0	5940	0.7134	0.8190
0.6139	34.0	6120	0.7062	0.8200
0.6076	35.0	6300	0.6928	0.8205
0.5919	36.0	6480	0.6838	0.8244
0.5792	37.0	6660	0.6819	0.8264
0.5739	38.0	6840	0.6780	0.8210
0.5698	39.0	7020	0.6684	0.8283
0.5602	40.0	7200	0.6692	0.8249
0.5534	41.0	7380	0.6644	0.8298
0.5429	42.0	7560	0.6599	0.8278
0.5423	43.0	7740	0.6585	0.8308
0.5356	44.0	7920	0.6569	0.8293
0.5374	45.0	8100	0.6565	0.8293
0.5327	46.0	8280	0.6540	0.8273
0.5324	47.0	8460	0.6523	0.8273
0.5281	48.0	8640	0.6519	0.8283
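
The log above can be checked programmatically. The following is a minimal sketch, not part of the original training code: it parses the last seven rows of the table (copied from above, with whitespace as the separator) and reports the epoch with the highest accuracy, the epoch with the lowest validation loss, and the number of optimizer steps per epoch. The names `EvalRow`, `LOG_TAIL`, and `parse_log` are hypothetical and introduced only for illustration.

```python
from typing import NamedTuple


class EvalRow(NamedTuple):
    """One row of the training log (hypothetical helper type)."""
    train_loss: float
    epoch: float
    step: int
    val_loss: float
    accuracy: float


# Last seven rows of the log above. Columns:
# Training Loss, Epoch, Step, Validation Loss, Accuracy.
LOG_TAIL = """
0.5429 42.0 7560 0.6599 0.8278
0.5423 43.0 7740 0.6585 0.8308
0.5356 44.0 7920 0.6569 0.8293
0.5374 45.0 8100 0.6565 0.8293
0.5327 46.0 8280 0.6540 0.8273
0.5324 47.0 8460 0.6523 0.8273
0.5281 48.0 8640 0.6519 0.8283
"""


def parse_log(text: str) -> list[EvalRow]:
    """Parse whitespace-separated log rows into EvalRow tuples."""
    rows = []
    for line in text.strip().splitlines():
        train_loss, epoch, step, val_loss, acc = line.split()
        rows.append(
            EvalRow(float(train_loss), float(epoch), int(step),
                    float(val_loss), float(acc))
        )
    return rows


rows = parse_log(LOG_TAIL)

# Row with the highest evaluation accuracy and row with the lowest validation loss.
best_acc = max(rows, key=lambda r: r.accuracy)
best_loss = min(rows, key=lambda r: r.val_loss)

# Step / epoch should be constant if every epoch has the same number of steps.
steps_per_epoch = {round(r.step / r.epoch) for r in rows}

print(f"highest accuracy: {best_acc.accuracy} at epoch {best_acc.epoch:.0f}")
print(f"lowest validation loss: {best_loss.val_loss} at epoch {best_loss.epoch:.0f}")
print(f"optimizer steps per epoch: {steps_per_epoch}")
```

In these rows the highest accuracy and the lowest validation loss fall in different epochs (43 and 48), so which checkpoint counts as "best" depends on the selection metric. The card does not state which checkpoint was published.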