fresh-4-layer-swag-distill-of-fresh-4-layer-gpqa

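This model is a fine-tuned version of a base model that is not named here, trained on an unknown dataset. The evaluation results are not listed separately; at the final epoch (step 2500), the training results table reports a validation loss of 12.0208 and an accuracy of 0.4040 on the evaluation set.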

Model description

More information needed

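The hyperparameters used during training are not listed here. The results below cover 20 epochs of 125 steps each (2500 steps in total), with evaluation after every epoch. Training loss appears to be logged every 500 steps, so rows before step 500 show "No log".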

Training Loss	Epoch	Step	Validation Loss	Accuracy
No log	1.0	125	13.8015	0.2778
No log	2.0	250	14.0268	0.3535
No log	3.0	375	13.0123	0.3838
1.8616	4.0	500	12.3288	0.3535
1.8616	5.0	625	12.1718	0.3737
1.8616	6.0	750	12.7654	0.3889
1.8616	7.0	875	12.6711	0.3838
0.4769	8.0	1000	12.0719	0.4141
0.4769	9.0	1125	11.8960	0.4091
0.4769	10.0	1250	12.0726	0.4192
0.4769	11.0	1375	11.8632	0.4293
0.1853	12.0	1500	11.6135	0.4141
0.1853	13.0	1625	12.2307	0.4141
0.1853	14.0	1750	11.7646	0.4040
0.1853	15.0	1875	11.6897	0.4141
0.0913	16.0	2000	12.0394	0.4091
0.0913	17.0	2125	11.7915	0.4040
0.0913	18.0	2250	12.0047	0.3990
0.0913	19.0	2375	11.9798	0.3939
0.0436	20.0	2500	12.0208	0.4040
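
The table above can be scanned programmatically to see which epoch scored best on the evaluation set. The following is a minimal sketch, not part of the original training code: the values are copied from the table, the helper names `best_by_accuracy` and `best_by_val_loss` are hypothetical, and it is an assumption that a checkpoint was saved at each epoch.

```python
# (epoch, validation_loss, accuracy), copied from the training results table.
RESULTS = [
    (1, 13.8015, 0.2778), (2, 14.0268, 0.3535), (3, 13.0123, 0.3838),
    (4, 12.3288, 0.3535), (5, 12.1718, 0.3737), (6, 12.7654, 0.3889),
    (7, 12.6711, 0.3838), (8, 12.0719, 0.4141), (9, 11.8960, 0.4091),
    (10, 12.0726, 0.4192), (11, 11.8632, 0.4293), (12, 11.6135, 0.4141),
    (13, 12.2307, 0.4141), (14, 11.7646, 0.4040), (15, 11.6897, 0.4141),
    (16, 12.0394, 0.4091), (17, 11.7915, 0.4040), (18, 12.0047, 0.3990),
    (19, 11.9798, 0.3939), (20, 12.0208, 0.4040),
]


def best_by_accuracy(rows):
    """Return the row with the highest evaluation accuracy (hypothetical helper)."""
    return max(rows, key=lambda row: row[2])


def best_by_val_loss(rows):
    """Return the row with the lowest validation loss (hypothetical helper)."""
    return min(rows, key=lambda row: row[1])


best_acc = best_by_accuracy(RESULTS)
best_loss = best_by_val_loss(RESULTS)
final = RESULTS[-1]

print(f"Best accuracy: {best_acc[2]:.4f} at epoch {best_acc[0]}")
print(f"Lowest validation loss: {best_loss[1]:.4f} at epoch {best_loss[0]}")
print(f"Final epoch accuracy: {final[2]:.4f}")
```

By these numbers, accuracy peaks at 0.4293 in epoch 11 and validation loss is lowest at 11.6135 in epoch 12, while the final epoch sits at 0.4040. Training loss keeps falling (1.8616 to 0.0436) as validation loss stays roughly between 11.6 and 14.0, so an earlier checkpoint may be preferable to the last one if one is available.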