strict_balanced_cf_seed-21_1e-3

This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set (values taken from the final logged evaluation, step 29720):

Loss: 3.1894
Accuracy: 0.4008

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

The following hyperparameters were used during training:

learning_rate: 0.001
train_batch_size: 32
eval_batch_size: 64
seed: 21
gradient_accumulation_steps: 8
total_train_batch_size: 256
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 32000
num_epochs: 20.0
mixed_precision_training: Native AMP
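
Two of the values above can be checked with simple arithmetic: the total train batch size, and the learning rate actually reached during training. The sketch below assumes a single device (the card does not state the device count) and assumes the linear scheduler behaves like the usual linear warmup followed by linear decay. The step count of 29720 is the final step logged in this card's results. Under those assumptions, the 32000 warmup steps exceed the total number of optimizer steps, so the learning rate would still be ramping up when training ends and would never reach the configured 0.001.

```python
# Values copied from the hyperparameter list and the last logged training step.
PEAK_LR = 1e-3
WARMUP_STEPS = 32_000
TOTAL_STEPS = 29_720
PER_DEVICE_BATCH = 32
GRAD_ACCUM_STEPS = 8
NUM_DEVICES = 1  # assumption: the card does not state the device count


def effective_batch_size(per_device: int, accum_steps: int, devices: int) -> int:
    """Number of examples contributing to one optimizer step."""
    return per_device * accum_steps * devices


def linear_warmup_lr(step: int,
                     peak_lr: float = PEAK_LR,
                     warmup: int = WARMUP_STEPS,
                     total: int = TOTAL_STEPS) -> float:
    """Linear warmup to peak_lr, then linear decay to zero at `total` steps.

    This is a hand-written sketch of the assumed schedule, not the
    scheduler object used in the actual run.
    """
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    return peak_lr * max(0.0, (total - step) / max(1, total - warmup))


print("effective batch size:",
      effective_batch_size(PER_DEVICE_BATCH, GRAD_ACCUM_STEPS, NUM_DEVICES))
print("warmup finishes before training ends:", WARMUP_STEPS <= TOTAL_STEPS)
print("learning rate at the last step:", linear_warmup_lr(TOTAL_STEPS))
```

If the run used more than one device, the per-device batch size or accumulation steps would have to differ for the total to remain 256, so the single-device assumption is at least consistent with the listed values.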

Training Loss	Epoch	Step	Validation Loss	Accuracy
5.9825	0.9998	1486	4.4171	0.2926
4.3054	1.9997	2972	3.9050	0.3329
3.6755	2.9997	4458	3.6276	0.3573
3.4878	3.9996	5944	3.4715	0.3714
3.2604	4.9995	7430	3.3707	0.3811
3.1894	5.9994	8916	3.3120	0.3864
3.0822	6.9993	10402	3.2720	0.3903
3.0424	7.9999	11889	3.2489	0.3915
2.9835	8.9998	13375	3.2300	0.3943
2.9586	9.9997	14861	3.2202	0.3957
2.9205	10.9997	16347	3.2069	0.3972
2.9010	11.9996	17833	3.2097	0.3975
2.8789	12.9995	19319	3.1967	0.3987
2.8594	13.9994	20805	3.1981	0.3986
2.8502	14.9993	22291	3.1954	0.3996
2.8349	15.9999	23778	3.1954	0.3996
2.8319	16.9998	25264	3.1878	0.4001
2.8127	17.9997	26750	3.1866	0.4005
2.8195	18.9997	28236	3.1900	0.4002
2.7995	19.9982	29720	3.1894	0.4008
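
The table can be summarized programmatically. The following sketch picks the evaluation with the lowest validation loss and converts that loss to a perplexity. It assumes the validation loss is a mean per-token cross-entropy in nats, which the card does not state; if the loss is defined differently, the perplexity figure is not meaningful. The `ROWS` list is a hand-copied subset of the table columns (step, validation loss, accuracy).

```python
import math

# (step, validation_loss, accuracy), copied from the results table.
ROWS = [
    (1486, 4.4171, 0.2926), (2972, 3.9050, 0.3329), (4458, 3.6276, 0.3573),
    (5944, 3.4715, 0.3714), (7430, 3.3707, 0.3811), (8916, 3.3120, 0.3864),
    (10402, 3.2720, 0.3903), (11889, 3.2489, 0.3915), (13375, 3.2300, 0.3943),
    (14861, 3.2202, 0.3957), (16347, 3.2069, 0.3972), (17833, 3.2097, 0.3975),
    (19319, 3.1967, 0.3987), (20805, 3.1981, 0.3986), (22291, 3.1954, 0.3996),
    (23778, 3.1954, 0.3996), (25264, 3.1878, 0.4001), (26750, 3.1866, 0.4005),
    (28236, 3.1900, 0.4002), (29720, 3.1894, 0.4008),
]


def perplexity(mean_cross_entropy_nats: float) -> float:
    """Perplexity under the assumption that the loss is mean cross-entropy in nats."""
    return math.exp(mean_cross_entropy_nats)


# Evaluation with the lowest validation loss, and the one with the highest accuracy.
best_by_loss = min(ROWS, key=lambda row: row[1])
best_by_accuracy = max(ROWS, key=lambda row: row[2])

print("lowest validation loss:", best_by_loss)
print("highest accuracy:", best_by_accuracy)
print("perplexity at lowest loss:", round(perplexity(best_by_loss[1]), 2))
```

The lowest validation loss and the highest accuracy fall on different steps, and the differences across the last few evaluations are in the third or fourth decimal place, so the choice of checkpoint among them matters little.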