gpt2_cfg_add_8

This model is a fine-tuned version of an unspecified base model on an unknown dataset. At the last logged evaluation (step 3100), it reaches a validation loss of 0.0000 and an accuracy of 1.0 on the evaluation set.

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 0.001
train_batch_size: 64
eval_batch_size: 64
seed: 42
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 1
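
To make the scheduler settings above concrete, here is a minimal sketch of a learning-rate curve with linear warmup followed by half-cosine decay, using learning_rate 0.001 and a warmup ratio of 0.1. The function name `cosine_lr_with_warmup` is hypothetical, and the total of about 3122 optimizer steps is an assumption inferred from the training log (step 100 is reported at epoch 0.0320); the exact schedule used by the training run may differ in detail.

```python
import math


def cosine_lr_with_warmup(step, total_steps, base_lr=1e-3, warmup_ratio=0.1):
    """Learning rate at `step`: linear warmup, then half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup phase.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# ASSUMPTION: total step count is inferred from the log, not stated in the card.
TOTAL_STEPS = 3122

if __name__ == "__main__":
    for step in (0, 100, 313, 900, 1500, 3100):
        print(step, f"{cosine_lr_with_warmup(step, TOTAL_STEPS):.6f}")
```

Under these assumptions the learning rate peaks at the end of warmup (around step 313) and then decays smoothly toward zero by the end of the single epoch.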

Training Loss	Epoch	Step	Validation Loss	Accuracy
No log	0	0	2.7379	0.0
1.9725	0.0320	100	1.9286	0.0
1.0948	0.0641	200	0.9757	0.02
0.6137	0.0961	300	0.6562	0.09
0.3684	0.1281	400	0.3644	0.35
0.2853	0.1602	500	0.2482	0.61
0.0578	0.1922	600	0.0728	0.84
0.0081	0.2242	700	0.0669	0.88
0.0033	0.2562	800	0.0264	0.93
2.4737	0.2883	900	1.5848	0.005
0.0482	0.3203	1000	0.0470	0.89
0.0009	0.3523	1100	0.0078	0.985
0.0125	0.3844	1200	0.0068	0.98
0.005	0.4164	1300	0.0116	0.975
0.0256	0.4484	1400	0.0035	0.995
0.0003	0.4805	1500	0.0005	1.0
0.0001	0.5125	1600	0.0001	1.0
0.0	0.5445	1700	0.0000	1.0
0.0	0.5766	1800	0.0000	1.0
0.0001	0.6086	1900	0.0002	1.0
0.0	0.6406	2000	0.0000	1.0
0.0	0.6726	2100	0.0000	1.0
0.0	0.7047	2200	0.0000	1.0
0.0	0.7367	2300	0.0000	1.0
0.0	0.7687	2400	0.0000	1.0
0.0	0.8008	2500	0.0000	1.0
0.0	0.8328	2600	0.0000	1.0
0.0	0.8648	2700	0.0000	1.0
0.0	0.8969	2800	0.0000	1.0
0.0	0.9289	2900	0.0000	1.0
0.0	0.9609	3000	0.0000	1.0
0.0	0.9930	3100	0.0000	1.0
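
The table above can be summarized with a few lines of arithmetic. The sketch below copies the rows for steps 100 through 1600 plus the final row, then estimates the number of steps per epoch, flags evaluations where validation loss jumped sharply, and finds the first step with accuracy 1.0. The helper names and the 10x spike threshold are illustrative choices, not part of the original training setup.

```python
# (epoch, step, validation_loss, accuracy), copied from the table.
# Rows for steps 1700-3000 are omitted; all of them report
# validation loss <= 0.0002 and accuracy 1.0.
ROWS = [
    (0.0320, 100, 1.9286, 0.0),
    (0.0641, 200, 0.9757, 0.02),
    (0.0961, 300, 0.6562, 0.09),
    (0.1281, 400, 0.3644, 0.35),
    (0.1602, 500, 0.2482, 0.61),
    (0.1922, 600, 0.0728, 0.84),
    (0.2242, 700, 0.0669, 0.88),
    (0.2562, 800, 0.0264, 0.93),
    (0.2883, 900, 1.5848, 0.005),
    (0.3203, 1000, 0.0470, 0.89),
    (0.3523, 1100, 0.0078, 0.985),
    (0.3844, 1200, 0.0068, 0.98),
    (0.4164, 1300, 0.0116, 0.975),
    (0.4484, 1400, 0.0035, 0.995),
    (0.4805, 1500, 0.0005, 1.0),
    (0.5125, 1600, 0.0001, 1.0),
    (0.9930, 3100, 0.0000, 1.0),
]


def steps_per_epoch(rows):
    """Estimate steps per epoch from the last logged (epoch, step) pair."""
    epoch, step, _, _ = rows[-1]
    return step / epoch


def loss_spikes(rows, factor=10.0):
    """Steps whose validation loss exceeds `factor` times the previous row's."""
    return [
        cur[1]
        for prev, cur in zip(rows, rows[1:])
        if prev[2] > 0 and cur[2] > factor * prev[2]
    ]


def first_perfect_step(rows):
    """First logged step with accuracy exactly 1.0, or None if there is none."""
    return next((step for _, step, _, acc in rows if acc == 1.0), None)


if __name__ == "__main__":
    print("steps per epoch:", round(steps_per_epoch(ROWS)))
    print("loss spikes at steps:", loss_spikes(ROWS))
    print("first step with accuracy 1.0:", first_perfect_step(ROWS))
```

On these rows the estimate comes to roughly 3122 steps per epoch. With train_batch_size 64 that would correspond to about 200,000 training examples, assuming a single device and no gradient accumulation, neither of which the card states. The only flagged spike is the evaluation at step 900, after which the run recovers and first reaches accuracy 1.0 at step 1500.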