longformer-base-4096-airlines-news-multi-label

This model is a fine-tuned version of kiddothe2b/longformer-base-4096 on an unnamed dataset (the auto-generated card lists it as "None"). It achieves the following results on the evaluation set, matching the epoch 59 row of the training results:

Loss: 0.2421
F1: 0.9070
ROC AUC: 0.6668
Hamming: 0.9137

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 9e-05
train_batch_size: 32
eval_batch_size: 32
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 65
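The hyperparameters above and the Step column of the training results (57 steps per epoch, 3705 steps in total) determine the shape of the learning-rate schedule. The sketch below assumes the linear schedule decays from 9e-05 to zero over all 3705 steps with no warmup, because the card lists no warmup setting. This mirrors the shape of the linear schedule in Transformers. The sketch also bounds the training-set size implied by 57 steps per epoch at batch size 32, assuming a single device, no gradient accumulation, and a possibly partial last batch. The constants and function names are hypothetical.

```python
# Hypothetical sketch of the card's linear learning-rate schedule.
# ASSUMPTION: warmup is 0 (the card lists no warmup setting).
from typing import Tuple

LEARNING_RATE = 9e-05   # learning_rate from the card
TRAIN_BATCH_SIZE = 32   # train_batch_size from the card
STEPS_PER_EPOCH = 57    # Step value at epoch 1.0 in the results table
NUM_EPOCHS = 65         # num_epochs from the card
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS  # 3705, the final Step value


def linear_lr(step: int, base_lr: float = LEARNING_RATE,
              total_steps: int = TOTAL_STEPS, warmup_steps: int = 0) -> float:
    """Learning rate at `step`: optional linear warmup, then linear decay to 0."""
    if warmup_steps > 0 and step < warmup_steps:
        return base_lr * (step / warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * (remaining / max(1, total_steps - warmup_steps))


def train_set_size_range(steps_per_epoch: int = STEPS_PER_EPOCH,
                         batch_size: int = TRAIN_BATCH_SIZE) -> Tuple[int, int]:
    """Smallest and largest training-set size giving `steps_per_epoch` batches.

    ASSUMPTION: one device, no gradient accumulation, last batch may be partial.
    """
    return (steps_per_epoch - 1) * batch_size + 1, steps_per_epoch * batch_size
```

Under these assumptions, `train_set_size_range()` returns `(1793, 1824)`. The learning rate falls by about 2.4e-08 per step and reaches zero at step 3705.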

Training results

Training Loss	Epoch	Step	Validation Loss	F1	ROC AUC	Hamming
No log	1.0	57	0.3454	0.8319	0.5	0.8850
No log	2.0	114	0.3372	0.8319	0.5	0.8850
No log	3.0	171	0.3353	0.8319	0.5	0.8850
No log	4.0	228	0.3310	0.8319	0.5	0.8850
No log	5.0	285	0.3278	0.8319	0.5	0.8850
No log	6.0	342	0.3242	0.8319	0.5	0.8850
No log	7.0	399	0.3206	0.8319	0.5	0.8850
No log	8.0	456	0.3168	0.8319	0.5	0.8850
0.3599	9.0	513	0.3120	0.8319	0.5	0.8850
0.3599	10.0	570	0.3089	0.8319	0.5	0.8850
0.3599	11.0	627	0.3039	0.8319	0.5	0.8850
0.3599	12.0	684	0.3000	0.8319	0.5	0.8850
0.3599	13.0	741	0.2969	0.8319	0.5	0.8850
0.3599	14.0	798	0.2932	0.8319	0.5	0.8850
0.3599	15.0	855	0.2893	0.8449	0.5064	0.8864
0.3599	16.0	912	0.2859	0.8449	0.5064	0.8864
0.3599	17.0	969	0.2824	0.8449	0.5064	0.8864
0.3111	18.0	1026	0.2800	0.8613	0.5192	0.8894
0.3111	19.0	1083	0.2773	0.8606	0.5160	0.8886
0.3111	20.0	1140	0.2752	0.8586	0.5248	0.8894
0.3111	21.0	1197	0.2727	0.8586	0.5248	0.8894
0.3111	22.0	1254	0.2703	0.8597	0.5280	0.8901
0.3111	23.0	1311	0.2679	0.8761	0.5532	0.8953
0.3111	24.0	1368	0.2665	0.8783	0.5684	0.8975
0.3111	25.0	1425	0.2645	0.8791	0.5688	0.8982
0.3111	26.0	1482	0.2627	0.8789	0.5776	0.8990
0.2854	27.0	1539	0.2611	0.8780	0.5716	0.8982
0.2854	28.0	1596	0.2597	0.8791	0.5688	0.8982
0.2854	29.0	1653	0.2584	0.8818	0.5845	0.9012
0.2854	30.0	1710	0.2570	0.8825	0.5877	0.9019
0.2854	31.0	1767	0.2564	0.8930	0.6405	0.9115
0.2854	32.0	1824	0.2556	0.8913	0.6396	0.9100
0.2854	33.0	1881	0.2547	0.8870	0.6296	0.9071
0.2854	34.0	1938	0.2531	0.8843	0.6029	0.9041
0.2854	35.0	1995	0.2522	0.8912	0.6341	0.9100
0.2722	36.0	2052	0.2516	0.8914	0.6341	0.9100
0.2722	37.0	2109	0.2507	0.8913	0.6369	0.9100
0.2722	38.0	2166	0.2501	0.8899	0.6392	0.9093
0.2722	39.0	2223	0.2491	0.8865	0.6264	0.9063
0.2722	40.0	2280	0.2486	0.8939	0.6409	0.9122
0.2722	41.0	2337	0.2483	0.8921	0.6516	0.9115
0.2722	42.0	2394	0.2474	0.8913	0.6512	0.9108
0.2722	43.0	2451	0.2466	0.8911	0.6341	0.9100
0.2652	44.0	2508	0.2461	0.8950	0.6557	0.9137
0.2652	45.0	2565	0.2459	0.8913	0.6540	0.9108
0.2652	46.0	2622	0.2453	0.8934	0.6521	0.9122
0.2652	47.0	2679	0.2446	0.8950	0.6557	0.9137
0.2652	48.0	2736	0.2445	0.8922	0.6572	0.9115
0.2652	49.0	2793	0.2442	0.8931	0.6521	0.9122
0.2652	50.0	2850	0.2440	0.8938	0.6608	0.9130
0.2652	51.0	2907	0.2436	0.8930	0.6576	0.9122
0.2652	52.0	2964	0.2432	0.8940	0.6553	0.9130
0.2603	53.0	3021	0.2430	0.8940	0.6553	0.9130
0.2603	54.0	3078	0.2428	0.8930	0.6576	0.9122
0.2603	55.0	3135	0.2425	0.8938	0.6608	0.9130
0.2603	56.0	3192	0.2424	0.8904	0.6480	0.9100
0.2603	57.0	3249	0.2424	0.8938	0.6636	0.9130
0.2603	58.0	3306	0.2422	0.8938	0.6636	0.9130
0.2603	59.0	3363	0.2421	0.9070	0.6668	0.9137
0.2603	60.0	3420	0.2419	0.9070	0.6668	0.9137
0.2603	61.0	3477	0.2418	0.8938	0.6636	0.9130
0.2578	62.0	3534	0.2418	0.8938	0.6636	0.9130
0.2578	63.0	3591	0.2416	0.8930	0.6576	0.9122
0.2578	64.0	3648	0.2416	0.8938	0.6608	0.9130
0.2578	65.0	3705	0.2416	0.8930	0.6576	0.9122
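The card does not say how F1, ROC AUC, and Hamming were computed. The minimal sketch below makes three assumptions:

- micro averaging over every (example, label) slot,
- a 0.5 probability threshold,
- "Hamming" meaning the fraction of correctly predicted label slots (1 minus Hamming loss), which fits the column rising as validation loss falls.

Under these assumptions, predictions that never change score exactly 0.5 ROC AUC, because every positive/negative pair is a tie. That would be consistent with the flat 0.5 in epochs 1 to 14 if ROC AUC was computed on thresholded predictions rather than probabilities, but the card does not confirm it. All function names are hypothetical.

```python
# Hypothetical multi-label metric sketch. ASSUMPTIONS: micro averaging,
# threshold 0.5, "Hamming" = 1 - Hamming loss. Not confirmed by the card.
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function for turning logits into probabilities."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def threshold(probs: Matrix, t: float = 0.5) -> List[List[int]]:
    """Convert per-label probabilities into 0/1 predictions."""
    return [[1 if p >= t else 0 for p in row] for row in probs]


def _flatten(m: Matrix) -> list:
    return [v for row in m for v in row]


def micro_f1(y_true: Matrix, y_pred: Matrix) -> float:
    """F1 over all label slots pooled together (micro average)."""
    tp = fp = fn = 0
    for t, p in zip(_flatten(y_true), _flatten(y_pred)):
        if t and p:
            tp += 1
        elif p:
            fp += 1
        elif t:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def hamming_score(y_true: Matrix, y_pred: Matrix) -> float:
    """Fraction of label slots predicted correctly (1 - Hamming loss)."""
    pairs = list(zip(_flatten(y_true), _flatten(y_pred)))
    return sum(t == p for t, p in pairs) / len(pairs)


def micro_roc_auc(y_true: Matrix, scores: Matrix) -> float:
    """Chance a random positive slot outscores a random negative one; ties count 1/2."""
    flat = list(zip(_flatten(y_true), _flatten(scores)))
    pos = [s for t, s in flat if t]
    neg = [s for t, s in flat if not t]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative slot")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For a toy batch with `y_true = [[1, 0, 1], [0, 1, 0]]` and `probs = [[0.9, 0.2, 0.4], [0.1, 0.7, 0.6]]`, the thresholded predictions are `[[1, 0, 0], [0, 1, 1]]`. Micro F1 and the Hamming score are both 2/3. ROC AUC is 8/9 on the probabilities but 2/3 on the thresholded predictions, so the choice of input changes the reported value noticeably.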

Framework versions

Transformers 4.41.1
Pytorch 2.3.0+cu121
Datasets 2.19.1
Tokenizers 0.19.1
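To compare a local environment with the versions listed above, a small standard-library helper is enough. It assumes the usual PyPI distribution names, with torch standing in for PyTorch. It compares release numbers after stripping any local build label such as +cu121, since that suffix only identifies the CUDA 12.1 build. The helper names are hypothetical.

```python
# Hypothetical helper: compare installed package versions with the card's list.
# ASSUMPTION: PyPI distribution names, with "torch" standing in for PyTorch.
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

CARD_VERSIONS: Dict[str, str] = {
    "transformers": "4.41.1",
    "torch": "2.3.0+cu121",
    "datasets": "2.19.1",
    "tokenizers": "0.19.1",
}


def installed_version(dist: str) -> Optional[str]:
    """Installed version string, or None if the distribution is absent."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


def same_release(a: str, b: str) -> bool:
    """Compare versions after dropping a local label such as '+cu121'."""
    return a.split("+")[0] == b.split("+")[0]


def version_report(
    expected: Dict[str, str] = CARD_VERSIONS,
) -> Dict[str, Tuple[str, Optional[str], bool]]:
    """Map each distribution to (card version, installed version, same release)."""
    report = {}
    for dist, want in expected.items():
        have = installed_version(dist)
        report[dist] = (want, have, have is not None and same_release(want, have))
    return report


if __name__ == "__main__":
    for dist, (want, have, ok) in version_report().items():
        status = "matches" if ok else "differs"
        print(f"{dist}: card {want}, installed {have or 'missing'} ({status})")
```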

Model repository: dahe827/longformer-base-4096-airlines-news-multi-label

longformer-base-4096-airlines-news-multi-label

Model description

Intended uses & limitations

Training and evaluation data

Training procedure

Training hyperparameters

Training results

Framework versions

Model tree for dahe827/longformer-base-4096-airlines-news-multi-label

Evaluation results