junk

This model is a fine-tuned version of a base model that this card does not name, trained on a dataset recorded as None. It achieves the following results on the evaluation set:

Loss: 8.1252

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 32
eval_batch_size: 32
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 30
num_epochs: 100
mixed_precision_training: Native AMP
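
As a rough illustration of how the linear scheduler and warmup settings above interact, the sketch below computes the learning rate at a given optimizer step. It assumes the usual linear-warmup, linear-decay rule (ramp from 0 to the base rate over the warmup steps, then decay linearly to 0 at the last step) and takes the total of 400 steps from the training results table. The function name linear_schedule_lr is hypothetical and is not taken from the training script.

```python
def linear_schedule_lr(step, base_lr=1e-4, warmup_steps=30, total_steps=400):
    """Learning rate at `step` under linear warmup followed by linear decay.

    Assumption: this mirrors the common linear schedule with warmup;
    the defaults come from the hyperparameters listed in this card.
    """
    if step < warmup_steps:
        # Warmup phase: scale linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: scale linearly from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    for s in (0, 15, 30, 215, 400):
        print(s, linear_schedule_lr(s))
```

Under these assumptions the rate peaks at step 30 (epoch 7.5) and falls to half the base rate at step 215, midway through the decay phase.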

Training results

Training Loss	Epoch	Step	Validation Loss
10.42	1.25	5	10.1940
10.1087	2.5	10	9.7539
9.7572	3.75	15	9.4707
9.5321	5.0	20	9.2852
9.13	6.25	25	9.1155
8.9989	7.5	30	8.9138
8.7422	8.75	35	8.7181
8.5133	10.0	40	8.5220
8.0836	11.25	45	8.3687
7.8212	12.5	50	8.2344
7.6616	13.75	55	8.1437
7.4743	15.0	60	8.0750
7.1668	16.25	65	8.0275
7.0485	17.5	70	7.9937
6.9619	18.75	75	7.9525
6.8705	20.0	80	7.9584
6.6232	21.25	85	7.9238
6.6423	22.5	90	7.9155
6.5876	23.75	95	7.9088
6.5075	25.0	100	7.9154
6.4218	26.25	105	7.8957
6.2857	27.5	110	7.9040
6.1833	28.75	115	7.9092
6.1263	30.0	120	7.9198
6.0123	31.25	125	7.9103
5.9111	32.5	130	7.9150
5.9157	33.75	135	7.9178
5.8237	35.0	140	7.9479
5.6626	36.25	145	7.9358
5.657	37.5	150	7.9548
5.5894	38.75	155	7.9572
5.5157	40.0	160	7.9800
5.4606	41.25	165	7.9481
5.2962	42.5	170	7.9568
5.2877	43.75	175	7.9720
5.2395	45.0	180	7.9709
5.1394	46.25	185	7.9900
5.0096	47.5	190	8.0010
4.9646	48.75	195	8.0105
4.973	50.0	200	8.0182
4.866	51.25	205	8.0310
4.8044	52.5	210	8.0372
4.7804	53.75	215	8.0387
4.7187	55.0	220	8.0166
4.6399	56.25	225	8.0598
4.6644	57.5	230	8.0465
4.5318	58.75	235	8.0482
4.4451	60.0	240	8.0538
4.4442	61.25	245	8.0473
4.3778	62.5	250	8.0517
4.4453	63.75	255	8.0740
4.3813	65.0	260	8.0658
4.2654	66.25	265	8.0764
4.2278	67.5	270	8.0737
4.2212	68.75	275	8.0952
4.1481	70.0	280	8.0877
4.162	71.25	285	8.0882
4.077	72.5	290	8.0813
4.0134	73.75	295	8.0862
3.9975	75.0	300	8.0980
3.9174	76.25	305	8.0989
3.9748	77.5	310	8.0903
3.9362	78.75	315	8.1109
3.8585	80.0	320	8.1049
3.8832	81.25	325	8.1076
3.8799	82.5	330	8.1078
3.8354	83.75	335	8.1073
3.8073	85.0	340	8.1182
3.8701	86.25	345	8.1179
3.7696	87.5	350	8.1204
3.7907	88.75	355	8.1187
3.7428	90.0	360	8.1172
3.7048	91.25	365	8.1201
3.724	92.5	370	8.1205
3.7308	93.75	375	8.1191
3.7665	95.0	380	8.1211
3.6804	96.25	385	8.1244
3.6001	97.5	390	8.1220
3.6411	98.75	395	8.1245
3.6321	100.0	400	8.1252
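
The table shows validation loss reaching its minimum early and then drifting upward while training loss keeps falling, a pattern that usually indicates overfitting. The sketch below is one way to locate the best evaluation step and measure the final gap between training and validation loss. It uses a subset of rows copied from the table above; the names ROWS and best_row are illustrative only.

```python
# (training_loss, epoch, step, validation_loss): a subset of the rows above.
ROWS = [
    (10.42, 1.25, 5, 10.1940),
    (8.5133, 10.0, 40, 8.5220),
    (6.8705, 20.0, 80, 7.9584),
    (6.5075, 25.0, 100, 7.9154),
    (6.4218, 26.25, 105, 7.8957),
    (6.2857, 27.5, 110, 7.9040),
    (6.1263, 30.0, 120, 7.9198),
    (4.973, 50.0, 200, 8.0182),
    (3.9975, 75.0, 300, 8.0980),
    (3.6321, 100.0, 400, 8.1252),
]

# Row with the lowest validation loss.
best_row = min(ROWS, key=lambda row: row[3])
best_step, best_val_loss = best_row[2], best_row[3]

# Gap between validation and training loss at the final step.
final_train_loss, _, final_step, final_val_loss = ROWS[-1]
final_gap = final_val_loss - final_train_loss

if __name__ == "__main__":
    print(f"best validation loss {best_val_loss} at step {best_step}")
    print(f"train/validation gap at step {final_step}: {final_gap:.4f}")
```

On these rows the lowest validation loss is 7.8957 at step 105 (epoch 26.25), below the 8.1252 reported for the final checkpoint. If the run were repeated, early stopping or keeping the checkpoint with the best validation loss might be worth considering.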

Framework versions

Transformers 4.40.2
Pytorch 2.3.0
Datasets 2.19.1
Tokenizers 0.19.1

Anish13/junk
