cllm-0.0.2

This model is a fine-tuned version of an unspecified base model on an unknown dataset. At the last evaluation logged during training (step 23000), it reaches a validation loss of 2.5767 on the evaluation set.

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 0.0003
train_batch_size: 8
eval_batch_size: 4
seed: 42
distributed_type: multi-GPU
num_devices: 8
gradient_accumulation_steps: 4
total_train_batch_size: 256
total_eval_batch_size: 32
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 50
num_epochs: 1
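
The totals in the list above follow from the per-device values, and the linear scheduler with warmup can be written as a simple function of the step. The sketch below is illustrative, not the training code used for this model. The constant ASSUMED_TOTAL_STEPS is an assumption: the card does not state the total number of optimizer steps, so it is estimated here from the training log (step 23000 at epoch 0.9862). The function names are hypothetical.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 3e-4
TRAIN_BATCH_SIZE = 8   # per device
EVAL_BATCH_SIZE = 4    # per device
NUM_DEVICES = 8
GRAD_ACCUM_STEPS = 4
WARMUP_STEPS = 50

# Assumption: not stated in the card; estimated as 23000 / 0.9862
# from the last row of the training log.
ASSUMED_TOTAL_STEPS = 23_322


def total_train_batch_size() -> int:
    """Sequences consumed per optimizer step across all devices."""
    return TRAIN_BATCH_SIZE * NUM_DEVICES * GRAD_ACCUM_STEPS


def total_eval_batch_size() -> int:
    """Sequences evaluated per forward pass across all devices."""
    return EVAL_BATCH_SIZE * NUM_DEVICES


def linear_lr(step: int, total_steps: int = ASSUMED_TOTAL_STEPS) -> float:
    """Linear warmup from 0 to the peak rate, then linear decay to 0."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    remaining = max(0, total_steps - step)
    return LEARNING_RATE * remaining / max(1, total_steps - WARMUP_STEPS)


if __name__ == "__main__":
    print("total_train_batch_size:", total_train_batch_size())
    print("total_eval_batch_size:", total_eval_batch_size())
    for step in (0, 25, 50, 11_500, 23_000):
        print(f"step {step:>6}: lr = {linear_lr(step):.6f}")
```

With only 50 warmup steps, the learning rate reaches its peak of 0.0003 well before the first logged evaluation at step 500 and decays for the rest of the run.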

Training Loss	Epoch	Step	Validation Loss
4.8419	0.0214	500	4.7291
3.891	0.0429	1000	3.8792
3.5798	0.0643	1500	3.5656
3.3861	0.0858	2000	3.4057
3.2754	0.1072	2500	3.2925
3.2039	0.1286	3000	3.2109
3.1475	0.1501	3500	3.1513
3.0936	0.1715	4000	3.0991
3.0483	0.1930	4500	3.0603
3.0036	0.2144	5000	3.0180
2.9644	0.2358	5500	2.9900
2.9374	0.2573	6000	2.9599
2.901	0.2787	6500	2.9334
2.8968	0.3002	7000	2.9124
2.866	0.3216	7500	2.8889
2.8614	0.3430	8000	2.8672
2.8378	0.3645	8500	2.8489
2.8242	0.3859	9000	2.8290
2.7961	0.4074	9500	2.8133
2.769	0.4288	10000	2.7962
2.7619	0.4502	10500	2.7804
2.7527	0.4717	11000	2.7687
2.7457	0.4931	11500	2.7540
2.7119	0.5146	12000	2.7441
2.7089	0.5360	12500	2.7317
2.7236	0.5574	13000	2.7218
2.6984	0.5789	13500	2.7102
2.6791	0.6003	14000	2.6998
2.6764	0.6218	14500	2.6915
2.6663	0.6432	15000	2.6806
2.6424	0.6646	15500	2.6720
2.6384	0.6861	16000	2.6612
2.6343	0.7075	16500	2.6536
2.6303	0.7290	17000	2.6471
2.6115	0.7504	17500	2.6373
2.6125	0.7718	18000	2.6310
2.5983	0.7933	18500	2.6246
2.6043	0.8147	19000	2.6173
2.5876	0.8362	19500	2.6106
2.5824	0.8576	20000	2.6043
2.5802	0.8790	20500	2.5983
2.5772	0.9005	21000	2.5927
2.5584	0.9219	21500	2.5878
2.5652	0.9434	22000	2.5835
2.5593	0.9648	22500	2.5794
2.5547	0.9862	23000	2.5767
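
Two quantities can be derived from the log above, sketched here under stated assumptions. First, if the reported loss is a mean per-token cross-entropy in nats (an assumption; the card does not say), perplexity is its exponential. Second, the Step and Epoch columns give a rough estimate of steps per epoch, and with it the approximate number of training sequences. The Epoch column is rounded to four decimals, so these are estimates only. The helper names are hypothetical.

```python
import math

# (step, epoch, validation_loss) rows copied from the training log above.
FIRST_ROW = (500, 0.0214, 4.7291)
LAST_ROW = (23000, 0.9862, 2.5767)

# From the hyperparameter list: 8 per device x 8 devices x 4 accumulation steps.
TOTAL_TRAIN_BATCH_SIZE = 256


def perplexity(loss_nats: float) -> float:
    """Perplexity, assuming the loss is a mean cross-entropy in nats."""
    return math.exp(loss_nats)


def estimated_steps_per_epoch(step: int, epoch: float) -> float:
    """Rough steps-per-epoch estimate from one logged (step, epoch) pair."""
    return step / epoch


if __name__ == "__main__":
    step, epoch, val_loss = LAST_ROW
    steps_per_epoch = estimated_steps_per_epoch(step, epoch)
    print(f"validation perplexity at step {step}: {perplexity(val_loss):.2f}")
    print(f"estimated steps per epoch: {steps_per_epoch:.0f}")
    print(f"estimated training sequences: {steps_per_epoch * TOTAL_TRAIN_BATCH_SIZE:.0f}")
```

Under these assumptions, the final validation loss of 2.5767 corresponds to a perplexity of about 13.15, down from roughly 113 at step 500, and the run covers on the order of 6 million training sequences.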