starcoderbase7b_2048_context_length_lr_0.0005

This model is a fine-tuned version of bigcode/starcoderbase-7b; the training dataset is not specified. At the final training step (2000) it achieves the following result on the evaluation set:

Loss: 2.0501
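
A loss value is often easier to interpret as a perplexity. The sketch below assumes the reported loss is a mean per-token cross-entropy in nats, which is the usual convention for causal language model fine-tuning but is not stated in this card.

```python
import math

# Final evaluation loss reported above.
final_eval_loss = 2.0501

# Assumption: the loss is mean per-token cross-entropy in nats,
# so perplexity is simply exp(loss).
perplexity = math.exp(final_eval_loss)

print(f"eval loss {final_eval_loss} -> perplexity {perplexity:.2f}")
```

Under that assumption the final checkpoint has a perplexity of about 7.77 on the evaluation set.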

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0005
train_batch_size: 2
eval_batch_size: 2
seed: 42
distributed_type: multi-GPU
num_devices: 4
gradient_accumulation_steps: 2
total_train_batch_size: 16
total_eval_batch_size: 8
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 30
training_steps: 2000
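
The derived values in this list can be checked by hand. The sketch below recomputes the total batch sizes and evaluates a cosine schedule with linear warmup at a few steps. The helper `cosine_lr` is hypothetical: it assumes a linear ramp to the peak rate followed by a half-cosine decay to zero, which may differ in detail from the scheduler used in the actual run. The examples-per-epoch figure further assumes that the epoch value of 1.0 at step 2000 in the results table is exact.

```python
import math

# Values copied from the hyperparameter list above.
PER_DEVICE_TRAIN_BATCH = 2
PER_DEVICE_EVAL_BATCH = 2
NUM_DEVICES = 4
GRAD_ACCUM_STEPS = 2
PEAK_LR = 5e-4
WARMUP_STEPS = 30
TRAINING_STEPS = 2000

# Sequences consumed per optimizer step and per evaluation step.
total_train_batch = PER_DEVICE_TRAIN_BATCH * NUM_DEVICES * GRAD_ACCUM_STEPS
total_eval_batch = PER_DEVICE_EVAL_BATCH * NUM_DEVICES

# Assumption: 2000 steps correspond to exactly one epoch.
approx_examples_per_epoch = TRAINING_STEPS * total_train_batch


def cosine_lr(step: int) -> float:
    """Hypothetical helper: linear warmup, then half-cosine decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TRAINING_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print("total_train_batch_size:", total_train_batch)
print("total_eval_batch_size:", total_eval_batch)
print("approx. examples per epoch:", approx_examples_per_epoch)
for step in (0, 15, 30, 575, 1015, 2000):
    print(f"step {step:>4}: lr = {cosine_lr(step):.6f}")
```

With only 30 warmup steps, the schedule reaches the peak rate of 0.0005 almost immediately and is still close to it for the first several hundred steps.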

Training results

Training Loss	Epoch	Step	Validation Loss
0.6244	0.0125	25	0.5402
1.0172	0.025	50	1.4486
0.9991	0.0375	75	1.0535
0.715	0.05	100	1.6262
0.6957	0.0625	125	0.6796
0.5182	0.075	150	0.6086
0.497	0.0875	175	0.5938
0.4611	0.1	200	0.6104
0.4046	0.1125	225	0.5857
0.3753	0.125	250	0.6633
0.3517	0.1375	275	0.6479
0.2758	0.15	300	0.5788
0.2928	0.1625	325	0.6429
0.2669	0.175	350	0.5874
0.2608	0.1875	375	0.5497
0.2049	0.2	400	0.6268
0.2006	0.2125	425	0.6265
0.197	0.225	450	0.6236
0.177	0.2375	475	0.6124
0.1774	0.25	500	0.6231
0.1509	0.2625	525	0.5864
0.1389	0.275	550	0.6161
0.8679	0.2875	575	11.4657
6.5575	0.3	600	6.4917
6.0031	0.3125	625	5.5229
5.1391	0.325	650	5.2191
4.4917	0.3375	675	4.6562
3.9199	0.35	700	4.2153
3.855	0.3625	725	4.0902
3.5441	0.375	750	4.0601
3.3835	0.3875	775	3.8844
3.1663	0.4	800	3.8223
2.9285	0.4125	825	3.4541
3.0088	0.425	850	3.5302
2.9083	0.4375	875	3.3347
2.8438	0.45	900	3.3962
2.663	0.4625	925	3.0955
2.5084	0.475	950	3.0454
2.5818	0.4875	975	3.0131
2.4068	0.5	1000	3.0179
2.3994	0.5125	1025	2.8273
2.1942	0.525	1050	2.7333
2.1041	0.5375	1075	2.6163
2.0861	0.55	1100	2.6006
1.9868	0.5625	1125	2.5482
1.9496	0.575	1150	2.6079
1.8099	0.5875	1175	2.3777
1.6454	0.6	1200	2.2547
1.6484	0.6125	1225	2.3254
1.5729	0.625	1250	2.2835
1.5635	0.6375	1275	2.2167
1.3961	0.65	1300	2.2751
1.3495	0.6625	1325	2.1755
1.3524	0.675	1350	2.1377
1.3116	0.6875	1375	2.1407
1.282	0.7	1400	2.0955
1.114	0.7125	1425	2.0334
1.0985	0.725	1450	2.0133
1.1216	0.7375	1475	2.0139
1.0544	0.75	1500	2.0464
1.0221	0.7625	1525	1.9984
0.9368	0.775	1550	2.0069
0.8973	0.7875	1575	1.9595
0.9332	0.8	1600	1.9372
0.9227	0.8125	1625	1.9910
0.8507	0.825	1650	2.0251
0.8242	0.8375	1675	1.9892
0.7571	0.85	1700	2.0327
0.7519	0.8625	1725	1.9949
0.7209	0.875	1750	2.0050
0.7315	0.8875	1775	2.0076
0.77	0.9	1800	2.0315
0.7719	0.9125	1825	2.0241
0.681	0.925	1850	2.0440
0.7371	0.9375	1875	2.0380
0.6823	0.95	1900	2.0392
0.6891	0.9625	1925	2.0563
0.7266	0.975	1950	2.0511
0.6888	0.9875	1975	2.0501
0.6663	1.0	2000	2.0501
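
The table shows a sharp jump in validation loss between steps 550 and 575 (0.6161 to 11.4657), after which the loss recovers only partially. The sketch below locates that jump and the best checkpoints on either side of it. The validation losses are copied from the table; the 5x threshold used to flag a spike is an arbitrary assumption chosen for illustration.

```python
# Validation losses copied from the table above, one per evaluation
# (every 25 steps, from step 25 to step 2000).
VAL_LOSS = [
    0.5402, 1.4486, 1.0535, 1.6262, 0.6796, 0.6086, 0.5938, 0.6104,
    0.5857, 0.6633, 0.6479, 0.5788, 0.6429, 0.5874, 0.5497, 0.6268,
    0.6265, 0.6236, 0.6124, 0.6231, 0.5864, 0.6161, 11.4657, 6.4917,
    5.5229, 5.2191, 4.6562, 4.2153, 4.0902, 4.0601, 3.8844, 3.8223,
    3.4541, 3.5302, 3.3347, 3.3962, 3.0955, 3.0454, 3.0131, 3.0179,
    2.8273, 2.7333, 2.6163, 2.6006, 2.5482, 2.6079, 2.3777, 2.2547,
    2.3254, 2.2835, 2.2167, 2.2751, 2.1755, 2.1377, 2.1407, 2.0955,
    2.0334, 2.0133, 2.0139, 2.0464, 1.9984, 2.0069, 1.9595, 1.9372,
    1.9910, 2.0251, 1.9892, 2.0327, 1.9949, 2.0050, 2.0076, 2.0315,
    2.0241, 2.0440, 2.0380, 2.0392, 2.0563, 2.0511, 2.0501, 2.0501,
]
EVAL_INTERVAL = 25
SPIKE_FACTOR = 5.0  # assumption: a 5x jump between evaluations counts as a spike

steps = [EVAL_INTERVAL * (i + 1) for i in range(len(VAL_LOSS))]
history = list(zip(steps, VAL_LOSS))

# Lowest validation loss over the whole run.
best_step, best_loss = min(history, key=lambda pair: pair[1])

# First evaluation whose loss exceeds SPIKE_FACTOR times the previous one.
spike_step = next(
    (step for (_, prev), (step, cur) in zip(history, history[1:])
     if cur > SPIKE_FACTOR * prev),
    None,
)

print(f"best overall: step {best_step}, loss {best_loss}")
print(f"first spike:  step {spike_step}")

if spike_step is not None:
    # Lowest validation loss from the spike onward.
    after_spike = [pair for pair in history if pair[0] >= spike_step]
    post_best_step, post_best_loss = min(after_spike, key=lambda pair: pair[1])
    print(f"best after spike: step {post_best_step}, loss {post_best_loss}")

print(f"final: step {history[-1][0]}, loss {history[-1][1]}")
```

On these numbers, the final validation loss of 2.0501 is well above the values near 0.55 to 0.65 recorded before step 575, so an earlier checkpoint, if one was saved, may be a better choice than the final one.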

Framework versions

Transformers 4.44.0
Pytorch 2.4.0a0+07cecf4168.nv24.05
Datasets 2.20.0
Tokenizers 0.19.1


Model repository: dafrimi/starcoderbase7b_2048_context_length_lr_0.0005