CodeLlama-34b-Instruct-sft-5e-3-epoch-100-xsum

This model is a fine-tuned version of meta-llama/CodeLlama-34b-Instruct-hf on the meng-lab/CodeLlama-34B-Instruct-xsum dataset. It achieves the following results on the evaluation set:

  • Loss: 5.3547
  • Loss Layer 6 Head: 1.5863
  • Loss Layer 12 Head: 1.2384
  • Loss Layer 18 Head: 1.0729
  • Loss Layer 24 Head: 0.6857
  • Loss Layer 30 Head: 0.4438
  • Loss Layer 36 Head: 0.2842
  • Loss Layer 42 Head: 0.1685
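
For quick experimentation, a minimal loading sketch is shown below. This assumes the checkpoint can be loaded through the standard transformers causal-LM API and that the repository ID used here is the one hosting this model; the auxiliary per-layer heads reported above are part of the fine-tuned checkpoint but are not exercised by this snippet.

```python
# Hedged sketch: load the fine-tuned checkpoint with the standard transformers API.
# Assumes the repository ID below is correct and that the checkpoint loads as a
# plain causal LM (the extra per-layer heads are ignored here).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meng-lab/codellama_34b_instruct_paradec_xsum"  # assumed repo ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # checkpoint weights are stored in BF16
    device_map="auto",
)

prompt = "Summarize the following article:\n..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```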

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.005
  • train_batch_size: 1
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 100
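
Note that the effective batch size follows from the list above: 1 (per device) × 8 (GPUs) × 16 (accumulation steps) = 128, matching total_train_batch_size. The original training script is not included in this card, so the sketch below is only a hedged guess at how these hyperparameters might map onto a transformers TrainingArguments object.

```python
# Hedged sketch: the hyperparameters above expressed as transformers TrainingArguments.
# The actual training script is not part of this card, so this mapping is an assumption.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="CodeLlama-34b-Instruct-sft-5e-3-epoch-100-xsum",
    learning_rate=5e-3,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=16,  # 1 x 8 GPUs x 16 = 128 total train batch size
    num_train_epochs=100,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
    bf16=True,  # assumption, consistent with the BF16 checkpoint
)
```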

Training results

| Training Loss | Epoch | Step | Validation Loss | Loss Layer 6 Head | Loss Layer 12 Head | Loss Layer 18 Head | Loss Layer 24 Head | Loss Layer 30 Head | Loss Layer 36 Head | Loss Layer 42 Head |
|---|---|---|---|---|---|---|---|---|---|---|
| 5.7055 | 2.56 | 200 | 6.7354 | 1.7854 | 1.4923 | 1.4206 | 0.8735 | 0.6246 | 0.4658 | 0.4023 |
| 4.2132 | 5.12 | 400 | 6.1546 | 1.7830 | 1.3957 | 1.1440 | 0.7581 | 0.6526 | 0.3538 | 0.2167 |
| 4.081 | 7.68 | 600 | 6.0643 | 1.6946 | 1.4230 | 1.1566 | 0.8413 | 0.5291 | 0.3589 | 0.2186 |
| 3.5585 | 10.24 | 800 | 5.8829 | 1.6599 | 1.3383 | 1.1385 | 0.7602 | 0.5903 | 0.3677 | 0.2221 |
| 3.5251 | 12.8 | 1000 | 5.7000 | 1.6490 | 1.2994 | 1.0979 | 0.7252 | 0.5119 | 0.3438 | 0.2164 |
| 3.1679 | 15.36 | 1200 | 5.6536 | 1.6224 | 1.2553 | 1.1685 | 0.7247 | 0.5292 | 0.3125 | 0.1873 |
| 3.2193 | 17.92 | 1400 | 5.5506 | 1.5900 | 1.2721 | 1.0925 | 0.7382 | 0.4849 | 0.3224 | 0.1969 |
| 3.0832 | 20.48 | 1600 | 5.5640 | 1.5978 | 1.2975 | 1.1012 | 0.7319 | 0.4884 | 0.3065 | 0.1891 |
| 2.9621 | 23.04 | 1800 | 5.5682 | 1.6054 | 1.2700 | 1.1180 | 0.7373 | 0.4615 | 0.2985 | 0.2074 |
| 3.0878 | 25.6 | 2000 | 5.7224 | 1.6020 | 1.4047 | 1.1298 | 0.7446 | 0.4841 | 0.3109 | 0.1890 |
| 2.8619 | 28.16 | 2200 | 5.5169 | 1.5917 | 1.2565 | 1.0982 | 0.7340 | 0.4624 | 0.3221 | 0.2038 |
| 2.9146 | 30.72 | 2400 | 5.4960 | 1.6334 | 1.2661 | 1.0884 | 0.7008 | 0.4590 | 0.3066 | 0.1775 |
| 2.8805 | 33.28 | 2600 | 5.7326 | 1.7120 | 1.2473 | 1.1268 | 0.8572 | 0.5254 | 0.3132 | 0.1889 |
| 2.8492 | 35.84 | 2800 | 5.5193 | 1.6050 | 1.2626 | 1.0868 | 0.7980 | 0.4569 | 0.2897 | 0.1967 |
| 2.7414 | 38.4 | 3000 | 5.5041 | 1.5895 | 1.2722 | 1.1454 | 0.6997 | 0.4646 | 0.2958 | 0.1719 |
| 2.8092 | 40.96 | 3200 | 5.4876 | 1.5899 | 1.2512 | 1.0805 | 0.7123 | 0.4602 | 0.3544 | 0.1739 |
| 2.5986 | 43.52 | 3400 | 5.4265 | 1.5933 | 1.2407 | 1.0890 | 0.6999 | 0.4719 | 0.2914 | 0.1743 |
| 2.5645 | 46.08 | 3600 | 5.4640 | 1.5893 | 1.2546 | 1.0868 | 0.7156 | 0.4573 | 0.3096 | 0.1809 |
| 2.6286 | 48.64 | 3800 | 5.4074 | 1.5805 | 1.2430 | 1.0898 | 0.6973 | 0.4577 | 0.2949 | 0.1757 |
| 2.5402 | 51.2 | 4000 | 5.4498 | 1.6051 | 1.2551 | 1.0857 | 0.7044 | 0.4704 | 0.2965 | 0.1833 |
| 2.6027 | 53.76 | 4200 | 5.5040 | 1.6330 | 1.2577 | 1.0813 | 0.7198 | 0.5051 | 0.3221 | 0.1834 |
| 2.4852 | 56.32 | 4400 | 5.4356 | 1.5925 | 1.2526 | 1.0858 | 0.7114 | 0.4580 | 0.2926 | 0.1861 |
| 2.4804 | 58.88 | 4600 | 5.4179 | 1.5895 | 1.2417 | 1.0782 | 0.7668 | 0.4488 | 0.2870 | 0.1708 |
| 2.4591 | 61.44 | 4800 | 5.3843 | 1.5925 | 1.2437 | 1.0750 | 0.6884 | 0.4509 | 0.2912 | 0.1708 |
| 2.4773 | 64.0 | 5000 | 5.4038 | 1.5952 | 1.2450 | 1.0797 | 0.6915 | 0.4486 | 0.2933 | 0.1994 |
| 2.4562 | 66.56 | 5200 | 5.3922 | 1.5918 | 1.2485 | 1.0776 | 0.6968 | 0.4479 | 0.2871 | 0.1696 |
| 2.3506 | 69.12 | 5400 | 5.3768 | 1.5882 | 1.2454 | 1.0791 | 0.6869 | 0.4474 | 0.2867 | 0.1710 |
| 2.4044 | 71.68 | 5600 | 5.3605 | 1.5856 | 1.2385 | 1.0739 | 0.6914 | 0.4472 | 0.2856 | 0.1700 |
| 2.3106 | 74.24 | 5800 | 5.4110 | 1.5956 | 1.2418 | 1.0776 | 0.6972 | 0.4813 | 0.2891 | 0.1908 |
| 2.3976 | 76.8 | 6000 | 5.3686 | 1.5894 | 1.2410 | 1.0754 | 0.6877 | 0.4455 | 0.2856 | 0.1685 |
| 2.2507 | 79.36 | 6200 | 5.3727 | 1.5923 | 1.2414 | 1.0760 | 0.6877 | 0.4455 | 0.2852 | 0.1701 |
| 2.3297 | 81.92 | 6400 | 5.3620 | 1.5871 | 1.2407 | 1.0748 | 0.6867 | 0.4443 | 0.2855 | 0.1686 |
| 2.2224 | 84.48 | 6600 | 5.3621 | 1.5881 | 1.2408 | 1.0751 | 0.6865 | 0.4444 | 0.2846 | 0.1687 |
| 2.2312 | 87.04 | 6800 | 5.3594 | 1.5863 | 1.2400 | 1.0735 | 0.6862 | 0.4446 | 0.2846 | 0.1689 |
| 2.2597 | 89.6 | 7000 | 5.3562 | 1.5858 | 1.2387 | 1.0732 | 0.6860 | 0.4440 | 0.2844 | 0.1684 |
| 2.201 | 92.16 | 7200 | 5.3562 | 1.5867 | 1.2387 | 1.0733 | 0.6861 | 0.4438 | 0.2842 | 0.1684 |
| 2.2423 | 94.72 | 7400 | 5.3539 | 1.5862 | 1.2380 | 1.0726 | 0.6856 | 0.4438 | 0.2842 | 0.1686 |
| 2.2145 | 97.28 | 7600 | 5.3546 | 1.5863 | 1.2384 | 1.0728 | 0.6857 | 0.4437 | 0.2842 | 0.1686 |
| 2.2007 | 99.84 | 7800 | 5.3547 | 1.5863 | 1.2384 | 1.0729 | 0.6857 | 0.4438 | 0.2842 | 0.1685 |
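
The final evaluation row shows the per-layer-head loss falling monotonically with head depth. A small, purely illustrative plotting sketch of that final row is given below (values copied from the table above; matplotlib is assumed to be available).

```python
# Hedged sketch: plot the final per-layer-head evaluation losses (last table row).
# Purely illustrative; the values are copied from the results table above.
import matplotlib.pyplot as plt

layers = [6, 12, 18, 24, 30, 36, 42]
final_head_loss = [1.5863, 1.2384, 1.0729, 0.6857, 0.4438, 0.2842, 0.1685]

plt.plot(layers, final_head_loss, marker="o")
plt.xlabel("Layer of auxiliary head")
plt.ylabel("Evaluation loss")
plt.title("Final loss per layer head")
plt.show()
```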

Framework versions

  • Transformers 4.43.2
  • Pytorch 2.1.2
  • Datasets 3.2.0
  • Tokenizers 0.19.1