CodeLlama-34b-Instruct-sft-5e-3-epoch-100-xsum

This model is a fine-tuned version of meta-llama/CodeLlama-34b-Instruct-hf on the meng-lab/CodeLlama-34B-Instruct-xsum dataset. It achieves the following results on the evaluation set:

  • Loss: 5.3547
  • Loss Layer 6 Head: 1.5863
  • Loss Layer 12 Head: 1.2384
  • Loss Layer 18 Head: 1.0729
  • Loss Layer 24 Head: 0.6857
  • Loss Layer 30 Head: 0.4438
  • Loss Layer 36 Head: 0.2842
  • Loss Layer 42 Head: 0.1685
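
For quick experimentation, a minimal loading sketch is shown below. This assumes the checkpoint can be loaded through the standard transformers causal-LM API and that the repository ID used here is the one hosting this model; the auxiliary per-layer heads reported above are part of the fine-tuned checkpoint but are not exercised by this snippet.

```python
# Hedged sketch: load the fine-tuned checkpoint with the standard transformers API.
# Assumes the repository ID below is correct and that the checkpoint loads as a
# plain causal LM (the extra per-layer heads are ignored here).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meng-lab/codellama_34b_instruct_paradec_xsum"  # assumed repo ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # checkpoint weights are stored in BF16
    device_map="auto",
)

prompt = "Summarize the following article:\n..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```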

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.005
  • train_batch_size: 1
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 100
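
Note that the effective batch size follows from the list above: 1 (per device) × 8 (GPUs) × 16 (accumulation steps) = 128, matching total_train_batch_size. The original training script is not included in this card, so the sketch below is only a hedged guess at how these hyperparameters might map onto a transformers TrainingArguments object.

```python
# Hedged sketch: the hyperparameters above expressed as transformers TrainingArguments.
# The actual training script is not part of this card, so this mapping is an assumption.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="CodeLlama-34b-Instruct-sft-5e-3-epoch-100-xsum",
    learning_rate=5e-3,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=16,  # 1 x 8 GPUs x 16 = 128 total train batch size
    num_train_epochs=100,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
    bf16=True,  # assumption, consistent with the BF16 checkpoint
)
```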

Training results

| Training Loss | Epoch | Step | Validation Loss | Loss Layer 6 Head | Loss Layer 12 Head | Loss Layer 18 Head | Loss Layer 24 Head | Loss Layer 30 Head | Loss Layer 36 Head | Loss Layer 42 Head |
|---|---|---|---|---|---|---|---|---|---|---|
| 5.7055 | 2.56 | 200 | 6.7354 | 1.7854 | 1.4923 | 1.4206 | 0.8735 | 0.6246 | 0.4658 | 0.4023 |
| 4.2132 | 5.12 | 400 | 6.1546 | 1.7830 | 1.3957 | 1.1440 | 0.7581 | 0.6526 | 0.3538 | 0.2167 |
| 4.081 | 7.68 | 600 | 6.0643 | 1.6946 | 1.4230 | 1.1566 | 0.8413 | 0.5291 | 0.3589 | 0.2186 |
| 3.5585 | 10.24 | 800 | 5.8829 | 1.6599 | 1.3383 | 1.1385 | 0.7602 | 0.5903 | 0.3677 | 0.2221 |
| 3.5251 | 12.8 | 1000 | 5.7000 | 1.6490 | 1.2994 | 1.0979 | 0.7252 | 0.5119 | 0.3438 | 0.2164 |
| 3.1679 | 15.36 | 1200 | 5.6536 | 1.6224 | 1.2553 | 1.1685 | 0.7247 | 0.5292 | 0.3125 | 0.1873 |
| 3.2193 | 17.92 | 1400 | 5.5506 | 1.5900 | 1.2721 | 1.0925 | 0.7382 | 0.4849 | 0.3224 | 0.1969 |
| 3.0832 | 20.48 | 1600 | 5.5640 | 1.5978 | 1.2975 | 1.1012 | 0.7319 | 0.4884 | 0.3065 | 0.1891 |
| 2.9621 | 23.04 | 1800 | 5.5682 | 1.6054 | 1.2700 | 1.1180 | 0.7373 | 0.4615 | 0.2985 | 0.2074 |
| 3.0878 | 25.6 | 2000 | 5.7224 | 1.6020 | 1.4047 | 1.1298 | 0.7446 | 0.4841 | 0.3109 | 0.1890 |
| 2.8619 | 28.16 | 2200 | 5.5169 | 1.5917 | 1.2565 | 1.0982 | 0.7340 | 0.4624 | 0.3221 | 0.2038 |
| 2.9146 | 30.72 | 2400 | 5.4960 | 1.6334 | 1.2661 | 1.0884 | 0.7008 | 0.4590 | 0.3066 | 0.1775 |
| 2.8805 | 33.28 | 2600 | 5.7326 | 1.7120 | 1.2473 | 1.1268 | 0.8572 | 0.5254 | 0.3132 | 0.1889 |
| 2.8492 | 35.84 | 2800 | 5.5193 | 1.6050 | 1.2626 | 1.0868 | 0.7980 | 0.4569 | 0.2897 | 0.1967 |
| 2.7414 | 38.4 | 3000 | 5.5041 | 1.5895 | 1.2722 | 1.1454 | 0.6997 | 0.4646 | 0.2958 | 0.1719 |
| 2.8092 | 40.96 | 3200 | 5.4876 | 1.5899 | 1.2512 | 1.0805 | 0.7123 | 0.4602 | 0.3544 | 0.1739 |
| 2.5986 | 43.52 | 3400 | 5.4265 | 1.5933 | 1.2407 | 1.0890 | 0.6999 | 0.4719 | 0.2914 | 0.1743 |
| 2.5645 | 46.08 | 3600 | 5.4640 | 1.5893 | 1.2546 | 1.0868 | 0.7156 | 0.4573 | 0.3096 | 0.1809 |
| 2.6286 | 48.64 | 3800 | 5.4074 | 1.5805 | 1.2430 | 1.0898 | 0.6973 | 0.4577 | 0.2949 | 0.1757 |
| 2.5402 | 51.2 | 4000 | 5.4498 | 1.6051 | 1.2551 | 1.0857 | 0.7044 | 0.4704 | 0.2965 | 0.1833 |
| 2.6027 | 53.76 | 4200 | 5.5040 | 1.6330 | 1.2577 | 1.0813 | 0.7198 | 0.5051 | 0.3221 | 0.1834 |
| 2.4852 | 56.32 | 4400 | 5.4356 | 1.5925 | 1.2526 | 1.0858 | 0.7114 | 0.4580 | 0.2926 | 0.1861 |
| 2.4804 | 58.88 | 4600 | 5.4179 | 1.5895 | 1.2417 | 1.0782 | 0.7668 | 0.4488 | 0.2870 | 0.1708 |
| 2.4591 | 61.44 | 4800 | 5.3843 | 1.5925 | 1.2437 | 1.0750 | 0.6884 | 0.4509 | 0.2912 | 0.1708 |
| 2.4773 | 64.0 | 5000 | 5.4038 | 1.5952 | 1.2450 | 1.0797 | 0.6915 | 0.4486 | 0.2933 | 0.1994 |
| 2.4562 | 66.56 | 5200 | 5.3922 | 1.5918 | 1.2485 | 1.0776 | 0.6968 | 0.4479 | 0.2871 | 0.1696 |
| 2.3506 | 69.12 | 5400 | 5.3768 | 1.5882 | 1.2454 | 1.0791 | 0.6869 | 0.4474 | 0.2867 | 0.1710 |
| 2.4044 | 71.68 | 5600 | 5.3605 | 1.5856 | 1.2385 | 1.0739 | 0.6914 | 0.4472 | 0.2856 | 0.1700 |
| 2.3106 | 74.24 | 5800 | 5.4110 | 1.5956 | 1.2418 | 1.0776 | 0.6972 | 0.4813 | 0.2891 | 0.1908 |
| 2.3976 | 76.8 | 6000 | 5.3686 | 1.5894 | 1.2410 | 1.0754 | 0.6877 | 0.4455 | 0.2856 | 0.1685 |
| 2.2507 | 79.36 | 6200 | 5.3727 | 1.5923 | 1.2414 | 1.0760 | 0.6877 | 0.4455 | 0.2852 | 0.1701 |
| 2.3297 | 81.92 | 6400 | 5.3620 | 1.5871 | 1.2407 | 1.0748 | 0.6867 | 0.4443 | 0.2855 | 0.1686 |
| 2.2224 | 84.48 | 6600 | 5.3621 | 1.5881 | 1.2408 | 1.0751 | 0.6865 | 0.4444 | 0.2846 | 0.1687 |
| 2.2312 | 87.04 | 6800 | 5.3594 | 1.5863 | 1.2400 | 1.0735 | 0.6862 | 0.4446 | 0.2846 | 0.1689 |
| 2.2597 | 89.6 | 7000 | 5.3562 | 1.5858 | 1.2387 | 1.0732 | 0.6860 | 0.4440 | 0.2844 | 0.1684 |
| 2.201 | 92.16 | 7200 | 5.3562 | 1.5867 | 1.2387 | 1.0733 | 0.6861 | 0.4438 | 0.2842 | 0.1684 |
| 2.2423 | 94.72 | 7400 | 5.3539 | 1.5862 | 1.2380 | 1.0726 | 0.6856 | 0.4438 | 0.2842 | 0.1686 |
| 2.2145 | 97.28 | 7600 | 5.3546 | 1.5863 | 1.2384 | 1.0728 | 0.6857 | 0.4437 | 0.2842 | 0.1686 |
| 2.2007 | 99.84 | 7800 | 5.3547 | 1.5863 | 1.2384 | 1.0729 | 0.6857 | 0.4438 | 0.2842 | 0.1685 |
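
The final evaluation row shows the per-layer-head loss falling monotonically with head depth. A small, purely illustrative plotting sketch of that final row is given below (values copied from the table above; matplotlib is assumed to be available).

```python
# Hedged sketch: plot the final per-layer-head evaluation losses (last table row).
# Purely illustrative; the values are copied from the results table above.
import matplotlib.pyplot as plt

layers = [6, 12, 18, 24, 30, 36, 42]
final_head_loss = [1.5863, 1.2384, 1.0729, 0.6857, 0.4438, 0.2842, 0.1685]

plt.plot(layers, final_head_loss, marker="o")
plt.xlabel("Layer of auxiliary head")
plt.ylabel("Evaluation loss")
plt.title("Final loss per layer head")
plt.show()
```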

Framework versions

  • Transformers 4.43.2
  • Pytorch 2.1.2
  • Datasets 3.2.0
  • Tokenizers 0.19.1