base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
library_name: peft
license: llama3.1
tags:
  - axolotl
  - generated_from_trainer
model-index:
  - name: EvolCodeLlama-3.1-8B-Instruct
    results: []

Built with Axolotl

See axolotl config

axolotl version: 0.4.1

base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer
is_llama_derived_model: true
hub_model_id: EvolCodeLlama-3.1-8B-Instruct

load_in_8bit: false
load_in_4bit: true
strict: false

datasets:
  - path: mlabonne/Evol-Instruct-Python-1k
    type: alpaca
dataset_prepared_path: last_run_prepared
val_set_size: 0.02
output_dir: ./qlora-out

adapter: qlora
lora_model_dir:

sequence_len: 2048
sample_packing: true

lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
lora_target_linear: true
lora_fan_in_fan_out:

wandb_project: axolotl
wandb_entity:
wandb_watch:
wandb_run_id:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 3
optimizer: paged_adamw_32bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 100
eval_steps: 0.01
save_strategy: epoch
save_steps:
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  pad_token: "<|end_of_text|>"

EvolCodeLlama-3.1-8B-Instruct

This model is a fine-tuned version of meta-llama/Meta-Llama-3.1-8B-Instruct using QLoRA (4-bit precision) on the mlabonne/Evol-Instruct-Python-1k dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4057
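Since the card describes a QLoRA adapter trained on Alpaca-formatted data, the following is a minimal sketch of how such an adapter might be loaded and prompted with `transformers` and `peft`. Several details are assumptions rather than facts from this card: the repository id `your-namespace/EvolCodeLlama-3.1-8B-Instruct` is a placeholder, the NF4 quantization type is a common QLoRA choice that the config does not state, and the prompt template is the standard Alpaca wording, which may differ slightly from what Axolotl rendered during training.

```python
BASE_MODEL = "meta-llama/Meta-Llama-3.1-8B-Instruct"
# Hypothetical repository id: replace the namespace with the account hosting the adapter.
ADAPTER_REPO = "your-namespace/EvolCodeLlama-3.1-8B-Instruct"


def build_alpaca_prompt(instruction: str, input_text: str = "") -> str:
    """Render a prompt in the standard Alpaca layout (assumed to match training)."""
    if input_text:
        return (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            "completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input_text}\n\n"
            "### Response:\n"
        )
    return (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:\n"
    )


def load_model_and_tokenizer(adapter_repo: str = ADAPTER_REPO, base_model: str = BASE_MODEL):
    """Load the 4-bit base model and attach the LoRA adapter on top of it."""
    # Heavy imports are kept local so the prompt helper works without a GPU stack.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,                      # matches load_in_4bit: true
        bnb_4bit_quant_type="nf4",              # assumption: not stated in the config
        bnb_4bit_compute_dtype=torch.bfloat16,  # matches bf16: true
    )
    tokenizer = AutoTokenizer.from_pretrained(base_model)
    model = AutoModelForCausalLM.from_pretrained(
        base_model, quantization_config=quant_config, device_map="auto"
    )
    model = PeftModel.from_pretrained(model, adapter_repo)
    model.eval()
    return model, tokenizer


def generate(model, tokenizer, instruction: str, max_new_tokens: int = 256) -> str:
    """Generate a completion and return only the newly produced text."""
    import torch

    prompt = build_alpaca_prompt(instruction)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

A typical call would be `model, tokenizer = load_model_and_tokenizer()` followed by `generate(model, tokenizer, "Write a Python function that reverses a string.")`. Loading requires access to the gated base model and a GPU supported by `bitsandbytes`.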

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 8
  • optimizer: paged_adamw_32bit (as set in the axolotl config), with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 3
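The batch-size and scheduler entries above can be checked with a little arithmetic. The sketch below reproduces `total_train_batch_size` and a linear-warmup, cosine-decay learning-rate curve in plain Python. Two values are assumptions: `NUM_DEVICES = 1` (consistent with 2 × 4 = 8) and `TOTAL_STEPS = 250` (estimated from the last logged step, 249; the exact step count is not stated in this card). The formula follows the usual warmup-then-half-cosine shape and is meant as an illustration, not as the exact scheduler code used in the run.

```python
import math

# Values taken from the hyperparameter list above.
PEAK_LR = 2e-4
WARMUP_STEPS = 100
MICRO_BATCH_SIZE = 2
GRAD_ACCUM_STEPS = 4

# Assumptions, not stated in the card.
NUM_DEVICES = 1    # implied by total_train_batch_size = 8
TOTAL_STEPS = 250  # estimated from the last logged step (249)


def total_train_batch_size(micro_batch: int, grad_accum: int, devices: int) -> int:
    """Sequences contributing to one optimizer update."""
    return micro_batch * grad_accum * devices


def lr_at(step: int,
          peak_lr: float = PEAK_LR,
          warmup_steps: int = WARMUP_STEPS,
          total_steps: int = TOTAL_STEPS) -> float:
    """Linear warmup to peak_lr, then half-cosine decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    print("total_train_batch_size:",
          total_train_batch_size(MICRO_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES))
    for step in (0, 50, 100, 175, 250):
        print(f"step {step:3d}: lr = {lr_at(step):.6f}")
```

With roughly 84 optimizer steps per epoch (step 84 is logged at epoch 1.0060), a 100-step warmup means the learning rate only reaches its peak partway through the second epoch.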

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.388 | 0.0120 | 1 | 0.4443 |
| 0.3646 | 0.0359 | 3 | 0.4441 |
| 0.3216 | 0.0719 | 6 | 0.4439 |
| 0.3628 | 0.1078 | 9 | 0.4435 |
| 0.2506 | 0.1437 | 12 | 0.4417 |
| 0.2855 | 0.1796 | 15 | 0.4379 |
| 0.2472 | 0.2156 | 18 | 0.4310 |
| 0.3146 | 0.2515 | 21 | 0.4243 |
| 0.2829 | 0.2874 | 24 | 0.4185 |
| 0.2926 | 0.3234 | 27 | 0.4139 |
| 0.3832 | 0.3593 | 30 | 0.4099 |
| 0.3 | 0.3952 | 33 | 0.4069 |
| 0.2759 | 0.4311 | 36 | 0.4051 |
| 0.341 | 0.4671 | 39 | 0.4017 |
| 0.2268 | 0.5030 | 42 | 0.3989 |
| 0.3938 | 0.5389 | 45 | 0.3971 |
| 0.3478 | 0.5749 | 48 | 0.3951 |
| 0.2745 | 0.6108 | 51 | 0.3935 |
| 0.2623 | 0.6467 | 54 | 0.3920 |
| 0.3743 | 0.6826 | 57 | 0.3903 |
| 0.3205 | 0.7186 | 60 | 0.3898 |
| 0.332 | 0.7545 | 63 | 0.3897 |
| 0.268 | 0.7904 | 66 | 0.3876 |
| 0.2842 | 0.8263 | 69 | 0.3873 |
| 0.3677 | 0.8623 | 72 | 0.3868 |
| 0.212 | 0.8982 | 75 | 0.3857 |
| 0.2656 | 0.9341 | 78 | 0.3854 |
| 0.2499 | 0.9701 | 81 | 0.3844 |
| 0.3512 | 1.0060 | 84 | 0.3850 |
| 0.3069 | 1.0269 | 87 | 0.3848 |
| 0.3037 | 1.0629 | 90 | 0.3856 |
| 0.2785 | 1.0988 | 93 | 0.3864 |
| 0.206 | 1.1347 | 96 | 0.3873 |
| 0.3354 | 1.1707 | 99 | 0.3912 |
| 0.3281 | 1.2066 | 102 | 0.3882 |
| 0.3452 | 1.2425 | 105 | 0.3849 |
| 0.3153 | 1.2784 | 108 | 0.3851 |
| 0.3846 | 1.3144 | 111 | 0.3851 |
| 0.2847 | 1.3503 | 114 | 0.3842 |
| 0.3128 | 1.3862 | 117 | 0.3842 |
| 0.282 | 1.4222 | 120 | 0.3866 |
| 0.2186 | 1.4581 | 123 | 0.3876 |
| 0.2122 | 1.4940 | 126 | 0.3862 |
| 0.2877 | 1.5299 | 129 | 0.3837 |
| 0.2771 | 1.5659 | 132 | 0.3822 |
| 0.3518 | 1.6018 | 135 | 0.3820 |
| 0.302 | 1.6377 | 138 | 0.3829 |
| 0.2653 | 1.6737 | 141 | 0.3833 |
| 0.3281 | 1.7096 | 144 | 0.3832 |
| 0.2933 | 1.7455 | 147 | 0.3821 |
| 0.1959 | 1.7814 | 150 | 0.3824 |
| 0.2013 | 1.8174 | 153 | 0.3830 |
| 0.1909 | 1.8533 | 156 | 0.3824 |
| 0.2321 | 1.8892 | 159 | 0.3812 |
| 0.2695 | 1.9251 | 162 | 0.3798 |
| 0.2516 | 1.9611 | 165 | 0.3796 |
| 0.2148 | 1.9970 | 168 | 0.3796 |
| 0.2233 | 2.0180 | 171 | 0.3802 |
| 0.234 | 2.0539 | 174 | 0.3844 |
| 0.2615 | 2.0898 | 177 | 0.3938 |
| 0.1582 | 2.1257 | 180 | 0.4031 |
| 0.218 | 2.1617 | 183 | 0.4071 |
| 0.2438 | 2.1976 | 186 | 0.4072 |
| 0.1822 | 2.2335 | 189 | 0.4050 |
| 0.2163 | 2.2695 | 192 | 0.4028 |
| 0.1513 | 2.3054 | 195 | 0.4021 |
| 0.1898 | 2.3413 | 198 | 0.4031 |
| 0.1857 | 2.3772 | 201 | 0.4059 |
| 0.1909 | 2.4132 | 204 | 0.4075 |
| 0.1119 | 2.4491 | 207 | 0.4092 |
| 0.1794 | 2.4850 | 210 | 0.4091 |
| 0.1188 | 2.5210 | 213 | 0.4081 |
| 0.1525 | 2.5569 | 216 | 0.4073 |
| 0.1897 | 2.5928 | 219 | 0.4069 |
| 0.1785 | 2.6287 | 222 | 0.4064 |
| 0.169 | 2.6647 | 225 | 0.4064 |
| 0.1518 | 2.7006 | 228 | 0.4060 |
| 0.1896 | 2.7365 | 231 | 0.4052 |
| 0.1675 | 2.7725 | 234 | 0.4055 |
| 0.2193 | 2.8084 | 237 | 0.4055 |
| 0.1887 | 2.8443 | 240 | 0.4057 |
| 0.1639 | 2.8802 | 243 | 0.4055 |
| 0.1701 | 2.9162 | 246 | 0.4058 |
| 0.2019 | 2.9521 | 249 | 0.4057 |
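
In the log above, validation loss bottoms out at 0.3796 (steps 165 and 168, near the end of the second epoch) and then rises during the third epoch to the final 0.4057, while training loss keeps falling. A small helper like the one below can locate that minimum from the logged rows. This is an illustrative sketch: the function names are hypothetical, and `SAMPLE` contains only a handful of rows copied from the table, so paste the full table to analyse the whole run.

```python
def parse_rows(table_text: str) -> list[dict]:
    """Parse 'train_loss epoch step val_loss' rows; skip headers and separators."""
    rows = []
    for line in table_text.splitlines():
        # Accept both whitespace-separated and pipe-delimited rows.
        fields = line.replace("|", " ").split()
        if len(fields) != 4:
            continue
        try:
            rows.append({
                "train_loss": float(fields[0]),
                "epoch": float(fields[1]),
                "step": int(fields[2]),
                "val_loss": float(fields[3]),
            })
        except ValueError:
            # Non-numeric line (for example a table separator): ignore it.
            continue
    return rows


def best_row(rows: list[dict]) -> dict:
    """Return the first row with the lowest validation loss."""
    return min(rows, key=lambda row: row["val_loss"])


# A few rows copied verbatim from the training log above.
SAMPLE = """\
Training Loss Epoch Step Validation Loss
0.388 0.0120 1 0.4443
0.3512 1.0060 84 0.3850
0.2516 1.9611 165 0.3796
0.2148 1.9970 168 0.3796
0.2019 2.9521 249 0.4057
"""

if __name__ == "__main__":
    rows = parse_rows(SAMPLE)
    best = best_row(rows)
    final = rows[-1]
    print(f"best:  step {best['step']} (epoch {best['epoch']}), val_loss {best['val_loss']}")
    print(f"final: step {final['step']} (epoch {final['epoch']}), val_loss {final['val_loss']}")
```

Because the config uses `save_strategy: epoch`, a checkpoint saved at the end of the second epoch would sit close to this minimum, which may be worth comparing against the final adapter.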

Framework versions

  • PEFT 0.12.0
  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1