metadata

base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
library_name: peft
license: llama3.1
tags:
  - axolotl
  - generated_from_trainer
model-index:
  - name: EvolCodeLlama-3.1-8B-Instruct
    results: []

See axolotl config

axolotl version: 0.4.1

base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer
is_llama_derived_model: true
hub_model_id: EvolCodeLlama-3.1-8B-Instruct

load_in_8bit: false
load_in_4bit: true
strict: false

datasets:
  - path: mlabonne/Evol-Instruct-Python-1k
    type: alpaca
dataset_prepared_path: last_run_prepared
val_set_size: 0.02
output_dir: ./qlora-out

adapter: qlora
lora_model_dir:

sequence_len: 2048
sample_packing: true

lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
lora_target_linear: true
lora_fan_in_fan_out:

wandb_project: axolotl
wandb_entity:
wandb_watch:
wandb_run_id:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 3
optimizer: paged_adamw_32bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 100
eval_steps: 0.01
save_strategy: epoch
save_steps:
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  pad_token: "<|end_of_text|>"

EvolCodeLlama-3.1-8B-Instruct

This model is a fine-tuned version of meta-llama/Meta-Llama-3.1-8B-Instruct on the mlabonne/Evol-Instruct-Python-1k dataset. It achieves the following results on the evaluation set:

Loss: 0.4057

Model description

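A QLoRA adapter (PEFT) for meta-llama/Meta-Llama-3.1-8B-Instruct, trained with axolotl 0.4.1. The base model is loaded in 4-bit, and LoRA (r=32, alpha=16, dropout=0.05) is applied to all linear layers. No further description has been provided.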
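To get a feel for how small the adapter is, the LoRA settings in the axolotl config (lora_r: 32, lora_alpha: 16, lora_target_linear: true) can be turned into a rough count of trainable parameters. This is a back-of-the-envelope sketch, not something reported by the training run. It assumes the published Llama 3.1 8B layer shapes, and it assumes that lora_target_linear covers the seven attention and MLP projections in each decoder layer but not the output head.

```python
# Rough count of trainable LoRA parameters for the adapter settings in the config.
# The layer shapes below are the published Llama 3.1 8B dimensions; they are an
# assumption here because the card itself does not list them.

LORA_R = 32       # lora_r from the config
LORA_ALPHA = 16   # lora_alpha from the config

HIDDEN = 4096         # hidden size
INTERMEDIATE = 14336  # MLP intermediate size
KV_DIM = 1024         # 8 key/value heads x 128 head dimension
NUM_LAYERS = 32

# (in_features, out_features) of each linear projection in one decoder layer.
# Assumption: these are the modules that lora_target_linear ends up wrapping.
LINEAR_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features, out_features, r):
    """LoRA adds A (r x in) and B (out x r) next to each frozen weight."""
    return r * (in_features + out_features)


per_layer = sum(lora_params(i, o, LORA_R) for i, o in LINEAR_SHAPES.values())
total = per_layer * NUM_LAYERS
scaling = LORA_ALPHA / LORA_R  # factor applied to the low-rank update

print(f"trainable LoRA parameters (estimate): {total:,}")
print(f"LoRA scaling (alpha / r): {scaling}")
```

Under these assumptions the adapter is on the order of 1% of the 8B base weights, which is why the 4-bit base plus LoRA setup fits on a single GPU.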

Intended uses & limitations

More information needed

Training and evaluation data

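Trained on mlabonne/Evol-Instruct-Python-1k, loaded in alpaca format, with 2% of the examples held out for evaluation (val_set_size: 0.02). No further details have been provided.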
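The axolotl config loads the dataset with type: alpaca, so each training example is rendered into an instruction prompt followed by the response. The sketch below shows the widely used Alpaca template as an illustration. The helper name build_alpaca_prompt is hypothetical, and the exact wording and whitespace axolotl applies may differ, so treat it as an approximation rather than the exact training format.

```python
# Illustrative Alpaca-style prompt builder (hypothetical helper, not axolotl's code).

HEADER_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
)
HEADER_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
)


def build_alpaca_prompt(instruction, input_text=""):
    """Render one example; the model is trained to continue after '### Response:'."""
    if input_text:
        return (
            f"{HEADER_WITH_INPUT}"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input_text}\n\n"
            "### Response:\n"
        )
    return f"{HEADER_NO_INPUT}### Instruction:\n{instruction}\n\n### Response:\n"


prompt = build_alpaca_prompt("Write a Python function that reverses a string.")
print(prompt)
```

With train_on_inputs: false in the config, the loss is computed on the response tokens only, so the prompt part serves as context rather than as a training target.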

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0002
train_batch_size: 2
eval_batch_size: 2
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 8
optimizer: paged_adamw_32bit (paged AdamW, as set in the axolotl config) with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 100
num_epochs: 3
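
Two of the values above follow from the others, and a short sketch makes the arithmetic explicit. The total batch size is the per-device batch size times the gradient accumulation steps times the number of devices, and the learning rate follows a linear warmup to the peak and then a cosine decay. The device count of 1 and the total of 252 optimizer steps are assumptions: the first is what makes 2 x 4 equal the reported 8, and the second is estimated from the results table (roughly 84 steps per epoch), since the card does not state either.

```python
import math

MICRO_BATCH_SIZE = 2   # train_batch_size
GRAD_ACCUM_STEPS = 4   # gradient_accumulation_steps
NUM_DEVICES = 1        # assumption: consistent with total_train_batch_size = 8
PEAK_LR = 2e-4         # learning_rate
WARMUP_STEPS = 100     # lr_scheduler_warmup_steps
TOTAL_STEPS = 252      # assumption: about 84 optimizer steps per epoch x 3 epochs


def effective_batch_size(micro=MICRO_BATCH_SIZE, accum=GRAD_ACCUM_STEPS,
                         devices=NUM_DEVICES):
    """Number of sequences contributing to one optimizer step."""
    return micro * accum * devices


def lr_at(step, peak=PEAK_LR, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Linear warmup to the peak, then half-cosine decay towards zero."""
    if step < warmup:
        return peak * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return peak * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print("effective batch size:", effective_batch_size())
for step in (1, 50, 100, 176, TOTAL_STEPS):
    print(f"step {step:>3}: lr = {lr_at(step):.2e}")
```

One consequence worth noting: with 100 warmup steps out of roughly 250, the peak learning rate is only reached partway through the second epoch, so about 40% of the run is spent warming up.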

Training results

Training Loss	Epoch	Step	Validation Loss
0.388	0.0120	1	0.4443
0.3646	0.0359	3	0.4441
0.3216	0.0719	6	0.4439
0.3628	0.1078	9	0.4435
0.2506	0.1437	12	0.4417
0.2855	0.1796	15	0.4379
0.2472	0.2156	18	0.4310
0.3146	0.2515	21	0.4243
0.2829	0.2874	24	0.4185
0.2926	0.3234	27	0.4139
0.3832	0.3593	30	0.4099
0.3	0.3952	33	0.4069
0.2759	0.4311	36	0.4051
0.341	0.4671	39	0.4017
0.2268	0.5030	42	0.3989
0.3938	0.5389	45	0.3971
0.3478	0.5749	48	0.3951
0.2745	0.6108	51	0.3935
0.2623	0.6467	54	0.3920
0.3743	0.6826	57	0.3903
0.3205	0.7186	60	0.3898
0.332	0.7545	63	0.3897
0.268	0.7904	66	0.3876
0.2842	0.8263	69	0.3873
0.3677	0.8623	72	0.3868
0.212	0.8982	75	0.3857
0.2656	0.9341	78	0.3854
0.2499	0.9701	81	0.3844
0.3512	1.0060	84	0.3850
0.3069	1.0269	87	0.3848
0.3037	1.0629	90	0.3856
0.2785	1.0988	93	0.3864
0.206	1.1347	96	0.3873
0.3354	1.1707	99	0.3912
0.3281	1.2066	102	0.3882
0.3452	1.2425	105	0.3849
0.3153	1.2784	108	0.3851
0.3846	1.3144	111	0.3851
0.2847	1.3503	114	0.3842
0.3128	1.3862	117	0.3842
0.282	1.4222	120	0.3866
0.2186	1.4581	123	0.3876
0.2122	1.4940	126	0.3862
0.2877	1.5299	129	0.3837
0.2771	1.5659	132	0.3822
0.3518	1.6018	135	0.3820
0.302	1.6377	138	0.3829
0.2653	1.6737	141	0.3833
0.3281	1.7096	144	0.3832
0.2933	1.7455	147	0.3821
0.1959	1.7814	150	0.3824
0.2013	1.8174	153	0.3830
0.1909	1.8533	156	0.3824
0.2321	1.8892	159	0.3812
0.2695	1.9251	162	0.3798
0.2516	1.9611	165	0.3796
0.2148	1.9970	168	0.3796
0.2233	2.0180	171	0.3802
0.234	2.0539	174	0.3844
0.2615	2.0898	177	0.3938
0.1582	2.1257	180	0.4031
0.218	2.1617	183	0.4071
0.2438	2.1976	186	0.4072
0.1822	2.2335	189	0.4050
0.2163	2.2695	192	0.4028
0.1513	2.3054	195	0.4021
0.1898	2.3413	198	0.4031
0.1857	2.3772	201	0.4059
0.1909	2.4132	204	0.4075
0.1119	2.4491	207	0.4092
0.1794	2.4850	210	0.4091
0.1188	2.5210	213	0.4081
0.1525	2.5569	216	0.4073
0.1897	2.5928	219	0.4069
0.1785	2.6287	222	0.4064
0.169	2.6647	225	0.4064
0.1518	2.7006	228	0.4060
0.1896	2.7365	231	0.4052
0.1675	2.7725	234	0.4055
0.2193	2.8084	237	0.4055
0.1887	2.8443	240	0.4057
0.1639	2.8802	243	0.4055
0.1701	2.9162	246	0.4058
0.2019	2.9521	249	0.4057
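
The table shows validation loss reaching its lowest value (0.3796) near the end of the second epoch and then rising during the third, while training loss keeps falling. A small sketch for locating that point is given below. It uses a handful of rows copied from the table rather than the full log, and the variable names are illustrative.

```python
# (step, epoch, validation_loss) rows copied from the results table above.
ROWS = [
    (1, 0.0120, 0.4443),
    (42, 0.5030, 0.3989),
    (84, 1.0060, 0.3850),
    (126, 1.4940, 0.3862),
    (165, 1.9611, 0.3796),
    (168, 1.9970, 0.3796),
    (186, 2.1976, 0.4072),
    (210, 2.4850, 0.4091),
    (249, 2.9521, 0.4057),
]

# min() returns the first row with the lowest validation loss.
best_step, best_epoch, best_loss = min(ROWS, key=lambda row: row[2])
final_step, final_epoch, final_loss = ROWS[-1]
gap = final_loss - best_loss

print(f"lowest validation loss {best_loss:.4f} at step {best_step} (epoch {best_epoch:.2f})")
print(f"final validation loss  {final_loss:.4f} at step {final_step} (epoch {final_epoch:.2f})")
print(f"increase after the minimum: {gap:.4f}")
```

A gap like this is commonly read as a sign of overfitting in the last epoch. Since the config sets save_strategy: epoch, the checkpoint written at the end of the second epoch may be worth comparing against the final one.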

Framework versions

PEFT 0.12.0
Transformers 4.44.0
PyTorch 2.4.0+cu121
Datasets 2.20.0
Tokenizers 0.19.1