---
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
library_name: peft
license: llama3.1
tags:
- axolotl
- generated_from_trainer
model-index:
- name: EvolCodeLlama-3.1-8B-Instruct
  results: []
---


[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.1`
```yaml
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer
is_llama_derived_model: true
hub_model_id: EvolCodeLlama-3.1-8B-Instruct

load_in_8bit: false
load_in_4bit: true
strict: false

datasets:
  - path: mlabonne/Evol-Instruct-Python-1k
    type: alpaca
dataset_prepared_path: last_run_prepared
val_set_size: 0.02
output_dir: ./qlora-out

adapter: qlora
lora_model_dir:

sequence_len: 2048
sample_packing: true

lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
lora_target_linear: true
lora_fan_in_fan_out:

wandb_project: axolotl
wandb_entity:
wandb_watch:
wandb_run_id:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 3
optimizer: paged_adamw_32bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 100
eval_steps: 0.01
save_strategy: epoch
save_steps:
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  pad_token: "<|end_of_text|>"

```

</details><br>

# EvolCodeLlama-3.1-8B-Instruct

This model is a QLoRA fine-tune of [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct) on the [mlabonne/Evol-Instruct-Python-1k](https://huggingface.co/datasets/mlabonne/Evol-Instruct-Python-1k) dataset.
It achieves the following results on the evaluation set:
- Loss: 0.4057
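
For a more intuitive reading of the loss, it can be converted to perplexity. The sketch below assumes the reported value is a mean per-token cross-entropy in nats (the usual convention for causal language model training); under that assumption, 0.4057 corresponds to a perplexity of roughly 1.5.

```python
import math


def perplexity(mean_cross_entropy: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(mean_cross_entropy)


# Evaluation loss reported on this card.
final_eval_loss = 0.4057
print(f"perplexity ~ {perplexity(final_eval_loss):.3f}")
```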

## Model description

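A QLoRA adapter for Meta-Llama-3.1-8B-Instruct, trained with axolotl 0.4.1. The base model is loaded in 4-bit, and LoRA adapters (rank 32, alpha 16, dropout 0.05) are attached to all linear layers. Training used a sequence length of 2048 tokens with sample packing, bf16 precision, gradient checkpointing, and flash attention.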

## Intended uses & limitations

More information needed

## Training and evaluation data

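The model was trained on [mlabonne/Evol-Instruct-Python-1k](https://huggingface.co/datasets/mlabonne/Evol-Instruct-Python-1k), loaded in Alpaca format. 2% of the examples were held out as the evaluation set (`val_set_size: 0.02`), and prompt tokens were excluded from the loss (`train_on_inputs: false`).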
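
The axolotl config lists the dataset with `type: alpaca`, so prompts at inference time probably work best when they follow the same layout. The sketch below uses the widely circulated Stanford Alpaca wording; this is an assumption, since the exact template string axolotl 0.4.1 emits is not shown on this card. `build_alpaca_prompt` is a hypothetical helper name.

```python
from typing import Optional

# Assumed wording: the common Stanford Alpaca template, not copied from axolotl.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)


def build_alpaca_prompt(instruction: str, input_text: Optional[str] = None) -> str:
    """Render an Alpaca-style prompt; the model's completion follows '### Response:'."""
    if input_text:
        return PROMPT_WITH_INPUT.format(instruction=instruction, input=input_text)
    return PROMPT_NO_INPUT.format(instruction=instruction)


print(build_alpaca_prompt("Write a Python function that reverses a string."))
```

The returned string is what would be tokenized and passed to the model; generation then continues after the `### Response:` marker.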

## Training procedure
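
Two numbers follow directly from the adapter settings in the axolotl config (`lora_r: 32`, `lora_alpha: 16`, `load_in_4bit: true`). The sketch below computes the standard LoRA scaling factor `lora_alpha / r` and a rough estimate of base-weight memory. The parameter count of 8e9 is a nominal assumption taken from the model name, and the estimate ignores quantization constants, adapter weights, activations, and optimizer state.

```python
def lora_scaling(lora_alpha: int, r: int) -> float:
    """Standard LoRA scaling applied to the low-rank update: alpha / r."""
    return lora_alpha / r


def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


NOMINAL_PARAMS = 8e9  # assumption: "8B" taken at face value

print("LoRA scaling:", lora_scaling(lora_alpha=16, r=32))
print(f"4-bit weights: ~{weight_memory_gib(NOMINAL_PARAMS, 4):.1f} GiB")
print(f"bf16 weights:  ~{weight_memory_gib(NOMINAL_PARAMS, 16):.1f} GiB")
```

With alpha set to half the rank, the adapter update is scaled by 0.5 before being added to the frozen weights.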

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 8
- optimizer: paged_adamw_32bit (AdamW with betas=(0.9,0.999) and epsilon=1e-08)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- num_epochs: 3
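
As a sanity check on the values above, the sketch below recomputes the effective batch size and traces a cosine learning-rate schedule with linear warmup. The single-device setting is inferred from 2 × 4 = 8, and `total_steps=250` is an assumption based on the last logged step (249) in the results table; neither is stated explicitly in the training logs. The schedule formula is the common half-cosine decay after linear warmup.

```python
import math


def effective_batch_size(micro_batch_size: int, grad_accum_steps: int,
                         num_devices: int = 1) -> int:
    """Number of sequences contributing to each optimizer step."""
    return micro_batch_size * grad_accum_steps * num_devices


def lr_at_step(step: int, base_lr: float = 2e-4, warmup_steps: int = 100,
               total_steps: int = 250) -> float:
    """Linear warmup to base_lr, then half-cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print("effective batch size:", effective_batch_size(2, 4))
for step in (0, 50, 100, 175, 250):
    print(f"step {step:>3}: lr = {lr_at_step(step):.2e}")
```

Under these assumptions the warmup covers about 40% of the run, so the peak learning rate of 2e-4 is only reached a little after the first epoch.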

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.388         | 0.0120 | 1    | 0.4443          |
| 0.3646        | 0.0359 | 3    | 0.4441          |
| 0.3216        | 0.0719 | 6    | 0.4439          |
| 0.3628        | 0.1078 | 9    | 0.4435          |
| 0.2506        | 0.1437 | 12   | 0.4417          |
| 0.2855        | 0.1796 | 15   | 0.4379          |
| 0.2472        | 0.2156 | 18   | 0.4310          |
| 0.3146        | 0.2515 | 21   | 0.4243          |
| 0.2829        | 0.2874 | 24   | 0.4185          |
| 0.2926        | 0.3234 | 27   | 0.4139          |
| 0.3832        | 0.3593 | 30   | 0.4099          |
| 0.3           | 0.3952 | 33   | 0.4069          |
| 0.2759        | 0.4311 | 36   | 0.4051          |
| 0.341         | 0.4671 | 39   | 0.4017          |
| 0.2268        | 0.5030 | 42   | 0.3989          |
| 0.3938        | 0.5389 | 45   | 0.3971          |
| 0.3478        | 0.5749 | 48   | 0.3951          |
| 0.2745        | 0.6108 | 51   | 0.3935          |
| 0.2623        | 0.6467 | 54   | 0.3920          |
| 0.3743        | 0.6826 | 57   | 0.3903          |
| 0.3205        | 0.7186 | 60   | 0.3898          |
| 0.332         | 0.7545 | 63   | 0.3897          |
| 0.268         | 0.7904 | 66   | 0.3876          |
| 0.2842        | 0.8263 | 69   | 0.3873          |
| 0.3677        | 0.8623 | 72   | 0.3868          |
| 0.212         | 0.8982 | 75   | 0.3857          |
| 0.2656        | 0.9341 | 78   | 0.3854          |
| 0.2499        | 0.9701 | 81   | 0.3844          |
| 0.3512        | 1.0060 | 84   | 0.3850          |
| 0.3069        | 1.0269 | 87   | 0.3848          |
| 0.3037        | 1.0629 | 90   | 0.3856          |
| 0.2785        | 1.0988 | 93   | 0.3864          |
| 0.206         | 1.1347 | 96   | 0.3873          |
| 0.3354        | 1.1707 | 99   | 0.3912          |
| 0.3281        | 1.2066 | 102  | 0.3882          |
| 0.3452        | 1.2425 | 105  | 0.3849          |
| 0.3153        | 1.2784 | 108  | 0.3851          |
| 0.3846        | 1.3144 | 111  | 0.3851          |
| 0.2847        | 1.3503 | 114  | 0.3842          |
| 0.3128        | 1.3862 | 117  | 0.3842          |
| 0.282         | 1.4222 | 120  | 0.3866          |
| 0.2186        | 1.4581 | 123  | 0.3876          |
| 0.2122        | 1.4940 | 126  | 0.3862          |
| 0.2877        | 1.5299 | 129  | 0.3837          |
| 0.2771        | 1.5659 | 132  | 0.3822          |
| 0.3518        | 1.6018 | 135  | 0.3820          |
| 0.302         | 1.6377 | 138  | 0.3829          |
| 0.2653        | 1.6737 | 141  | 0.3833          |
| 0.3281        | 1.7096 | 144  | 0.3832          |
| 0.2933        | 1.7455 | 147  | 0.3821          |
| 0.1959        | 1.7814 | 150  | 0.3824          |
| 0.2013        | 1.8174 | 153  | 0.3830          |
| 0.1909        | 1.8533 | 156  | 0.3824          |
| 0.2321        | 1.8892 | 159  | 0.3812          |
| 0.2695        | 1.9251 | 162  | 0.3798          |
| 0.2516        | 1.9611 | 165  | 0.3796          |
| 0.2148        | 1.9970 | 168  | 0.3796          |
| 0.2233        | 2.0180 | 171  | 0.3802          |
| 0.234         | 2.0539 | 174  | 0.3844          |
| 0.2615        | 2.0898 | 177  | 0.3938          |
| 0.1582        | 2.1257 | 180  | 0.4031          |
| 0.218         | 2.1617 | 183  | 0.4071          |
| 0.2438        | 2.1976 | 186  | 0.4072          |
| 0.1822        | 2.2335 | 189  | 0.4050          |
| 0.2163        | 2.2695 | 192  | 0.4028          |
| 0.1513        | 2.3054 | 195  | 0.4021          |
| 0.1898        | 2.3413 | 198  | 0.4031          |
| 0.1857        | 2.3772 | 201  | 0.4059          |
| 0.1909        | 2.4132 | 204  | 0.4075          |
| 0.1119        | 2.4491 | 207  | 0.4092          |
| 0.1794        | 2.4850 | 210  | 0.4091          |
| 0.1188        | 2.5210 | 213  | 0.4081          |
| 0.1525        | 2.5569 | 216  | 0.4073          |
| 0.1897        | 2.5928 | 219  | 0.4069          |
| 0.1785        | 2.6287 | 222  | 0.4064          |
| 0.169         | 2.6647 | 225  | 0.4064          |
| 0.1518        | 2.7006 | 228  | 0.4060          |
| 0.1896        | 2.7365 | 231  | 0.4052          |
| 0.1675        | 2.7725 | 234  | 0.4055          |
| 0.2193        | 2.8084 | 237  | 0.4055          |
| 0.1887        | 2.8443 | 240  | 0.4057          |
| 0.1639        | 2.8802 | 243  | 0.4055          |
| 0.1701        | 2.9162 | 246  | 0.4058          |
| 0.2019        | 2.9521 | 249  | 0.4057          |
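
In the table above, validation loss bottoms out at 0.3796 near the end of epoch 2 (steps 165 to 168) and then climbs back to about 0.406 in epoch 3 while training loss keeps falling, which may indicate overfitting. A minimal sketch for picking the best evaluation point, using a handful of rows copied from the table (`best_row` is a hypothetical helper name):

```python
# (step, epoch, validation_loss) rows copied from the results table; a subset for brevity.
rows = [
    (1, 0.0120, 0.4443),
    (84, 1.0060, 0.3850),
    (165, 1.9611, 0.3796),
    (168, 1.9970, 0.3796),
    (186, 2.1976, 0.4072),
    (249, 2.9521, 0.4057),
]


def best_row(rows):
    """Return the row with the lowest validation loss (earliest row wins ties)."""
    return min(rows, key=lambda r: r[2])


step, epoch, val_loss = best_row(rows)
final_loss = rows[-1][2]
print(f"best: step {step} (epoch {epoch:.2f}), val loss {val_loss:.4f}")
print(f"final val loss is {final_loss - val_loss:.4f} higher than the best")
```

Because the config uses `save_strategy: epoch`, the checkpoint saved at the end of epoch 2 would be the one closest to this minimum, assuming the per-epoch checkpoints were kept.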


### Framework versions

- PEFT 0.12.0
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1