---
library_name: peft
tags:
- generated_from_trainer
base_model: /GenAI4HW/llama2_13b
metrics:
- accuracy
model-index:
- name: outputs/llama2-13B-lora-QuArch_0_1_1_alpaca_filtered-answer-context-test-new
results: []
---
[Built with Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)

The following axolotl configuration was used to train this model (axolotl version `0.4.0`):
```yaml
## General
# base_model: meta-llama/Meta-Llama-3-8B-Instruct
base_model: /GenAI4HW/llama2_13b
# base_model: meta-llama/Llama-2-13b
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer
# tokenizer_type: LlamaTokenizer
output_dir: ./outputs/llama2-13B-lora-QuArch_0_1_1_alpaca_filtered-answer-context-test-new
seed: 42
## Data Configuration
datasets:
# - path: ./data/QuArch_v0_1_0_alpaca_w_context.json # With abstract
# - path: ./data/QuArch_v0_1_1_alpaca_format.json # With justification
# - path: ./data/QuArch_v0_1_0_alpaca_mmlu.json # Without justification
- path: ./data/QuArch_v0_1_1_alpaca_filtered_context/
type: alpaca
data_file: train
dataset_prepared_path:
test_datasets:
- path: ./data/QuArch_v0_1_1_alpaca_filtered_context/
type: alpaca
split: test
data_file:
- test
# - path: ./data/QuArch_v0_1_1_alpaca_filtered_context/
# type: alpaca
# split: val
# data_file:
# - val
## Model Configuration
load_in_8bit: false
load_in_4bit: false
strict: false
bf16: auto
fp16:
tf32: false
device_map: 'auto'
## LoRA Configuration
adapter: lora
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_model_dir:
lora_fan_in_fan_out:
## Logging Configuration
logging_dir: ./logs
logging_steps: 10
wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:
do_eval: true
## Training Configuration
sequence_len: 1024
sample_packing: true
pad_to_sequence_len: true
train_on_inputs: false
group_by_length: false
micro_batch_size: 1
gradient_accumulation_steps: 16
num_epochs: 30
warmup_steps: 10
weight_decay: 0.01
optimizer: adamw_torch
lr_scheduler: linear
learning_rate: 2e-5
gradient_checkpointing: false
saves_per_epoch: 1
# save_steps: 0
# save_strategy: steps
save_total_limit: 30
load_best_model_at_end: true
greater_is_better: true
early_stopping_patience:
resume_from_checkpoint:
remove_unused_columns: true
## Evaluation Configuration
eval_sample_packing: False
eval_batch_size: 1
evals_per_epoch: 1
# evaluation_strategy: epoch
eval_max_new_tokens: 32
eval_table_size:
# max_new_token: 32
# eval_causal_lm_metrics: sacrebleu
# Others
local_rank:
xformers_attention:
flash_attention: true
s2_attention:
debug:
deepspeed:
fsdp:
fsdp_config:
special_tokens:
# pad_token: <|end_of_text|>
```
# outputs/llama2-13B-lora-QuArch_0_1_1_alpaca_filtered-answer-context-test-new
This model is a LoRA adapter for the Llama 2 13B base model at `/GenAI4HW/llama2_13b`, fine-tuned on the QuArch v0.1.1 filtered Alpaca-format dataset with context (`./data/QuArch_v0_1_1_alpaca_filtered_context/`).
It achieves the following results on the evaluation set:
- Loss: 0.0432
- Accuracy: 0.9808
## Model description
This is a LoRA adapter trained with [axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) on top of a Llama 2 13B base model (`/GenAI4HW/llama2_13b`). Per the config above, the adapter uses `lora_r: 32`, `lora_alpha: 16`, `lora_dropout: 0.05`, and targets all linear layers (`lora_target_linear: true`); only the adapter weights are trained, the base model weights are left unchanged.
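If the trained adapter is available in the `output_dir` above, it can be loaded on top of the base model with PEFT and Transformers. The sketch below is a minimal example, not the card author's own script: the paths are taken from the config, while the prompt text is an invented placeholder.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Paths taken from the config above; adjust to wherever the base model and
# adapter actually live on your system.
base_model_path = "/GenAI4HW/llama2_13b"
adapter_path = "./outputs/llama2-13B-lora-QuArch_0_1_1_alpaca_filtered-answer-context-test-new"

tokenizer = AutoTokenizer.from_pretrained(base_model_path)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_path,
    torch_dtype=torch.bfloat16,  # the config trains with bf16: auto
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, adapter_path)
model.eval()

# Hypothetical Alpaca-style prompt; the exact template used at evaluation
# time is determined by axolotl's alpaca format.
prompt = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\nAnswer the question using the provided context.\n\n"
    "### Input:\nContext: ...\nQuestion: ...\n\n### Response:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=32)  # eval_max_new_tokens: 32
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

If a standalone checkpoint is preferred, `model.merge_and_unload()` folds the LoRA weights into the base model.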
## Intended uses & limitations
The adapter was trained and evaluated only on the QuArch v0.1.1 question-answering data described below; its behavior outside that data has not been evaluated here.
## Training and evaluation data
Training and evaluation both use the Alpaca-format dataset at `./data/QuArch_v0_1_1_alpaca_filtered_context/`: the `train` file for training and the `test` split for evaluation (a `val` split is present in the config but commented out). The results reported above come from the `test` split.
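The `type: alpaca` loader in axolotl expects records with `instruction`, `input`, and `output` fields. The record below is purely illustrative (the field values are invented placeholders) and only shows the expected structure:

```python
# Illustrative Alpaca-format record; the field values are placeholders, not
# actual QuArch content.
example_record = {
    "instruction": "Answer the question using the provided context.",
    "input": "Context: <paper abstract or passage>\nQuestion: <question text>",
    "output": "<reference answer>",
}

# With train_on_inputs: false in the config, axolotl masks the prompt
# (instruction + input) tokens in the loss and trains only on the output.
```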
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 32
- total_eval_batch_size: 2
- optimizer: AdamW (`adamw_torch`) with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 10
- num_epochs: 30
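For reference, the total batch sizes listed above follow directly from the per-device settings in the axolotl config and the two-GPU run:

```python
# Effective batch sizes implied by the config values (micro_batch_size,
# gradient_accumulation_steps) and the 2-GPU run reported above.
micro_batch_size = 1
gradient_accumulation_steps = 16
num_devices = 2

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
total_eval_batch_size = 1 * num_devices  # eval_batch_size: 1 per device

assert total_train_batch_size == 32
assert total_eval_batch_size == 2
```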
### Training results
| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-------:|:----:|:---------------:|:--------:|
| No log | 0.2105 | 1 | 5.1322 | 0.6154 |
| No log | 0.8421 | 4 | 5.1271 | 0.6346 |
| No log | 1.6842 | 8 | 5.0601 | 0.6538 |
| 5.1323 | 2.5263 | 12 | 4.7743 | 0.7885 |
| 5.1323 | 3.3684 | 16 | 4.0491 | 0.9231 |
| 4.2735 | 4.2105 | 20 | 2.6444 | 0.8846 |
| 4.2735 | 5.0526 | 24 | 1.0551 | 0.9615 |
| 4.2735 | 5.8947 | 28 | 0.4698 | 0.6923 |
| 1.2232 | 6.7368 | 32 | 0.3224 | 0.6731 |
| 1.2232 | 7.5789 | 36 | 0.2527 | 1.0 |
| 0.3083 | 8.4211 | 40 | 0.1972 | 1.0 |
| 0.3083 | 9.2632 | 44 | 0.1372 | 0.9615 |
| 0.3083 | 10.1053 | 48 | 0.0803 | 1.0 |
| 0.1761 | 10.9474 | 52 | 0.0575 | 0.9808 |
| 0.1761 | 11.7895 | 56 | 0.0475 | 0.9808 |
| 0.116 | 12.6316 | 60 | 0.0444 | 0.9808 |
| 0.116 | 13.4737 | 64 | 0.0463 | 0.9808 |
| 0.116 | 14.3158 | 68 | 0.0489 | 0.9808 |
| 0.0814 | 15.1579 | 72 | 0.0495 | 0.9808 |
| 0.0814 | 16.0 | 76 | 0.0481 | 0.9808 |
| 0.0709 | 16.8421 | 80 | 0.0469 | 0.9808 |
| 0.0709 | 17.6842 | 84 | 0.0457 | 0.9808 |
| 0.0709 | 18.5263 | 88 | 0.0455 | 0.9808 |
| 0.0632 | 19.3684 | 92 | 0.0454 | 0.9808 |
| 0.0632 | 20.2105 | 96 | 0.0459 | 0.9808 |
| 0.0569 | 21.0526 | 100 | 0.0458 | 0.9808 |
| 0.0569 | 21.8947 | 104 | 0.0446 | 0.9808 |
| 0.0569 | 22.7368 | 108 | 0.0451 | 0.9808 |
| 0.055 | 23.5789 | 112 | 0.0446 | 0.9808 |
| 0.055 | 24.4211 | 116 | 0.0452 | 0.9808 |
| 0.0581 | 25.2632 | 120 | 0.0432 | 0.9808 |
### Framework versions
- PEFT 0.10.0
- Transformers 4.41.2
- Pytorch 2.1.2+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1