---
license: other
library_name: peft
tags:
- axolotl
- generated_from_trainer
base_model: NousResearch/Meta-Llama-3-8B
model-index:
- name: llama-3-8B-semeval2014
  results: []
language:
- en
metrics:
- f1
---


[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.0`
```yaml
base_model: NousResearch/Meta-Llama-3-8B

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: semeval2014_train.jsonl
    ds_type: json
    type:
      # JSONL file contains instruction, input, output fields per line.
      # This gets mapped to the equivalent axolotl tags.
      field_instruction: instruction
      field_input: input
      field_output: output
      # Format is used by axolotl to generate the prompt.
      format: |-
        [INST] {input} [/INST]

tokens: # add new control tokens from the dataset to the model
  - "[INST]"
  - "[/INST]"

dataset_prepared_path:
val_set_size: 0.05
output_dir: ./lora-out

sequence_len: 4096
sample_packing: false
eval_sample_packing: false
pad_to_sequence_len: false

adapter: lora
lora_model_dir:
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
lora_modules_to_save: # required when adding new tokens to LLaMA/Mistral
  - embed_tokens
  - lm_head

wandb_project: absa-semeval2014
wandb_entity: psimm
wandb_log_model:
wandb_name: llama-3-8B-semeval2014

hub_model_id: psimm/llama-3-8B-semeval2014

gradient_accumulation_steps: 1
micro_batch_size: 32
num_epochs: 4
optimizer: adamw_torch
lr_scheduler: cosine
learning_rate: 0.0001

train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
eval_steps: 0.05
eval_table_size:
eval_table_max_new_tokens: 128
save_steps:
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  pad_token: <|end_of_text|>

```

</details><br>

# llama-3-8B-semeval2014

This model is a LoRA fine-tune of [NousResearch/Meta-Llama-3-8B](https://huggingface.co/NousResearch/Meta-Llama-3-8B) on the SemEval 2014 Task 4 dataset.
It achieves the following results on the evaluation set:
- Loss: 0.0695
- F1 Score: 82.13

For more details, see my [article](https://simmering.dev/open-absa).

## Intended uses & limitations

Aspect-based sentiment analysis in English. Pass the model one review sentence at a time, wrapped in the `[INST]` and `[/INST]` control tokens, like this: `[INST]The cheeseburger was tasty but the fries were soggy.[/INST]`
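
The wrapping step can be sketched as a small helper. `wrap_review` is a hypothetical name, not part of this repository. Note that the training template in the axolotl config puts a space on each side of the input (`[INST] {input} [/INST]`), while the examples in this card do not; the sketch follows the card's examples, and which variant works better has not been checked here.

```python
def wrap_review(sentence: str) -> str:
    """Wrap one review sentence in the [INST] ... [/INST] control tokens.

    Hypothetical helper: mirrors the prompt shown in this card (no spaces
    between the tags and the sentence).
    """
    return f"[INST]{sentence.strip()}[/INST]"


print(wrap_review("The cheeseburger was tasty but the fries were soggy."))
```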

## How to run

This adapter requires two extra tokens in the tokenizer: `[INST]` and `[/INST]`. The base model's embedding matrix must also be enlarged by two rows so that the new token ids have embeddings.

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

extra_tokens = ["[INST]", "[/INST]"]
base_model_id = "NousResearch/Meta-Llama-3-8B"

# Load the base model and grow its embedding matrix by one row per new token.
base_model = AutoModelForCausalLM.from_pretrained(base_model_id)
base_model.resize_token_embeddings(base_model.config.vocab_size + len(extra_tokens))

# Register the same tokens with the tokenizer so their ids map to the new rows.
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
tokenizer.add_special_tokens({"additional_special_tokens": extra_tokens})

# Attach the LoRA adapter on top of the resized base model.
model = PeftModel.from_pretrained(base_model, "psimm/llama-3-8B-semeval2014")

input_text = "[INST]The food was tasty[/INST]"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

gen_tokens = model.generate(
    input_ids,
    max_length=256,
    temperature=0.01,
)

# Remove the input tokens
output_tokens = gen_tokens[:, input_ids.shape[1] :]

print(tokenizer.batch_decode(output_tokens, skip_special_tokens=True))
```

## Training and evaluation data

SemEval 2014 Task 4 reviews.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- total_train_batch_size: 64
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 4
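
To make the schedule concrete, here is a minimal sketch of a cosine learning-rate schedule with linear warmup, using the values listed above. The formula is the standard half-cosine decay to zero. `lr_at` is a hypothetical helper, and `TOTAL_STEPS = 356` is an assumption inferred from the training results table (89 optimizer steps per epoch over 4 epochs), not a value reported by the trainer.

```python
import math

BASE_LR = 1e-4       # learning_rate
WARMUP_STEPS = 10    # lr_scheduler_warmup_steps
TOTAL_STEPS = 356    # assumption: 89 steps per epoch x 4 epochs

# Effective batch size: per-device batch x devices x gradient accumulation.
TOTAL_TRAIN_BATCH_SIZE = 32 * 2 * 1  # matches total_train_batch_size: 64


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (hypothetical helper)."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 to BASE_LR.
        return BASE_LR * step / max(1, WARMUP_STEPS)
    # Half-cosine decay from BASE_LR to 0 over the remaining steps.
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


for step in (0, 5, 10, 183, 356):
    print(step, f"{lr_at(step):.2e}")
```

Under these assumptions the rate peaks at step 10, is half the peak at step 183 (the midpoint of the decay), and reaches zero at the final step.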

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 2.5408        | 0.0112 | 1    | 2.2742          |
| 0.1159        | 0.2022 | 18   | 0.1026          |
| 0.1028        | 0.4045 | 36   | 0.0762          |
| 0.0813        | 0.6067 | 54   | 0.0709          |
| 0.0908        | 0.8090 | 72   | 0.0665          |
| 0.0431        | 1.0112 | 90   | 0.0639          |
| 0.0275        | 1.2135 | 108  | 0.0663          |
| 0.0224        | 1.4157 | 126  | 0.0659          |
| 0.0349        | 1.6180 | 144  | 0.0637          |
| 0.0281        | 1.8202 | 162  | 0.0589          |
| 0.0125        | 2.0225 | 180  | 0.0592          |
| 0.0088        | 2.2247 | 198  | 0.0682          |
| 0.0076        | 2.4270 | 216  | 0.0666          |
| 0.01          | 2.6292 | 234  | 0.0654          |
| 0.0131        | 2.8315 | 252  | 0.0704          |
| 0.0075        | 3.0337 | 270  | 0.0679          |
| 0.002         | 3.2360 | 288  | 0.0688          |
| 0.0029        | 3.4382 | 306  | 0.0692          |
| 0.0009        | 3.6404 | 324  | 0.0694          |
| 0.0064        | 3.8427 | 342  | 0.0695          |
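
A few quantities can be read off the table with simple arithmetic. The sketch below copies the step, epoch, and validation loss columns and derives the number of optimizer steps per epoch, the implied total step count, and the evaluation with the lowest validation loss. The variable names are illustrative only.

```python
import math

# (step, epoch, validation loss), copied from the table above.
rows = [
    (1, 0.0112, 2.2742), (18, 0.2022, 0.1026), (36, 0.4045, 0.0762),
    (54, 0.6067, 0.0709), (72, 0.8090, 0.0665), (90, 1.0112, 0.0639),
    (108, 1.2135, 0.0663), (126, 1.4157, 0.0659), (144, 1.6180, 0.0637),
    (162, 1.8202, 0.0589), (180, 2.0225, 0.0592), (198, 2.2247, 0.0682),
    (216, 2.4270, 0.0666), (234, 2.6292, 0.0654), (252, 2.8315, 0.0704),
    (270, 3.0337, 0.0679), (288, 3.2360, 0.0688), (306, 3.4382, 0.0692),
    (324, 3.6404, 0.0694), (342, 3.8427, 0.0695),
]

# Steps per epoch, estimated from the last row (step / epoch).
last_step, last_epoch, _ = rows[-1]
steps_per_epoch = round(last_step / last_epoch)
total_steps = steps_per_epoch * 4  # num_epochs: 4

# eval_steps: 0.05 is a fraction of the total steps, rounded up.
eval_interval = math.ceil(0.05 * total_steps)

# Evaluation with the lowest validation loss.
best_step, best_epoch, best_loss = min(rows, key=lambda r: r[2])

print(steps_per_epoch, total_steps, eval_interval)  # 89 356 18
print(best_step, best_epoch, best_loss)             # 162 1.8202 0.0589
```

The 18-step spacing between evaluations is consistent with `eval_steps: 0.05` in the axolotl config. The lowest validation loss (0.0589, step 162) occurs near the end of the second epoch, while the loss reported at the top of this card (0.0695) is the final evaluation; the validation loss drifts upward as the training loss keeps falling, which suggests some overfitting in the later epochs.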


### Framework versions

- PEFT 0.10.0
- Transformers 4.40.2
- Pytorch 2.2.2+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1