---
license: apache-2.0
datasets:
- irlab-udc/alpaca_data_galician
language:
- gl
- en
---

# Galician Fine-Tuned LLM Model

This repository contains a large language model (LLM) fine-tuned using the LLaMA Factory library and the Finisterrae III supercomputer at CESGA. The base model used for fine-tuning was Meta's `LLaMA 3`.

## Model Description

This LLM model has been specifically fine-tuned to understand and generate text in Galician. It was fine-tuned using a modified version of the [irlab-udc/alpaca_data_galician](https://huggingface.co/datasets/irlab-udc/alpaca_data_galician) dataset, enriched with synthetic data to enhance its text generation and comprehension capabilities in specific contexts.

### Technical Details

- **Base Model**: Meta's LLaMA 3
- **Fine-Tuning Platform**: LLaMA Factory
- **Infrastructure**: Finisterrae III, CESGA
- **Dataset**: [irlab-udc/alpaca_data_galician](https://huggingface.co/datasets/irlab-udc/alpaca_data_galician) (with modifications)
- **Fine-Tuning Objective**: To improve text comprehension and generation in Galician.

## How to Use the Model

To use this model, follow the example code provided below. Ensure you have the necessary libraries installed (e.g., Hugging Face's `transformers`).

### Installation

```bash
pip install transformers
```

### Installation

```bash
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "abrahammg/Llama3-8B-Galician-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

text = "Enter some text in Galician here."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```