metadata
license: apache-2.0
datasets:
- irlab-udc/alpaca_data_galician
language:
- gl
- en
Galician Fine-Tuned LLM Model
This repository contains a large language model (LLM) fine-tuned using the LLaMA Factory library and the Finisterrae III supercomputer at CESGA. The base model used for fine-tuning was Meta's LLaMA 3
.
Model Description
This LLM model has been specifically fine-tuned to understand and generate text in Galician. It was fine-tuned using a modified version of the irlab-udc/alpaca_data_galician dataset, enriched with synthetic data to enhance its text generation and comprehension capabilities in specific contexts.
Technical Details
- Base Model: Meta's LLaMA 3
- Fine-Tuning Platform: LLaMA Factory
- Infrastructure: Finisterrae III, CESGA
- Dataset: irlab-udc/alpaca_data_galician (with modifications)
- Fine-Tuning Objective: To improve text comprehension and generation in Galician.
How to Use the Model
To use this model, follow the example code provided below. Ensure you have the necessary libraries installed (e.g., Hugging Face's transformers
).
Installation
pip install transformers
Installation
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "abrahammg/Llama3-8B-Galician-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
text = "Enter some text in Galician here."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))