LLaMA-3.1-8B South African Languages Model

This model card provides detailed information about the LLaMA-3.1-8B model fine-tuned for South African languages. The model demonstrates cost-effective cross-lingual transfer learning for African language processing.

Model Overview

The model is based on Meta's LLaMA-3.1-8B-Instruct architecture and has been fine-tuned on translated versions of the Alpaca Cleaned dataset. The training approach leverages machine translation to create instruction-tuning data in five South African languages, making it a cost-effective solution for multilingual AI development.

Training Methodology

Dataset Preparation

The training data was created by translating the Alpaca Cleaned dataset into five target languages:

  • Xhosa
  • Zulu
  • Tswana
  • Northern Sotho
  • Afrikaans

Machine translation was used to generate the training data at a cost of $370 per language.
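
The card does not name the machine translation system that was used. As an illustration only, the sketch below shows how such translations could be produced with the open NLLB-200 model; the model ID and language codes are assumptions, not a description of the actual pipeline.

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# NLLB-200 is used here purely for illustration; the MT system actually used is not specified.
mt_name = "facebook/nllb-200-distilled-600M"
mt_tokenizer = AutoTokenizer.from_pretrained(mt_name, src_lang="eng_Latn")
mt_model = AutoModelForSeq2SeqLM.from_pretrained(mt_name)

# Target codes: xho_Latn (Xhosa), zul_Latn (Zulu), tsn_Latn (Tswana),
# nso_Latn (Northern Sotho), afr_Latn (Afrikaans)
def translate(text, tgt_lang="xho_Latn"):
    inputs = mt_tokenizer(text, return_tensors="pt")
    generated = mt_model.generate(
        **inputs,
        forced_bos_token_id=mt_tokenizer.convert_tokens_to_ids(tgt_lang),
        max_new_tokens=256,
    )
    return mt_tokenizer.batch_decode(generated, skip_special_tokens=True)[0]

# Each Alpaca example's instruction, input, and output fields would be translated this way.
print(translate("Give three tips for staying healthy."))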

Training Process

The model was trained using the PEFT (Parameter-Efficient Fine-Tuning) library on the Akash Compute Network. Key aspects of the training process include (see the configuration sketch after this list):

  • Single epoch training
  • Multi-GPU distributed training setup
  • Cosine learning rate schedule with 10% warmup
  • Adam optimizer with β1=0.9, β2=0.999, ε=1e-08
  • Total training cost: $15
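
The hyperparameters above map directly onto a PEFT/Transformers training setup. The sketch below is a minimal illustration of such a configuration; the base checkpoint ID, the LoRA settings, the learning rate, and the batch size are assumptions, since the card does not state them.

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TrainingArguments

# Assumed base checkpoint; the card only names LLaMA-3.1-8B-Instruct
base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")

# Hypothetical LoRA settings: rank, alpha, dropout, and target modules are not stated in the card
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)

training_args = TrainingArguments(
    output_dir="llama-8b-south-africa",
    num_train_epochs=1,              # single epoch
    lr_scheduler_type="cosine",      # cosine learning rate schedule
    warmup_ratio=0.1,                # 10% warmup
    adam_beta1=0.9,                  # Adam settings listed above
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    learning_rate=2e-4,              # placeholder, not stated in the card
    per_device_train_batch_size=4,   # placeholder, not stated in the card
    bf16=True,
)

# The resulting `model` and `training_args` would then be passed to a transformers Trainer
# (or trl's SFTTrainer) together with the translated instruction data.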

Performance Evaluation

Evaluation Scope

Evaluation results are currently available for two of the five supported languages:

  1. Xhosa (xho)
  2. Zulu (zul)

Evaluation was conducted using three benchmark datasets: AfriMGSM, AfriMMLU, and AfriXNLI.

AfriMGSM Results

  • Xhosa: 2.0% accuracy
  • Zulu: 4.5% accuracy

AfriMMLU Results

  • Xhosa: 29.0% accuracy
  • Zulu: 29.0% accuracy

AfriXNLI Results

  • Xhosa: 44.0% accuracy
  • Zulu: 43.0% accuracy

Limitations and Considerations

  1. Evaluation Coverage
    • Only Xhosa and Zulu could be evaluated due to limitations in the available benchmarking tools
    • Performance on the other three supported languages remains unknown
  2. Training Data Quality
    • Reliance on machine translation may affect the quality of the training data
    • Artifacts or errors introduced by the translation process could degrade model performance
  3. Performance Gaps
    • Notably low performance on AfriMGSM tasks indicates substantial room for improvement
    • Further investigation is needed to understand the performance disparities across tasks

Technical Requirements

The model requires the following framework versions (an install sketch follows the list):

  • PyTorch: 2.4.1+cu121
  • Transformers: 4.44.2
  • PEFT: 0.12.0
  • Datasets: 3.0.0
  • Tokenizers: 0.19.1
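
Assuming a CUDA 12.1 environment (implied by the +cu121 PyTorch build), a matching stack can be pinned with pip; the PyTorch wheel index URL below is the standard one and is an assumption about the install environment, not something specified in the card.

pip install torch==2.4.1 --index-url https://download.pytorch.org/whl/cu121
pip install transformers==4.44.2 peft==0.12.0 datasets==3.0.0 tokenizers==0.19.1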

Usage Example

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and model.
# This repository hosts a PEFT adapter, so the `peft` package must be installed;
# transformers will resolve and download the base weights it points to.
model_name = "africa-intelligence/llama-8b-south-africa"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Example usage for text generation
text = "Translate to Xhosa: Hello, how are you?"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)
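
Because the released weights are a PEFT adapter, they can also be attached to the base model explicitly with the peft library. The sketch below assumes the base checkpoint is meta-llama/Meta-Llama-3.1-8B-Instruct (a gated repository, so Hugging Face authentication may be required).

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"    # assumed base checkpoint
adapter_id = "africa-intelligence/llama-8b-south-africa"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base_model, adapter_id)

# Optionally merge the adapter into the base weights for faster inference
model = model.merge_and_unload()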

License

This model is released under the Apache 2.0 license. The full license text is available at https://www.apache.org/licenses/LICENSE-2.0.txt.

Acknowledgments

  • Meta AI for the base LLaMA-3.1-8B-Instruct model
  • Akash Network for providing computing resources
  • Contributors to the Alpaca Cleaned dataset
  • The African NLP community for benchmark datasets and evaluation tools