Model description
The model can be used for Swahili text generation, translation, and other NLP tasks, with coverage limited to the domains represented in its pretraining and fine-tuning data. It has been pre-trained and fine-tuned specifically for Swahili with the Unsloth framework.
This is a development version and is not recommended for general use.
- Developed by: calcpy
- License: apache-2.0
- Finetuned from model: unsloth/llama-3.2-3b-instruct-bnb-4bit
This Llama model was trained 2x faster with Unsloth and Hugging Face's TRL library.
Out-of-Scope Use
The model is not designed for tasks outside the Swahili language, or for tasks requiring high factual precision in domains not covered by the training datasets.
Bias, Risks, and Limitations
The model inherits any potential biases present in the Swahili Wikipedia and Mollel's dataset. Users should be cautious when applying this model to sensitive applications.
Recommendations
Users should perform bias evaluations specific to their use case and ensure that any downstream applications consider potential ethical implications.
How to Get Started with the Model
Use the code below to get started with the model.
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer (replace the placeholder with the actual model path or repo ID)
model = AutoModelForCausalLM.from_pretrained("path_to_your_model", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("path_to_your_model")

# Example inference: ask the model (in Swahili) to continue the Fibonacci sequence.
# The prompt uses an Alpaca-style template; in English it reads roughly:
# "Below is an instruction that describes a task. Write a response that completes the request appropriately."
instruction = "Endelea mlolongo wa fibonacci:"
input_data = "1, 1, 2, 3, 5, 8,"
prompt = f"Chini ni maagizo ambayo yanaelezea kazi. Andika jibu ambalo linakamilisha ombi ipasavyo.\n### Maagizo:\n{instruction}\n\n{input_data}\n### Jibu:\n"

inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, use_cache=True)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
In this example, the model is prompted with an Alpaca-style Swahili instruction template and generates the continuation of the Fibonacci sequence.
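Because the model was trained with Unsloth from a 4-bit base, it can also be loaded through Unsloth's FastLanguageModel interface. The snippet below is a minimal sketch, assuming the unsloth package is installed; the model path, max_seq_length, and the sample Swahili prompt are placeholders, not values taken from this card.

from unsloth import FastLanguageModel

# Minimal sketch: load the model in 4-bit through Unsloth for faster inference.
# "path_to_your_model" and max_seq_length are placeholders/assumptions.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="path_to_your_model",
    max_seq_length=2048,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # switch Unsloth to its inference mode

inputs = tokenizer(["Habari za leo?"], return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))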
Training Hyperparameters
- **Training regime:** Mixed precision (fp16/bf16)
- **Batch size:** 2 per device
- **Max steps:** 24,000 for pretraining, 1,200 for fine-tuning
- **Learning rate:** 5e-5 (1e-5 for embeddings)
- **Warmup steps:** 100 for pretraining, 10 for fine-tuning
- **Weight decay:** 0.01 (pretraining), 0.00 (fine-tuning)
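For illustration, the fine-tuning stage could be expressed with Hugging Face TRL roughly as follows. This is a hedged sketch, not the actual training script: the dataset identifier, output directory, sequence length, and text field are assumptions, and argument names vary slightly across TRL versions. The separate 1e-5 embedding learning rate is not covered by plain TrainingArguments; Unsloth exposes it through its own training arguments.

from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

# Hypothetical instruction dataset; replace with the actual Swahili dataset used.
train_dataset = load_dataset("path_to_swahili_instruction_dataset", split="train")

args = TrainingArguments(
    output_dir="outputs",            # assumption
    per_device_train_batch_size=2,   # batch size from the card
    max_steps=1200,                  # fine-tuning; 24,000 for the pretraining stage
    learning_rate=5e-5,
    warmup_steps=10,                 # 100 for the pretraining stage
    weight_decay=0.0,                # 0.01 for the pretraining stage
    bf16=True,                       # or fp16=True on GPUs without bf16 support
    logging_steps=50,
)

trainer = SFTTrainer(
    model=model,                     # model/tokenizer loaded as in the snippets above
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    dataset_text_field="text",       # assumption: dataset provides a "text" column
    max_seq_length=2048,             # assumption
    args=args,
)
trainer.train()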
Evaluation
The model was evaluated only manually, on the Alpaca Swahili dataset, for instruction-following capabilities.
Metrics
Quantitative metrics for language generation quality and instruction-following accuracy have not yet been reported; a proper evaluation is still required.
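As one possible starting point, generation quality could be tracked with perplexity on held-out Swahili text. The sketch below is illustrative only and is not the evaluation protocol used for this model; the example sentences and model path are placeholders.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("path_to_your_model").to("cuda")
tokenizer = AutoTokenizer.from_pretrained("path_to_your_model")
model.eval()

# Placeholder held-out Swahili sentences; substitute a real evaluation set.
texts = ["Habari ya leo ni njema.", "Kilimanjaro ni mlima mrefu zaidi barani Afrika."]

losses = []
with torch.no_grad():
    for text in texts:
        enc = tokenizer(text, return_tensors="pt").to("cuda")
        out = model(**enc, labels=enc["input_ids"])  # causal LM loss = mean NLL per token
        losses.append(out.loss.item())

perplexity = torch.exp(torch.tensor(losses).mean()).item()
print(f"Mean perplexity: {perplexity:.2f}")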
Summary
This is a technical release of a small experimental model, intended to validate pre-training and fine-tuning on a single GPU.
Compute Infrastructure
- OS: Ubuntu 22.04.5 LTS
- Hardware Type: NVIDIA GeForce RTX 4090 (24 GiB)
- Hours used: ~12