---
license: apache-2.0
language:
- sw
base_model:
- google/gemma-2-2b-it
pipeline_tag: text-generation
library_name: transformers
tags:
- swahili
- gemma2
- text-generation-inference
- text-generation
inference:
  parameters:
    temperature: 0.7
    top_p: 0.95
    max_new_tokens: 500
    do_sample: true
---
# Gemma2-2B-Swahili-IT
Gemma2-2B-Swahili-IT is a lightweight, efficient open variant of Google's Gemma2-2B-IT model, fine-tuned for natural Swahili language understanding and generation. This model provides a resource-efficient option for Swahili language tasks while maintaining strong performance.
## Model Details
- **Developer:** Alfaxad Eyembe
- **Base Model:** google/gemma-2-2b-it
- **Model Type:** Decoder-only transformer
- **Language(s):** Swahili
- **License:** Apache 2.0
- **Finetuning Approach:** Low-Rank Adaptation (LoRA)
## Training Data
The model was fine-tuned on a comprehensive dataset containing:
- 67,017 instruction-response pairs
- 16,273,709 total tokens
- Average 242.83 tokens per example
- High-quality, naturally-written Swahili content
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6375af60e3413701a9f01c0f/7XXsvi8_x5PXZwXcUD-kl.png)
## Performance
### Massive Multitask Language Understanding (MMLU) - Swahili
- Base Model: 31.58% accuracy
- Fine-tuned Model: 38.60% accuracy
- Improvement: +7.02 percentage points
### Sentiment Analysis
- Base Model: 84.85% accuracy
- Fine-tuned Model: 86.00% accuracy
- Improvement: +1.15 percentage points
- Response Validity: 100%
## Intended Use
This model is designed for:
- Basic Swahili text generation
- Question answering
- Sentiment analysis
- Simple creative writing
- General instruction following in Swahili
- Resource-constrained environments
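As a quick illustration of the sentiment-analysis use case, the sketch below runs a Swahili classification-style prompt through the standard `pipeline` API. The prompt wording is illustrative only; it is not the evaluation prompt behind the scores reported above.

```python
# Illustrative Swahili sentiment-style prompt; the wording is hypothetical,
# not the prompt used for the benchmark numbers in this card.
from transformers import pipeline
import torch

generator = pipeline(
    "text-generation",
    model="alfaxadeyembe/gemma2-2b-swahili-it",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# "Classify the sentiment of the following sentence as positive or negative:
#  'The service was very good!'"
prompt = "Ainisha hisia ya sentensi ifuatayo kama chanya au hasi: 'Huduma ilikuwa nzuri sana!'"
print(generator(prompt, max_new_tokens=20, do_sample=False)[0]["generated_text"])
```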
## Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("alfaxadeyembe/gemma2-2b-swahili-it")
model = AutoModelForCausalLM.from_pretrained(
    "alfaxadeyembe/gemma2-2b-swahili-it",
    device_map="auto",
    torch_dtype=torch.bfloat16
)

# Always set to eval mode for inference
model.eval()

# Example usage
prompt = "Eleza dhana ya uchumi wa kidijitali na umuhimu wake katika ulimwengu wa leo."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=500,
        do_sample=True,
        temperature=0.7,
        top_p=0.95
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
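Because the base model is instruction-tuned, wrapping the prompt in Gemma's chat template may yield better-formatted responses. The sketch below continues from the snippet above and assumes the fine-tuned tokenizer inherits the chat template from google/gemma-2-2b-it:

```python
# Optional: apply Gemma's chat template (assumes the fine-tuned tokenizer
# keeps the template from the google/gemma-2-2b-it base model).
messages = [{"role": "user", "content": prompt}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    outputs = model.generate(
        input_ids,
        max_new_tokens=500,
        do_sample=True,
        temperature=0.7,
        top_p=0.95,
    )

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```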
## Training Details
- **Fine-tuning Method:** LoRA
- **Training Steps:** 400
- **Batch Size:** 2
- **Gradient Accumulation Steps:** 32
- **Learning Rate:** 2e-4
- **Training Time:** ~8 hours on an A100 GPU
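For reference, the sketch below shows a comparable setup with the `peft` and `transformers` libraries. Only the step count, batch size, gradient accumulation, and learning rate come from this card; the LoRA rank, alpha, dropout, and target modules are illustrative assumptions, since they are not published here.

```python
# Sketch of a comparable LoRA fine-tuning configuration (assumptions noted).
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=16,                                                     # assumed rank
    lora_alpha=32,                                            # assumed scaling
    lora_dropout=0.05,                                        # assumed
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="gemma2-2b-swahili-it",
    max_steps=400,                   # from this card
    per_device_train_batch_size=2,   # from this card
    gradient_accumulation_steps=32,  # from this card
    learning_rate=2e-4,              # from this card
    bf16=True,                       # assumed, matching bfloat16 inference
)
```

With these settings the effective batch size works out to 2 × 32 = 64 examples per optimizer step.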
## Key Features
- Lightweight and efficient (2B parameters)
- Suitable for resource-constrained environments
- Good performance on basic language tasks
- Fast inference speed
- Low memory footprint
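To shrink the memory footprint further, the model can be loaded in 4-bit with bitsandbytes. This is a sketch under assumed settings, not an officially tested configuration; 4-bit loading trades some output quality for memory:

```python
# Optional 4-bit quantized loading via bitsandbytes for very constrained
# hardware; these configuration values are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained("alfaxadeyembe/gemma2-2b-swahili-it")
model = AutoModelForCausalLM.from_pretrained(
    "alfaxadeyembe/gemma2-2b-swahili-it",
    device_map="auto",
    quantization_config=bnb_config,
)
```

As a rough estimate, the weights take about 5 GB in bfloat16 and under 2 GB in 4-bit, before activation and KV-cache overhead.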
## Advantages
1. Resource Efficiency:
- Small model size (2B parameters)
- Lower memory requirements
- Faster inference time
- Suitable for deployment on less powerful hardware
2. Task Performance:
- Strong sentiment analysis capabilities
- Decent MMLU performance
- Good instruction following
- Natural Swahili generation
## Limitations
- Simpler, less detailed responses compared to the 9B/27B Swahili variants
## Citation
```bibtex
@misc{gemma2-2b-swahili-it,
  author    = {Alfaxad Eyembe},
  title     = {Gemma2-2B-Swahili-IT: A Lightweight Swahili Variant of Gemma2-2B-IT},
  year      = {2025},
  publisher = {Hugging Face},
  journal   = {Hugging Face Model Hub},
}
```
## Contact
For questions or feedback, please reach out through:
- HuggingFace: [@alfaxadeyembe](https://huggingface.co/alfaxad)
- Twitter: [@alfxad](https://twitter.com/alfxad)