Update README.md
README.md
CHANGED
---
license: apache-2.0
language:
- sw
base_model:
- google/gemma-2-2b-it
pipeline_tag: text-generation
library_name: transformers
tags:
- swahili
- gemma2
- text-generation-inference
- text-generation
---

# Gemma2-2B-Swahili-IT

Gemma2-2B-Swahili-IT is a lightweight, efficient open variant of Google's Gemma2-2B-IT model, fine-tuned for natural Swahili language understanding and generation. This model provides a resource-efficient option for Swahili language tasks while maintaining strong performance.

## Model Details

- **Developer:** Alfaxad Eyembe
- **Base Model:** google/gemma-2-2b-it
- **Model Type:** Decoder-only transformer
- **Language(s):** Swahili
- **License:** Apache 2.0
- **Fine-tuning Approach:** Low-Rank Adaptation (LoRA)

## Training Data

The model was fine-tuned on a comprehensive dataset containing:
- 67,017 instruction-response pairs
- 16,273,709 total tokens
- An average of 242.83 tokens per example
- High-quality, naturally written Swahili content

## Performance

### Massive Multitask Language Understanding (MMLU) - Swahili
- Base Model: 31.58% accuracy
- Fine-tuned Model: 38.60% accuracy
- Improvement: +7.02 percentage points

### Sentiment Analysis
- Base Model: 84.85% accuracy
- Fine-tuned Model: 86.00% accuracy
- Improvement: +1.15 percentage points
- Response Validity: 100%

## Intended Use

This model is designed for:
- Basic Swahili text generation
- Question answering
- Sentiment analysis
- Simple creative writing
- General instruction following in Swahili
- Resource-constrained environments

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("alfaxadeyembe/gemma2-2b-swahili-it")
model = AutoModelForCausalLM.from_pretrained(
    "alfaxadeyembe/gemma2-2b-swahili-it",
    device_map="auto",
    torch_dtype=torch.bfloat16
)

# Always set to eval mode for inference
model.eval()

# Example prompt: "Explain the concept of the digital economy and its importance in today's world."
prompt = "Eleza dhana ya uchumi wa kidijitali na umuhimu wake katika ulimwengu wa leo."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=500,
        do_sample=True,
        temperature=0.7,
        top_p=0.95
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

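The base Gemma-2 instruction-tuned checkpoints ship with a chat template, so wrapping instructions in that format may improve instruction following. The sketch below is a hedged variant that reuses the `model` and `tokenizer` loaded above; it assumes the fine-tuned tokenizer retains the Gemma-2 chat template, and the Swahili prompt ("Write a short email requesting one week of leave.") is only an illustration.

```python
# Minimal sketch: reuses `model` and `tokenizer` from the snippet above.
# Assumes the tokenizer still carries the Gemma-2 chat template.
messages = [
    {"role": "user",
     "content": "Andika barua pepe fupi ya kuomba likizo ya wiki moja."}
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant turn marker
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    outputs = model.generate(
        input_ids,
        max_new_tokens=300,
        do_sample=True,
        temperature=0.7,
        top_p=0.95,
    )

# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
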
## Training Details

- **Fine-tuning Method:** LoRA
- **Training Steps:** 400
- **Batch Size:** 2
- **Gradient Accumulation Steps:** 32
- **Learning Rate:** 2e-4
- **Training Time:** ~8 hours on an A100 GPU

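For reference, a minimal sketch of how a LoRA setup with these hyperparameters could look using `peft` and `transformers`. The adapter settings (rank, alpha, dropout, target modules) and dataset handling are assumptions, since the card does not specify them; only the training hyperparameters above are taken from the card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from peft import LoraConfig, get_peft_model

# Start from the base instruction-tuned checkpoint
base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b-it",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")

# LoRA adapter settings -- rank, alpha, dropout, and target modules are assumptions
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()

# Hyperparameters as reported in this section
training_args = TrainingArguments(
    output_dir="gemma2-2b-swahili-it-lora",
    max_steps=400,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=32,
    learning_rate=2e-4,
    bf16=True,
    logging_steps=25,
)
```

With a per-device batch size of 2 and 32 gradient accumulation steps, the effective batch size is 64 sequences per optimizer step.
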
## Key Features

- Lightweight and efficient (2B parameters)
- Suitable for resource-constrained environments
- Good performance on basic language tasks
- Fast inference speed
- Low memory footprint

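For the resource-constrained environments mentioned above, the model can also be loaded with 4-bit quantization to further cut memory use. A minimal sketch, assuming the `bitsandbytes` package is installed; the quantization settings are illustrative, not part of the card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Illustrative 4-bit NF4 quantization config (requires bitsandbytes)
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained("alfaxadeyembe/gemma2-2b-swahili-it")
model = AutoModelForCausalLM.from_pretrained(
    "alfaxadeyembe/gemma2-2b-swahili-it",
    quantization_config=quant_config,
    device_map="auto",
)
model.eval()
```

Expect a small quality drop relative to bf16 inference in exchange for the lower memory footprint.
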
## Advantages

1. Resource Efficiency:
   - Small model size (2B parameters)
   - Lower memory requirements
   - Faster inference time
   - Suitable for deployment on less powerful hardware

2. Task Performance:
   - Strong sentiment analysis capabilities
   - Decent MMLU performance
   - Good instruction following
   - Natural Swahili generation

## Limitations

- Simpler responses compared to the 9B/27B variants

## Citation

```bibtex
@misc{gemma2-2b-swahili-it,
  author = {Alfaxad Eyembe},
  title = {Gemma2-2B-Swahili-IT: A Lightweight Swahili Variant of Gemma2-2B-IT},
  year = {2025},
  publisher = {Hugging Face},
  journal = {Hugging Face Model Hub},
}
```

## Contact

For questions or feedback, please reach out through:
- HuggingFace: [@alfaxadeyembe](https://huggingface.co/alfaxad)
- Twitter: [@alfxad](https://twitter.com/alfxad)