---
license: apache-2.0
language:
- sw
base_model:
- google/gemma-2-2b-it
pipeline_tag: text-generation
library_name: transformers
tags:
- swahili
- gemma2
- text-generation-inference
- text-generation
inference:
  parameters:
    temperature: 0.7
    top_p: 0.95
    max_new_tokens: 500
    do_sample: true
---

# Gemma2-2B-Swahili-IT

Gemma2-2B-Swahili-IT is a lightweight open variant of Google's Gemma2-2B-IT model, fine-tuned for natural Swahili language understanding and generation. It offers a resource-efficient option for Swahili language tasks while maintaining strong performance for its size.

## Model Details

- **Developer:** Alfaxad Eyembe
- **Base Model:** google/gemma-2-2b-it
- **Model Type:** Decoder-only transformer
- **Language(s):** Swahili
- **License:** Apache 2.0
- **Finetuning Approach:** Low-Rank Adaptation (LoRA)

## Training Data

The model was fine-tuned on a comprehensive dataset containing:
- 67,017 instruction-response pairs
- 16,273,709 total tokens
- Average of 242.83 tokens per example
- High-quality, naturally written Swahili content


![image/png](https://cdn-uploads.huggingface.co/production/uploads/6375af60e3413701a9f01c0f/7XXsvi8_x5PXZwXcUD-kl.png)

## Performance

### Massive Multitask Language Understanding (MMLU) - Swahili
- Base Model: 31.58% accuracy
- Fine-tuned Model: 38.60% accuracy
- Improvement: +7.02 percentage points

### Sentiment Analysis
- Base Model: 84.85% accuracy
- Fine-tuned Model: 86.00% accuracy
- Improvement: +1.15 percentage points
- Response Validity: 100%
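
The card does not describe the protocol behind these figures, so the sketch below shows one possible way such a zero-shot sentiment evaluation could be run. The prompt wording, the `chanya`/`hasi` label set, the validity check, and the example text are assumptions for illustration, not the actual evaluation setup.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("alfaxadeyembe/gemma2-2b-swahili-it")
model = AutoModelForCausalLM.from_pretrained(
    "alfaxadeyembe/gemma2-2b-swahili-it",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
model.eval()

def classify_sentiment(text: str) -> str:
    # Zero-shot Swahili prompt asking for a 'chanya' (positive) or 'hasi' (negative) label.
    prompt = (
        "Ainisha hisia ya maandishi yafuatayo kama 'chanya' au 'hasi'.\n"
        f"Maandishi: {text}\nHisia:"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(**inputs, max_new_tokens=5, do_sample=False)
    answer = tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    ).strip().lower()
    # A response counts as "valid" when it contains one of the expected labels.
    if "chanya" in answer:
        return "chanya"
    if "hasi" in answer:
        return "hasi"
    return "invalid"

print(classify_sentiment("Huduma ilikuwa nzuri sana, nimefurahi!"))
```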

## Intended Use

This model is designed for:
- Basic Swahili text generation
- Question answering
- Sentiment analysis
- Simple creative writing
- General instruction following in Swahili
- Resource-constrained environments

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("alfaxadeyembe/gemma2-2b-swahili-it")
model = AutoModelForCausalLM.from_pretrained(
    "alfaxadeyembe/gemma2-2b-swahili-it",
    device_map="auto",
    torch_dtype=torch.bfloat16
)

# Always set to eval mode for inference
model.eval()

# Example usage
prompt = "Eleza dhana ya uchumi wa kidijitali na umuhimu wake katika ulimwengu wa leo."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=500,
        do_sample=True,
        temperature=0.7,
        top_p=0.95
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
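
Since the base model is an instruction-tuned Gemma 2 checkpoint, prompts can also be built with the tokenizer's chat template. The snippet below is a sketch that assumes the chat format was preserved through fine-tuning (the card does not state the prompt format used during training):

```python
# Build the prompt with the Gemma 2 chat template (assumes the template
# survived fine-tuning unchanged).
messages = [
    {"role": "user", "content": "Andika barua pepe fupi ya kuomba likizo ya siku tatu."}
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

with torch.no_grad():
    outputs = model.generate(
        input_ids,
        max_new_tokens=500,
        do_sample=True,
        temperature=0.7,
        top_p=0.95
    )

# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```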

## Training Details

- **Fine-tuning Method:** LoRA
- **Training Steps:** 400
- **Batch Size:** 2
- **Gradient Accumulation Steps:** 32
- **Learning Rate:** 2e-4
- **Training Time:** ~8 hours on an A100 GPU
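
The hyperparameters above correspond to a standard PEFT LoRA setup. Below is a minimal sketch of a comparable configuration, not the exact training script; the LoRA rank, alpha, dropout, and target modules are assumptions, since the card only states the method and the optimizer settings listed above.

```python
import torch
from transformers import AutoModelForCausalLM, TrainingArguments
from peft import LoraConfig, get_peft_model

# Load the base instruction-tuned checkpoint.
base = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b-it",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# LoRA adapter configuration (rank, alpha, dropout, and target modules
# are assumptions; they are not stated in this card).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)

# Optimizer settings taken from the card above.
training_args = TrainingArguments(
    output_dir="gemma2-2b-swahili-it-lora",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=32,
    learning_rate=2e-4,
    max_steps=400,
    bf16=True,
)
```

A `Trainer` (or TRL's `SFTTrainer`) would then combine this configuration with the instruction dataset.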

## Key Features

- Lightweight and efficient (2B parameters)
- Suitable for resource-constrained environments
- Good performance on basic language tasks
- Fast inference speed
- Low memory footprint
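
For especially memory-constrained deployments, the model can also be loaded with 4-bit quantization through bitsandbytes. This is a minimal sketch, assuming `bitsandbytes` is installed; the quantization settings are illustrative rather than recommended values:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Illustrative 4-bit quantization settings (assumptions, not tested values).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained("alfaxadeyembe/gemma2-2b-swahili-it")
model = AutoModelForCausalLM.from_pretrained(
    "alfaxadeyembe/gemma2-2b-swahili-it",
    quantization_config=bnb_config,
    device_map="auto",
)
model.eval()
```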

## Advantages

1. Resource Efficiency:
   - Small model size (2B parameters)
   - Lower memory requirements
   - Faster inference time
   - Suitable for deployment on less powerful hardware

2. Task Performance:
   - Strong sentiment analysis capabilities
   - Decent MMLU performance
   - Good instruction following
   - Natural Swahili generation

## Limitations

- Simpler responses compared to 9B/27B variants

## Citation

```bibtex
@misc{gemma2-2b-swahili-it,
  author = {Alfaxad Eyembe},
  title = {Gemma2-2B-Swahili-IT: A Lightweight Swahili Variant of Gemma2-2B-IT},
  year = {2025},
  publisher = {Hugging Face},
  journal = {Hugging Face Model Hub},
}
```

## Contact

For questions or feedback, please reach out through:
- HuggingFace: [@alfaxadeyembe](https://huggingface.co/alfaxad)
- Twitter: [@alfxad](https://twitter.com/alfxad)