Alfaxad committed · verified
Commit d24660c · 1 Parent(s): 498afd7

Update README.md

Files changed (1): README.md (+146 -3)

---
license: apache-2.0
language:
- sw
base_model:
- google/gemma-2-2b-it
pipeline_tag: text-generation
library_name: transformers
tags:
- swahili
- gemma2
- text-generation-inference
- text-generation
---

# Gemma2-2B-Swahili-IT

Gemma2-2B-Swahili-IT is a lightweight open variant of Google's Gemma2-2B-IT, fine-tuned for natural Swahili language understanding and generation. It offers a resource-efficient option for Swahili language tasks while maintaining strong performance for its size.

## Model Details

- **Developer:** Alfaxad Eyembe
- **Base Model:** google/gemma-2-2b-it
- **Model Type:** Decoder-only transformer
- **Language(s):** Swahili
- **License:** Apache 2.0
- **Fine-tuning Approach:** Low-Rank Adaptation (LoRA)

## Training Data

The model was fine-tuned on a dataset of:
- 67,017 instruction-response pairs
- 16,273,709 total tokens
- 242.83 tokens per example on average
- High-quality, naturally written Swahili content
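
A minimal sketch of how corpus statistics like these can be computed, assuming the dataset is a JSONL file of instruction-response pairs; the file name and field names below are hypothetical, since the card does not specify the dataset format:

```python
import json
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")

total_tokens, n_examples = 0, 0
with open("swahili_instructions.jsonl") as f:  # hypothetical file name
    for line in f:
        example = json.loads(line)
        # Assumed fields; the card does not document the schema.
        text = example["instruction"] + "\n" + example["response"]
        total_tokens += len(tokenizer(text).input_ids)
        n_examples += 1

print(f"{n_examples} examples, {total_tokens} tokens, "
      f"{total_tokens / n_examples:.2f} tokens/example")
```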

## Performance

### Massive Multitask Language Understanding (MMLU) - Swahili
- Base Model: 31.58% accuracy
- Fine-tuned Model: 38.60% accuracy
- Improvement: +7.02 percentage points

### Sentiment Analysis
- Base Model: 84.85% accuracy
- Fine-tuned Model: 86.00% accuracy
- Improvement: +1.15 percentage points
- Response Validity: 100%
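
The card does not describe the evaluation harness. Purely as a generic illustration, multiple-choice accuracy of the kind MMLU reports is often scored by comparing the likelihood the model assigns to each answer letter; a hedged sketch (all names below are assumptions, not the author's method):

```python
import torch

def choose_answer(model, tokenizer, question, choices):
    """Return the answer letter whose option token the model scores highest."""
    prompt = question + "\n" + "\n".join(
        f"{letter}. {text}" for letter, text in zip("ABCD", choices)
    ) + "\nJibu:"  # "Jibu" = "Answer"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        next_token_logits = model(**inputs).logits[0, -1]
    # First token id of each answer letter, scored at the next position
    letter_ids = [
        tokenizer.encode(f" {letter}", add_special_tokens=False)[0]
        for letter in "ABCD"
    ]
    scores = torch.stack([next_token_logits[i] for i in letter_ids])
    return "ABCD"[scores.argmax().item()]
```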

## Intended Use

This model is designed for:
- Basic Swahili text generation
- Question answering
- Sentiment analysis
- Simple creative writing
- General instruction following in Swahili
- Resource-constrained environments

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("alfaxadeyembe/gemma2-2b-swahili-it")
model = AutoModelForCausalLM.from_pretrained(
    "alfaxadeyembe/gemma2-2b-swahili-it",
    device_map="auto",
    torch_dtype=torch.bfloat16
)

# Always set to eval mode for inference
model.eval()

# Example usage
prompt = "Eleza dhana ya uchumi wa kidijitali na umuhimu wake katika ulimwengu wa leo."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=500,
        do_sample=True,
        temperature=0.7,
        top_p=0.95
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
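
Since the base checkpoint is instruction-tuned, the Gemma chat format may work better than a raw prompt. A sketch using the tokenizer's built-in chat template (standard `transformers` usage, not a format the card itself documents):

```python
messages = [
    {"role": "user", "content": "Eleza dhana ya uchumi wa kidijitali."}
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the model-turn marker
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    outputs = model.generate(
        input_ids,
        max_new_tokens=500,
        do_sample=True,
        temperature=0.7,
        top_p=0.95,
    )
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```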

## Training Details

- **Fine-tuning Method:** LoRA
- **Training Steps:** 400
- **Batch Size:** 2
- **Gradient Accumulation Steps:** 32 (effective batch size of 64)
- **Learning Rate:** 2e-4
- **Training Time:** ~8 hours on an A100 GPU
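
For reference, a minimal PEFT sketch matching the numbers above. The LoRA rank, alpha, dropout, and target modules are not stated in the card, so those values are illustrative assumptions:

```python
from peft import LoraConfig, get_peft_model
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=16,           # assumed rank (not in the card)
    lora_alpha=32,  # assumed scaling (not in the card)
    lora_dropout=0.05,  # assumed (not in the card)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    task_type="CAUSAL_LM",
)
peft_model = get_peft_model(model, lora_config)

training_args = TrainingArguments(
    output_dir="gemma2-2b-swahili-it",
    max_steps=400,                    # from the card
    per_device_train_batch_size=2,    # from the card
    gradient_accumulation_steps=32,   # from the card
    learning_rate=2e-4,               # from the card
    bf16=True,                        # assumed, matching the bfloat16 usage above
)
# Pass peft_model and training_args to the trainer of your choice.
```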

## Key Features

- Lightweight and efficient (2B parameters)
- Suitable for resource-constrained environments
- Good performance on basic language tasks
- Fast inference speed
- Low memory footprint (see the quantized-loading sketch below)
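
For tighter memory budgets, the model can in principle be loaded in 4-bit via bitsandbytes. The card does not prescribe this, so treat it as an assumption; it requires the `bitsandbytes` package:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # 4-bit weights, bf16 compute
)
model_4bit = AutoModelForCausalLM.from_pretrained(
    "alfaxadeyembe/gemma2-2b-swahili-it",
    quantization_config=quant_config,
    device_map="auto",
)
```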

## Advantages

1. Resource Efficiency:
   - Small model size (2B parameters)
   - Lower memory requirements
   - Faster inference time
   - Suitable for deployment on less powerful hardware

2. Task Performance:
   - Strong sentiment analysis capabilities
   - Decent MMLU performance
   - Good instruction following
   - Natural Swahili generation

## Limitations

- Produces simpler responses than the larger 9B/27B Swahili variants

## Citation

```bibtex
@misc{gemma2-2b-swahili-it,
  author    = {Alfaxad Eyembe},
  title     = {Gemma2-2B-Swahili-IT: A Lightweight Swahili Variant of Gemma2-2B-IT},
  year      = {2025},
  publisher = {Hugging Face},
  journal   = {Hugging Face Model Hub},
}
```

## Contact

For questions or feedback, please reach out through:
- Hugging Face: [@alfaxadeyembe](https://huggingface.co/alfaxad)
- Twitter: [@alfxad](https://twitter.com/alfxad)