---
license: gemma
---
# Gemma-2-9B-CPT-SahabatAI-Instruct GGUF

This is a GGUF Q4_K_M quantization of Gemma 2 9B, fine-tuned with SahabatAI's custom instruction format and optimized for CPU inference.

## Model Details

- **Base Model**: Gemma 2 9B
- **Instruction Format**: SahabatAI Instruct v1
- **Quantization**: GGUF Q4_K_M (4-bit llama.cpp k-quant, medium-quality mix; the "K" refers to the k-quant scheme, not the KV cache)
- **Original Size**: 9B parameters
- **Quantized Size**: ~5GB
- **Context Length**: 8192 tokens
- **License**: Gemma Terms of Use

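As a back-of-envelope check (an estimate, not a figure from the model card), the ~5GB file size is consistent with Q4_K_M averaging roughly 4.5 bits per weight:

```python
# Rough size estimate for a Q4_K_M quantization of a 9B-parameter model.
# Assumption: Q4_K_M averages about 4.5 bits per weight across all tensors.
params = 9e9            # 9B parameters
bits_per_weight = 4.5   # approximate average for Q4_K_M
size_gb = params * bits_per_weight / 8 / 1e9  # bits -> bytes -> GB
print(f"~{size_gb:.1f} GB")  # ~5.1 GB
```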
## Description

This model is a quantized version of Gemma 2 9B, fine-tuned with a custom instruction format by SahabatAI. The Q4_K_M quantization offers a good balance between model size, speed, and output quality. The instruction format is optimized for general-purpose tasks while maintaining coherence and reliability.

## Usage

### oobabooga's text-generation-webui Setup

1. **Install text-generation-webui**:
   ```bash
   git clone https://github.com/oobabooga/text-generation-webui
   cd text-generation-webui
   pip install -r requirements.txt
   ```

2. **Download the model**:
   ```bash
   mkdir -p models
   cd models
   # Download gemma2-9B-cpt-sahabatai-instruct-v1-Q4_K_M.gguf from Hugging Face
   ```

3. **Launch the web UI**:
   ```bash
   python server.py --model gemma2-9B-cpt-sahabatai-instruct-v1-Q4_K_M.gguf
   ```

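Step 2 can also be scripted: Hugging Face serves raw repository files at a predictable `resolve` URL. A minimal sketch (the repo id shown is a placeholder assumption, not confirmed by this card):

```python
def hf_file_url(repo_id, filename, revision="main"):
    """Direct-download URL for a file hosted in a Hugging Face model repo."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"

# Placeholder repo id -- substitute the actual repository for this model.
url = hf_file_url("saikanov/gemma2-9b-cpt-sahabatai-instruct-gguf",
                  "gemma2-9B-cpt-sahabatai-instruct-v1-Q4_K_M.gguf")
print(url)
```

The resulting URL can then be fetched with `wget "$URL" -P models/` from the text-generation-webui directory.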
### Recommended Launch Parameters

For optimal performance on different hardware:

**CPU only**:
```bash
python server.py --model gemma2-9B-cpt-sahabatai-instruct-v1-Q4_K_M.gguf --cpu --n_ctx 8192
```

**GPU (CUDA)**:
```bash
# Offload model layers to the GPU via the llama.cpp loader;
# lower --n-gpu-layers if you run out of VRAM.
python server.py --model gemma2-9B-cpt-sahabatai-instruct-v1-Q4_K_M.gguf --n_ctx 8192 --n-gpu-layers 35
```

### Recommended Generation Parameters

```yaml
temperature: 0.7
top_p: 0.9
top_k: 40
repetition_penalty: 1.1
max_new_tokens: 2048
```

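To illustrate what `top_k` and `top_p` control, here is a minimal sketch (not the web UI's actual implementation) of how the two filters prune a token distribution before sampling:

```python
def filter_probs(probs, top_k=40, top_p=0.9):
    """Apply top-k then top-p (nucleus) filtering to a token -> probability dict."""
    # top-k: keep only the k most probable tokens
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    # top-p: keep the smallest prefix whose cumulative probability reaches top_p
    kept, cum = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cum += p
        if cum >= top_p:
            break
    # renormalize so the surviving probabilities sum to 1
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}

example = {"the": 0.5, "a": 0.3, "cat": 0.15, "zx": 0.05}
print(filter_probs(example, top_k=3, top_p=0.9))
```

With these settings the low-probability token `"zx"` is discarded and the rest are renormalized; `repetition_penalty` would additionally down-weight tokens already present in the context.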
### Instruction Format

The model responds best to this instruction format:
```
<|system|>You are a helpful AI assistant.</|system|>

<|user|>Your question here</|user|>

<|assistant|>
```

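For scripted use, the template above can be assembled programmatically. A small sketch (the `build_prompt` helper is hypothetical, not part of any library):

```python
def build_prompt(user_message, system="You are a helpful AI assistant."):
    """Assemble a single-turn prompt in the SahabatAI instruct format."""
    return (
        f"<|system|>{system}</|system|>\n\n"
        f"<|user|>{user_message}</|user|>\n\n"
        f"<|assistant|>"
    )

print(build_prompt("What is the capital of Indonesia?"))
```

Leaving the prompt open after `<|assistant|>` lets the model generate the reply as a continuation.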
## Performance Benchmarks

| Device                | Tokens/sec | Memory Usage |
|-----------------------|------------|--------------|
| CPU (8 cores)         | ~15 t/s    | 6GB          |
| NVIDIA RTX 3060 (6GB) | ~40 t/s    | 5GB          |
| NVIDIA RTX 4090       | ~100 t/s   | 5GB          |

## Example Outputs

```
<|system|>You are a helpful AI assistant.</|system|>

<|user|>What is the capital of Indonesia?</|user|>

<|assistant|>Jakarta is the capital city of Indonesia. It is located on the northwestern coast of Java, the most populous island in Indonesia. Jakarta serves as the country's economic, cultural, and political center.

<|user|>Write a simple Python function to calculate factorial.</|user|>

<|assistant|>Here's a simple recursive function to calculate factorial:

def factorial(n):
    if n == 0 or n == 1:
        return 1
    return n * factorial(n - 1)
```

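The factorial function from the transcript above runs as written and can be verified against the standard library:

```python
import math

def factorial(n):
    # Recursive definition, as in the example transcript
    if n == 0 or n == 1:
        return 1
    return n * factorial(n - 1)

assert factorial(5) == 120
assert factorial(10) == math.factorial(10)
```

For very large `n`, an iterative loop (or `math.factorial` itself) avoids Python's recursion limit.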
## Known Limitations

- Requires a minimum of 6GB RAM for CPU inference
- Best performance with a GPU that has 6GB+ VRAM
- May show degraded performance on very long contexts (>4096 tokens)
- Quantization may impact some mathematical and logical reasoning tasks

## Fine-tuning Details

- Base Model: Gemma 2 9B
- Instruction Format: Custom SahabatAI format
- Quantization: Q4_K_M using llama.cpp

## License

This model is subject to the Gemma Terms of Use. Please refer to Google's Gemma licensing terms for commercial usage.

## Acknowledgments

- Google for the Gemma 2 base model
- SahabatAI for instruction fine-tuning
- TheBloke for GGUF conversion tools
- oobabooga for text-generation-webui

## Support

For issues and questions:
- Open an issue in this repository
- Visit our Discord: [Your Discord Link]
- Email: [Your Support Email]

## Updates & Versions

- v1.0 (2024-03): Initial release with Q4_K_M quantization
- Future updates will be listed here