cognitivess committed
Commit 7d3a4bf · verified · 1 Parent(s): a4e799a

Update README.md

Files changed (1)
  1. README.md +78 -0
README.md CHANGED
@@ -153,5 +153,83 @@ print(tokenizer.decode(response, skip_special_tokens=True))
 
  ```
 
+ ## Usage with LoRA + Quantized Versions through bitsandbytes
+
+ To use this model with LoRA adapters and bitsandbytes quantization, first install the custom package along with bitsandbytes and peft:
+
+ ```bash
+ # Install the custom model package and the quantization/PEFT dependencies
+ !pip install git+https://huggingface.co/CognitivessAI/cognitivess
+ !pip install bitsandbytes
+ !pip install peft
+ ```
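+
+ Note that 8-bit loading with bitsandbytes requires a CUDA-capable GPU. A quick environment check along these lines (a minimal sketch, assuming the packages above installed cleanly) can confirm that before the model is loaded:
+
+ ```python
+ # Optional sanity check before loading the quantized model
+ import torch
+ import bitsandbytes as bnb
+ import peft
+
+ assert torch.cuda.is_available(), "bitsandbytes 8-bit loading requires a CUDA GPU"
+ print("bitsandbytes:", bnb.__version__)
+ print("peft:", peft.__version__)
+ ```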
+
+ Then, you can use the model like this:
+
+ ```python
+ import cognitivess_model  # Ensure this imports the custom model package
+
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
+ from peft import LoraConfig, get_peft_model
+
+ model_id = "CognitivessAI/cognitivess"
+
+ # Load the tokenizer
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+
+ # Define the 8-bit quantization configuration for bitsandbytes
+ quantization_config = BitsAndBytesConfig(
+     load_in_8bit=True,
+     llm_int8_threshold=6.0
+ )
+
+ # Load the model with 8-bit quantization
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     torch_dtype=torch.float32,
+     device_map="auto",
+     quantization_config=quantization_config
+ )
+
+ # Define the LoRA fine-tuning configuration
+ fine_tuning_config = LoraConfig(
+     r=8,
+     lora_alpha=16,
+     lora_dropout=0.1,
+     target_modules=["q_proj", "v_proj"]
+ )
+
+ # Attach LoRA adapters for parameter-efficient fine-tuning (PEFT)
+ model = get_peft_model(model, fine_tuning_config)
+
+ # Prepare the messages
+ messages = [
+     {"role": "user", "content": "Explain how large language models work in detail."},
+ ]
+
+ # Tokenize the input with the model's chat template
+ input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
+
+ # Define the inference parameters
+ inference_params = {
+     "max_new_tokens": 8192,
+     "temperature": 0.7,
+     "top_p": 0.95,
+     "do_sample": True
+ }
+
+ # Generate the response
+ outputs = model.generate(
+     input_ids,
+     **inference_params
+ )
+
+ # Decode and print only the newly generated tokens
+ response = outputs[0][input_ids.shape[-1]:]
+ print(tokenizer.decode(response, skip_special_tokens=True))
+ ```
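+
+ The example above loads the base weights in 8-bit. For a QLoRA-style setup, the model can instead be loaded in 4-bit NF4; the snippet below is a minimal sketch assuming your installed transformers and bitsandbytes versions support the 4-bit options of `BitsAndBytesConfig`, and it has not been tuned for this specific checkpoint:
+
+ ```python
+ import cognitivess_model  # custom model package, as above
+
+ import torch
+ from transformers import AutoModelForCausalLM, BitsAndBytesConfig
+
+ # 4-bit NF4 quantization, as typically used for QLoRA-style fine-tuning
+ nf4_config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_quant_type="nf4",
+     bnb_4bit_use_double_quant=True,
+     bnb_4bit_compute_dtype=torch.float16
+ )
+
+ model = AutoModelForCausalLM.from_pretrained(
+     "CognitivessAI/cognitivess",
+     device_map="auto",
+     quantization_config=nf4_config
+ )
+
+ # The LoRA configuration and generation code from the example above apply unchanged.
+ ```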
+
  **Contact:**
  <a href="mailto:hello@cognitivess.com">hello@cognitivess.com</a>