cognitivess committed
Commit 7d3a4bf · verified · 1 Parent(s): a4e799a

Update README.md

Files changed (1)
  1. README.md +78 -0
README.md CHANGED
@@ -153,5 +153,83 @@ print(tokenizer.decode(response, skip_special_tokens=True))
 
  ```
 
+ ## Usage with LoRA + Quantized Versions through bitsandbytes
+
+ To use this model with LoRA adapters and bitsandbytes quantization, first install the custom package along with bitsandbytes and peft:
+
+ ```bash
+ # Install the custom model package and the quantization/PEFT dependencies
+ !pip install git+https://huggingface.co/CognitivessAI/cognitivess
+ !pip install bitsandbytes
+ !pip install peft
+ ```
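+
+ Note that 8-bit loading with bitsandbytes requires a CUDA-capable GPU. A quick environment check along these lines (a minimal sketch, assuming the packages above installed cleanly) can confirm that before the model is loaded:
+
+ ```python
+ # Optional sanity check before loading the quantized model
+ import torch
+ import bitsandbytes as bnb
+ import peft
+
+ assert torch.cuda.is_available(), "bitsandbytes 8-bit loading requires a CUDA GPU"
+ print("bitsandbytes:", bnb.__version__)
+ print("peft:", peft.__version__)
+ ```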
+
+ Then, you can use the model like this:
+
+ ```python
+ import cognitivess_model  # Ensure this imports the custom model package
+
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
+ from peft import LoraConfig, get_peft_model
+
+ model_id = "CognitivessAI/cognitivess"
+
+ # Load the tokenizer
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+
+ # Define the 8-bit quantization configuration for bitsandbytes
+ quantization_config = BitsAndBytesConfig(
+     load_in_8bit=True,
+     llm_int8_threshold=6.0
+ )
+
+ # Load the model with 8-bit quantization
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     torch_dtype=torch.float32,
+     device_map="auto",
+     quantization_config=quantization_config
+ )
+
+ # Define the LoRA fine-tuning configuration
+ fine_tuning_config = LoraConfig(
+     r=8,
+     lora_alpha=16,
+     lora_dropout=0.1,
+     target_modules=["q_proj", "v_proj"]
+ )
+
+ # Attach LoRA adapters for parameter-efficient fine-tuning (PEFT)
+ model = get_peft_model(model, fine_tuning_config)
+
+ # Prepare the messages
+ messages = [
+     {"role": "user", "content": "Explain how large language models work in detail."},
+ ]
+
+ # Tokenize the input with the model's chat template
+ input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
+
+ # Define the inference parameters
+ inference_params = {
+     "max_new_tokens": 8192,
+     "temperature": 0.7,
+     "top_p": 0.95,
+     "do_sample": True
+ }
+
+ # Generate the response
+ outputs = model.generate(
+     input_ids,
+     **inference_params
+ )
+
+ # Decode and print only the newly generated tokens
+ response = outputs[0][input_ids.shape[-1]:]
+ print(tokenizer.decode(response, skip_special_tokens=True))
+ ```
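+
+ The example above loads the base weights in 8-bit. For a QLoRA-style setup, the model can instead be loaded in 4-bit NF4; the snippet below is a minimal sketch assuming your installed transformers and bitsandbytes versions support the 4-bit options of `BitsAndBytesConfig`, and it has not been tuned for this specific checkpoint:
+
+ ```python
+ import cognitivess_model  # custom model package, as above
+
+ import torch
+ from transformers import AutoModelForCausalLM, BitsAndBytesConfig
+
+ # 4-bit NF4 quantization, as typically used for QLoRA-style fine-tuning
+ nf4_config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_quant_type="nf4",
+     bnb_4bit_use_double_quant=True,
+     bnb_4bit_compute_dtype=torch.float16
+ )
+
+ model = AutoModelForCausalLM.from_pretrained(
+     "CognitivessAI/cognitivess",
+     device_map="auto",
+     quantization_config=nf4_config
+ )
+
+ # The LoRA configuration and generation code from the example above apply unchanged.
+ ```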
+
  **Contact:**
  <a href="mailto:hello@cognitivess.com">hello@cognitivess.com</a>