Deci/DeciLM-7B-instruct

@@ -38,6 +38,15 @@ DeciLM-7B-instruct is a derivative of the recently released [DeciLM-7B](https://
 - **Finetuning Notebook:** [DeciLM-7B Finetuning Notebook](https://colab.research.google.com/drive/1kEV6i96AQ94xTCvSd11TxkEaksTb5o3U?usp=sharing)
 - **Text Generation Notebook:** [DeciLM-7B-instruct Text Generation Notebook](https://bit.ly/declm-7b-instruct)
 ## Uses
 The model is intended for commercial and research use in English.
@@ -54,16 +63,21 @@ model_name = "Deci/DeciLM-7B-instruct"
 device = "cuda" # for GPU usage or "cpu" for CPU usage
-bnb_config = BitsAndBytesConfig(
-    load_in_4bit = True,
-    bnb_4bit_compute_dtype=torch.bfloat16
-)
 model = AutoModelForCausalLM.from_pretrained(
     model_name,
     device_map="auto",
     trust_remote_code=True,
-    quantization_config=bnb_config
 )
 tokenizer = AutoTokenizer.from_pretrained(model_name)
@@ -75,25 +89,19 @@ deci_generator = pipeline("text-generation",
                           temperature=0.1,
                           device_map="auto",
                           max_length=4096,
-                          return_full_text=False
-)
-prompt = "How do I make the most delicious pancakes the world has ever tasted?"
-SYSTEM_PROMPT_TEMPLATE ="""
-### System:
-You are an AI assistant that follows instruction extremely well. Help as much as you can.
-### User:
-{instruction}
-### Assistant:
-"""
-# Function to construct the prompt using the new system prompt template
-def get_prompt_with_template(message: str) -> str:
-    return SYSTEM_PROMPT_TEMPLATE.format(instruction=message)
-response = deci_generator(get_prompt_with_template(prompt))[0]['generated_text']
-print(response)
 ```
 ## Evaluation

 - **Finetuning Notebook:** [DeciLM-7B Finetuning Notebook](https://colab.research.google.com/drive/1kEV6i96AQ94xTCvSd11TxkEaksTb5o3U?usp=sharing)
 - **Text Generation Notebook:** [DeciLM-7B-instruct Text Generation Notebook](https://bit.ly/declm-7b-instruct)
+### Prompt Template
+```
+### System:
+{system_prompt}
+### User:
+{user_prompt}
+### Assistant:
+```
 ## Uses
 The model is intended for commercial and research use in English.
 device = "cuda" # for GPU usage or "cpu" for CPU usage
+quantize = False  # Optional. Useful for GPUs with less than 24GB memory
+if quantize:
+    dtype_kwargs = dict(quantization_config=BitsAndBytesConfig(
+        load_in_4bit=True,
+        bnb_4bit_compute_dtype=torch.bfloat16
+    ))
+else:
+    dtype_kwargs = dict(torch_dtype="auto")
 model = AutoModelForCausalLM.from_pretrained(
     model_name,
     device_map="auto",
     trust_remote_code=True,
+    **dtype_kwargs
 )
 tokenizer = AutoTokenizer.from_pretrained(model_name)
                           temperature=0.1,
                           device_map="auto",
                           max_length=4096,
+                          return_full_text=False)
+system_prompt = "You are an AI assistant that follows instruction extremely well. Help as much as you can."
+user_prompt = "How do I make the most delicious pancakes the world has ever tasted?"
+prompt = tokenizer.apply_chat_template([
+    {"role": "system", "content": system_prompt},
+    {"role": "user", "content": user_prompt},
+], tokenize=False, add_generation_prompt=True)
+response = deci_generator(prompt)[0]['generated_text']
+print(prompt + response)
 ```
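
For readers without the tokenizer at hand, the Prompt Template added in this change can also be filled in by hand. The following is a minimal sketch, assuming the chat template bundled with the tokenizer renders the same `### System:` / `### User:` / `### Assistant:` layout; the exact whitespace produced by `apply_chat_template` is not confirmed here. `PROMPT_TEMPLATE` and `build_prompt` are hypothetical names used only for illustration.

```python
# Hypothetical helper: the layout is copied from the "Prompt Template" section.
# Assumption: exact newlines may differ from what the tokenizer's own
# chat template emits, so prefer apply_chat_template when it is available.
PROMPT_TEMPLATE = (
    "### System:\n"
    "{system_prompt}\n"
    "### User:\n"
    "{user_prompt}\n"
    "### Assistant:\n"
)


def build_prompt(system_prompt: str, user_prompt: str) -> str:
    """Fill the template with a system message and a user message."""
    return PROMPT_TEMPLATE.format(
        system_prompt=system_prompt,
        user_prompt=user_prompt,
    )


system_prompt = "You are an AI assistant that follows instruction extremely well. Help as much as you can."
user_prompt = "How do I make the most delicious pancakes the world has ever tasted?"

# The resulting string ends with the assistant header, so generation
# continues from the assistant's turn.
prompt = build_prompt(system_prompt, user_prompt)
print(prompt)
```

The string returned by `build_prompt` could be passed to `deci_generator` in place of the `apply_chat_template` output, though the tokenizer route is less likely to drift from the format the model was tuned on.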
 ## Evaluation