Update README.md
@@ -7,65 +7,41 @@ license: apache-2.0

**Before:**

## Quantization Description

This repo contains a GPTQ 4-bit quantized version of the Mistral-7B-Instruct-v0.3 Large Language Model.

The Mistral-7B-Instruct-v0.3 Large Language Model (LLM) is an instruct fine-tuned version of Mistral-7B-v0.3.

Mistral-7B-v0.3 has the following changes compared to [Mistral-7B-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2/edit/main/README.md):

- Extended vocabulary to 32768
- Supports v3 Tokenizer
- Supports function calling

## Generate with `transformers`

If you want to use Hugging Face `transformers` to generate text, you can do something like this.

```py
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

device = "cuda" if torch.cuda.is_available() else "cpu"
pretrained_model_name = "thesven/Mistral-7B-Instruct-v0.3-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name)

# Spread the quantized weights across the available devices
model = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name,
    device_map="auto",
)

print(model)

model.eval()

input_text = "What is PEFT finetuning?"

# Encode the input text
input_ids = tokenizer.encode(input_text, return_tensors="pt").to(device)

# Generate output
output = model.generate(input_ids, max_length=1000)

# Decode the generated output
decoded_output = tokenizer.batch_decode(output, skip_special_tokens=True)

# Print the decoded output
for i, sequence in enumerate(decoded_output):
    print(f"Generated Sequence {i+1}: {sequence}")

torch.cuda.empty_cache()
```

## Limitations

**After:**

## Quantization Description

This repo contains a GPTQ 4-bit quantized version of the Mistral-7B-Instruct-v0.3 Large Language Model.

### Using the GPTQ Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name_or_path = "thesven/Mistral-7B-Instruct-v0.3-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    device_map="auto",
    trust_remote_code=False,
    revision="main",
)
# Pad with the end-of-sequence token
model.config.pad_token_id = model.config.eos_token_id

prompt_template = '''
<s><<SYS>>You are a very creative story writer. Write a story on the following topic:</s><</SYS>>
<s>[INST]Write a story about AI</s>[/INST]
<s>[ASSISTANT]
'''

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=0.1, do_sample=True, top_p=0.95, top_k=40, max_new_tokens=512)
print(tokenizer.decode(output[0]))
```
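
Once the model is loaded you can sanity-check what you actually got. This is only a sketch, assuming a reasonably recent `transformers` release; the exact attribute contents can vary between versions.

```python
# Quantization settings read from the checkpoint's config (bits, group size, etc.)
print(model.config.quantization_config)

# Approximate memory taken by the loaded weights, in GB
print(f"{model.get_memory_footprint() / 1e9:.2f} GB")
```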

## Model Description

The Mistral-7B-Instruct-v0.3 Large Language Model (LLM) is an instruct fine-tuned version of Mistral-7B-v0.3.

Mistral-7B-v0.3 has the following changes compared to [Mistral-7B-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2/edit/main/README.md):

- Extended vocabulary to 32768
- Supports v3 Tokenizer
- Supports function calling (a sketch of what that can look like follows below)
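
Function calling goes through the tokenizer's chat template. The snippet below is a hedged sketch rather than this repo's documented recipe: it assumes a `transformers` version new enough to accept the `tools` argument of `apply_chat_template` and that the tokenizer files shipped here include a tool-aware template; `get_current_weather` is a hypothetical tool defined only for illustration. It reuses the `model` and `tokenizer` loaded in the GPTQ example above.

```python
# Hypothetical tool; type hints and a Google-style docstring let transformers
# build the JSON schema that gets injected into the prompt.
def get_current_weather(city: str) -> str:
    """Get the current weather for a city.

    Args:
        city: Name of the city to look up.
    """
    return "sunny, 22 C"

messages = [{"role": "user", "content": "What is the weather in Paris right now?"}]

# Render the conversation plus the tool schema with the model's chat template.
input_ids = tokenizer.apply_chat_template(
    messages,
    tools=[get_current_weather],
    add_generation_prompt=True,
    return_tensors="pt",
).cuda()

output = model.generate(input_ids, max_new_tokens=256)
# Print only the newly generated tokens (the model's tool call or answer).
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```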

## Limitations