Update README.md
@@ -7,65 +7,41 @@ license: apache-2.0

**Before:**

## Quantization Description

This repo contains a GPTQ 4-bit quantized version of the Mistral-7B-Instruct-v0.3 Large Language Model.

The Mistral-7B-Instruct-v0.3 Large Language Model (LLM) is an instruct fine-tuned version of Mistral-7B-v0.3.

Mistral-7B-v0.3 has the following changes compared to [Mistral-7B-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2/edit/main/README.md):

- Extended vocabulary to 32768
- Supports v3 Tokenizer
- Supports function calling

## Generate with `transformers`

If you want to use Hugging Face `transformers` to generate text, you can do something like this.

```py
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

device = "cuda" if torch.cuda.is_available() else "cpu"
pretrained_model_name = "thesven/Mistral-7B-Instruct-v0.3-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name)

# Spread the quantized weights across the available devices
model = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name,
    device_map="auto",
)

print(model)

model.eval()

input_text = "What is PEFT finetuning?"

# Encode the input text
input_ids = tokenizer.encode(input_text, return_tensors="pt").to(device)

# Generate output
output = model.generate(input_ids, max_length=1000)

# Decode the generated output
decoded_output = tokenizer.batch_decode(output, skip_special_tokens=True)

# Print the decoded output
for i, sequence in enumerate(decoded_output):
    print(f"Generated Sequence {i+1}: {sequence}")

torch.cuda.empty_cache()
```

## Limitations

**After:**

## Quantization Description

This repo contains a GPTQ 4-bit quantized version of the Mistral-7B-Instruct-v0.3 Large Language Model.

### Using the GPTQ Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name_or_path = "thesven/Mistral-7B-Instruct-v0.3-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    device_map="auto",
    trust_remote_code=False,
    revision="main",
)
# Pad with the end-of-sequence token
model.config.pad_token_id = model.config.eos_token_id

prompt_template = '''
<s><<SYS>>You are a very creative story writer. Write a story on the following topic:</s><</SYS>>
<s>[INST]Write a story about AI</s>[/INST]
<s>[ASSISTANT]
'''

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=0.1, do_sample=True, top_p=0.95, top_k=40, max_new_tokens=512)
print(tokenizer.decode(output[0]))
```
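
Once the model is loaded you can sanity-check what you actually got. This is only a sketch, assuming a reasonably recent `transformers` release; the exact attribute contents can vary between versions.

```python
# Quantization settings read from the checkpoint's config (bits, group size, etc.)
print(model.config.quantization_config)

# Approximate memory taken by the loaded weights, in GB
print(f"{model.get_memory_footprint() / 1e9:.2f} GB")
```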

## Model Description

The Mistral-7B-Instruct-v0.3 Large Language Model (LLM) is an instruct fine-tuned version of Mistral-7B-v0.3.

Mistral-7B-v0.3 has the following changes compared to [Mistral-7B-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2/edit/main/README.md):

- Extended vocabulary to 32768
- Supports v3 Tokenizer
- Supports function calling (a sketch of what that can look like follows below)
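
Function calling goes through the tokenizer's chat template. The snippet below is a hedged sketch rather than this repo's documented recipe: it assumes a `transformers` version new enough to accept the `tools` argument of `apply_chat_template` and that the tokenizer files shipped here include a tool-aware template; `get_current_weather` is a hypothetical tool defined only for illustration. It reuses the `model` and `tokenizer` loaded in the GPTQ example above.

```python
# Hypothetical tool; type hints and a Google-style docstring let transformers
# build the JSON schema that gets injected into the prompt.
def get_current_weather(city: str) -> str:
    """Get the current weather for a city.

    Args:
        city: Name of the city to look up.
    """
    return "sunny, 22 C"

messages = [{"role": "user", "content": "What is the weather in Paris right now?"}]

# Render the conversation plus the tool schema with the model's chat template.
input_ids = tokenizer.apply_chat_template(
    messages,
    tools=[get_current_weather],
    add_generation_prompt=True,
    return_tensors="pt",
).cuda()

output = model.generate(input_ids, max_new_tokens=256)
# Print only the newly generated tokens (the model's tool call or answer).
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```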

## Limitations