thesven committed on
Commit fd51577
1 Parent(s): 20b81fb

Update README.md

Files changed (1): README.md +25 -49
README.md CHANGED
@@ -7,65 +7,41 @@ license: apache-2.0
  ## Quantization Description
  This repo contains a GPTQ 4-bit quantized version of the Mistral-7B-Instruct-v0.3 Large Language Model.
 
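A quick way to confirm the 4-bit GPTQ setup before pulling the full weights is to read the quantization block the checkpoint stores in its `config.json`. A minimal sketch, assuming only that `transformers` is installed (actually loading the quantized weights additionally needs `optimum` and `auto-gptq`):

```python
from transformers import AutoConfig

# Fetch just the model config; the GPTQ parameters live in config.json,
# so no weight download is needed for this check.
config = AutoConfig.from_pretrained("thesven/Mistral-7B-Instruct-v0.3-GPTQ")

# Expect entries such as bits=4, group_size, and desc_act for a GPTQ checkpoint.
print(config.quantization_config)
```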
- ## Model Description
-
- The Mistral-7B-Instruct-v0.3 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-7B-v0.3.
-
- Mistral-7B-v0.3 has the following changes compared to [Mistral-7B-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2):
- - Extended vocabulary to 32768
- - Supports v3 Tokenizer
- - Supports function calling
-
- ## Generate with `transformers`
-
- If you want to use Hugging Face `transformers` to generate text, you can do something like this.
-
- ```py
- import torch
- from transformers import AutoTokenizer, AutoModelForCausalLM
-
- pretrained_model_name = "thesven/Mistral-7B-Instruct-v0.3-GPTQ"
- device = "cuda:0"
-
- # Load the tokenizer
- tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name)
-
- # Load the model with the specified configuration and move to device
- model = AutoModelForCausalLM.from_pretrained(
-     pretrained_model_name,
-     device_map="auto",
- )
-
- print(model)
-
- # Set EOS token ID
- model.eos_token_id = tokenizer.eos_token_id
-
- # Move model to the specified device
- model.to(device)
-
- # Define the input text
- input_text = "What is PEFT finetuning?"
-
- # Encode the input text
- input_ids = tokenizer.encode(input_text, return_tensors="pt").to(device)
-
- # Generate output
- output = model.generate(input_ids, max_length=1000)
-
- # Decode the generated output
- decoded_output = tokenizer.batch_decode(output, skip_special_tokens=True)
-
- # Print the decoded output
- for i, sequence in enumerate(decoded_output):
-     print(f"Generated Sequence {i+1}: {sequence}")
-
- del model
- torch.cuda.empty_cache()
- ```
+ ### Using the GPTQ Model
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_name_or_path = "thesven/Mistral-7B-Instruct-v0.3-GPTQ"
+
+ # Load the tokenizer and the 4-bit GPTQ weights
+ tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
+ model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
+                                              device_map="auto",
+                                              trust_remote_code=False,
+                                              revision="main")
+ # Mistral has no dedicated pad token, so reuse the EOS token id
+ model.config.pad_token_id = model.config.eos_token_id
+
+ # Mistral v0.3 instruct format: the request goes inside [INST] ... [/INST]
+ # (the tokenizer adds the leading <s> BOS token itself)
+ prompt_template = '''[INST] You are a very creative story writer. Write a story on the following topic: AI. [/INST]'''
+
+ input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
+ output = model.generate(inputs=input_ids, temperature=0.1, do_sample=True, top_p=0.95, top_k=40, max_new_tokens=512)
+ print(tokenizer.decode(output[0]))
+ ```
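As an aside, the same generation can be driven through the tokenizer's bundled chat template instead of a hand-written prompt string, which avoids getting the `[INST]` tags wrong. A minimal sketch, assuming the quantized repo ships the standard Mistral v3 chat template and that `optimum`/`auto-gptq` are installed for the GPTQ load:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name_or_path = "thesven/Mistral-7B-Instruct-v0.3-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoModelForCausalLM.from_pretrained(model_name_or_path, device_map="auto")

messages = [
    {"role": "user", "content": "You are a very creative story writer. Write a story about AI."},
]

# For Mistral instruct models the template renders <s>[INST] ... [/INST]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
output = model.generate(input_ids, max_new_tokens=512, do_sample=True, temperature=0.1, top_p=0.95)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```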
+ ## Model Description
+
+ The Mistral-7B-Instruct-v0.3 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-7B-v0.3.
+
+ Mistral-7B-v0.3 has the following changes compared to [Mistral-7B-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2):
+ - Extended vocabulary to 32768
+ - Supports v3 Tokenizer
+ - Supports function calling (see the sketch after this list)
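On the function-calling bullet above: recent `transformers` releases can serialize tool schemas straight into the v3 prompt. A minimal sketch, assuming a `transformers` version whose `apply_chat_template` accepts a `tools` argument; `get_current_weather` is a hypothetical tool, not part of this repo:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("thesven/Mistral-7B-Instruct-v0.3-GPTQ")

def get_current_weather(location: str):
    """Get the current weather in a given location.

    Args:
        location: The city and country, e.g. "Paris, France".
    """
    ...  # hypothetical: the schema is built from the signature and docstring

messages = [{"role": "user", "content": "What is the weather like in Paris?"}]

# The v3 template emits an [AVAILABLE_TOOLS] block ahead of the [INST] turn
prompt = tokenizer.apply_chat_template(messages, tools=[get_current_weather], tokenize=False)
print(prompt)
```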
  ## Limitations