alokabhishek committed
Commit 90ab5a7
1 Parent(s): 182c208

Created Readme

Files changed (1)
  1. README.md +129 -0
README.md ADDED
---
library_name: transformers
license: apache-2.0
pipeline_tag: text-generation
tags:
- 4bit
- AWQ
- AutoAWQ
- 7b
- quantized
- Mistral
- Mistral-7B
---

# Model Card for alokabhishek/Mistral-7B-Instruct-v0.2-4bit-AWQ

<!-- Provide a quick summary of what the model is/does. -->

This repo contains a 4-bit quantized (using AutoAWQ) version of Mistral AI_'s Mistral-7B-Instruct-v0.2.

AWQ (Activation-aware Weight Quantization for LLM Compression and Acceleration) was developed by MIT-HAN-Lab.

## Model Details

- Model creator: [Mistral AI_](https://huggingface.co/mistralai)
- Original model: [Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)

### About 4-bit quantization using AutoAWQ

- AutoAWQ GitHub repo: [AutoAWQ GitHub repo](https://github.com/casper-hansen/AutoAWQ/tree/main)
- MIT-HAN-Lab llm-awq GitHub repo: [MIT-HAN-Lab llm-awq GitHub repo](https://github.com/mit-han-lab/llm-awq/tree/main)

If you use AWQ, please cite the original work:

    @inproceedings{lin2023awq,
      title={AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration},
      author={Lin, Ji and Tang, Jiaming and Tang, Haotian and Yang, Shang and Chen, Wei-Ming and Wang, Wei-Chen and Xiao, Guangxuan and Dang, Xingyu and Gan, Chuang and Han, Song},
      booktitle={MLSys},
      year={2024}
    }
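
For reference, below is a minimal sketch of how a checkpoint like this one can be produced with AutoAWQ. The exact settings used for this repo are not documented here, so the `quant_config` values and output path are assumptions (AutoAWQ's commonly used 4-bit defaults), not the recipe actually used.

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

# Source model on the Hub and a local output directory (illustrative names)
model_path = "mistralai/Mistral-7B-Instruct-v0.2"
quant_path = "Mistral-7B-Instruct-v0.2-4bit-AWQ"

# Assumed config: AutoAWQ's common 4-bit defaults, not necessarily what was used for this repo
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load the full-precision model and tokenizer
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Run AWQ calibration and quantize the weights to 4 bits
model.quantize(tokenizer, quant_config=quant_config)

# Save the quantized model and tokenizer
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```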

# How to Get Started with the Model

Use the code below to get started with the model.

## How to run from Python code

#### First install the packages

```shell
pip install autoawq
pip install accelerate
```

#### Import

```python
import torch
from transformers import AutoTokenizer
from awq import AutoAWQForCausalLM
```

#### Load the quantized model and generate text

```python
# Define the model ID
model_id = "alokabhishek/Mistral-7B-Instruct-v0.2-4bit-AWQ"

# Load the tokenizer and the AWQ-quantized model
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoAWQForCausalLM.from_quantized(model_id, fuse_layers=True, trust_remote_code=False, safetensors=True)

# Set up the prompt. Change the instruction as per your requirements.
# Mistral-7B-Instruct expects the [INST] ... [/INST] instruction format.
prompt = "Tell me a funny joke about Large Language Models meeting a Blackhole in an intergalactic Bar."
formatted_prompt = f"[INST] You are a helpful, fun-loving assistant. Always answer as jestfully as possible. {prompt} [/INST]"

tokens = tokenizer(formatted_prompt, return_tensors="pt").input_ids.cuda()

# Generate output; adjust sampling parameters as per your requirements
generation_output = model.generate(tokens, do_sample=True, temperature=1.7, top_p=0.95, top_k=40, max_new_tokens=512)

# Print the output
print(tokenizer.decode(generation_output[0], skip_special_tokens=True))
```
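
Alternatively, prompt formatting can be delegated to the tokenizer's built-in chat template instead of hand-writing the `[INST]` tags. The sketch below reuses the `tokenizer` and `model` objects loaded above and assumes the repo ships Mistral's standard chat template.

```python
# Build the prompt from chat messages using the tokenizer's chat template
messages = [
    {
        "role": "user",
        "content": "Tell me a funny joke about Large Language Models meeting a Blackhole in an intergalactic Bar.",
    }
]
tokens = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").cuda()

# Generate and decode as before; adjust sampling parameters as needed
generation_output = model.generate(tokens, do_sample=True, temperature=0.7, top_p=0.95, top_k=40, max_new_tokens=512)
print(tokenizer.decode(generation_output[0], skip_special_tokens=True))
```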

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

[More Information Needed]

### Downstream Use [optional]

<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

[More Information Needed]

### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

[More Information Needed]

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

[More Information Needed]


## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

[More Information Needed]