Suparious committed
Commit c4a7ba7
1 Parent(s): 2fbac2c

Update README.md

Files changed (1): README.md (+84, -1)
README.md CHANGED

# Weyaxi/Einstein-v5-v0.2-7B AWQ

- Model creator: [Weyaxi](https://huggingface.co/Weyaxi)
- Original model: [Einstein-v5-v0.2-7B](https://huggingface.co/Weyaxi/Einstein-v5-v0.2-7B)

## Model Summary

This model is a fully fine-tuned version of [alpindale/Mistral-7B-v0.2-hf](https://huggingface.co/alpindale/Mistral-7B-v0.2-hf) on diverse datasets.

It was fine-tuned on `8xRTX3090` + `1xRTXA6000` GPUs using [axolotl](https://github.com/OpenAccess-AI-Collective/axolotl).

This model's training was sponsored by [sablo.ai](https://sablo.ai).

## How to use

### Install the necessary packages

```bash
pip install --upgrade autoawq autoawq-kernels
```

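`autoawq-kernels` ships prebuilt kernels tied to specific CUDA and PyTorch builds, so it can be worth confirming what actually resolved before loading the model:

```bash
# Optional sanity check: show the resolved package versions
pip show autoawq autoawq-kernels
```
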
### Example Python code

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer, TextStreamer

model_path = "solidrust/Einstein-v5-v0.2-7B-AWQ"
system_message = "You are Albert Einstein, incarnated as a powerful AI."

# Load the quantized model and its tokenizer
model = AutoAWQForCausalLM.from_quantized(model_path,
                                          fuse_layers=True)
tokenizer = AutoTokenizer.from_pretrained(model_path,
                                          trust_remote_code=True)
streamer = TextStreamer(tokenizer,
                        skip_prompt=True,
                        skip_special_tokens=True)

# Convert prompt to tokens using the ChatML template
prompt_template = """\
<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant"""

prompt = "You're standing on the surface of the Earth. "\
         "You walk one mile south, one mile west and one mile north. "\
         "You end up exactly where you started. Where are you?"

tokens = tokenizer(prompt_template.format(system_message=system_message,
                                          prompt=prompt),
                   return_tensors='pt').input_ids.cuda()

# Generate output, streaming tokens to stdout as they are produced
generation_output = model.generate(tokens,
                                   streamer=streamer,
                                   max_new_tokens=512)
```

### About AWQ

AWQ is an efficient, accurate and blazing-fast low-bit weight quantization method, currently supporting 4-bit quantization. It offers faster Transformers-based inference than GPTQ, with quality equivalent to or better than the most commonly used GPTQ settings.

AWQ models are currently supported on Linux and Windows, with NVIDIA GPUs only. macOS users: please use GGUF models instead.

It is supported by:

- [Text Generation Webui](https://github.com/oobabooga/text-generation-webui) - using Loader: AutoAWQ
- [vLLM](https://github.com/vllm-project/vllm) - version 0.2.2 or later, with support for all model types (see the sketch after this list)
- [Hugging Face Text Generation Inference (TGI)](https://github.com/huggingface/text-generation-inference)
- [Transformers](https://huggingface.co/docs/transformers) version 4.35.0 and later, from any code or client that supports Transformers (see the sketch after this list)
- [AutoAWQ](https://github.com/casper-hansen/AutoAWQ) - for use from Python code

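For example, with `transformers` >= 4.35 an AWQ checkpoint loads through the standard `AutoModelForCausalLM` API. A minimal sketch; it assumes `autoawq` and `accelerate` are installed alongside `transformers`:

```python
# Minimal sketch: loading the AWQ checkpoint with plain Transformers.
# Assumes transformers >= 4.35 plus autoawq and accelerate installed;
# the quantization config stored in the repo selects the AWQ kernels.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "solidrust/Einstein-v5-v0.2-7B-AWQ"
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)
```

With vLLM, passing `quantization="awq"` selects the AWQ kernels. A sketch, assuming vLLM >= 0.2.2:

```python
# Minimal vLLM sketch; quantization="awq" selects the AWQ kernels.
from vllm import LLM, SamplingParams

llm = LLM(model="solidrust/Einstein-v5-v0.2-7B-AWQ", quantization="awq")
outputs = llm.generate(["Hello, my name is"],
                       SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```
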
## Prompt template: ChatML

```plaintext
<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```

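If the tokenizer in this repo ships a ChatML `chat_template` (an assumption; fall back to the manual template above if it does not), the same prompt can be built with `apply_chat_template` instead of hand-formatting strings:

```python
# Sketch: build the ChatML prompt via the tokenizer's chat template.
# Assumes the repo's tokenizer defines a ChatML chat_template.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("solidrust/Einstein-v5-v0.2-7B-AWQ")
messages = [
    {"role": "system", "content": "You are Albert Einstein, incarnated as a powerful AI."},
    {"role": "user", "content": "Where are you?"},
]
# add_generation_prompt=True appends the trailing "<|im_start|>assistant" turn
prompt = tokenizer.apply_chat_template(messages, tokenize=False,
                                       add_generation_prompt=True)
print(prompt)
```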