alecocc committed
Commit
5ac00d7
1 Parent(s): 1a30347

Update README.md

Files changed (1): README.md (+1 -2)
README.md CHANGED
@@ -15,9 +15,8 @@ This repo contains AWQ model files for [PMC_LLaMA_13B](https://huggingface.co/ax
 
 AWQ is an efficient, accurate and blazing-fast low-bit weight quantization method, currently supporting 4-bit quantization. Compared to GPTQ, it offers faster Transformers-based inference with equivalent or better quality compared to the most commonly used GPTQ settings.
 
-- When using vLLM from Python code, again set `quantization=awq`.
 
-For example:
+Example of usage with vLLM library:
 
 ```python
 from vllm import LLM, SamplingParams
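
The committed snippet cuts off right after the import line. Purely for context, a minimal runnable sketch of the vLLM usage the README points at might look like the following; the model id, prompt, and sampling values here are illustrative placeholders rather than content from the repository, while `quantization="awq"` is the setting the removed bullet referred to.

```python
from vllm import LLM, SamplingParams

# Hypothetical model id; substitute the actual AWQ repo this README belongs to.
MODEL_ID = "alecocc/PMC_LLaMA_13B-AWQ"

# Sampling settings for generation (values chosen for illustration only).
sampling_params = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=256)

# quantization="awq" tells vLLM to load the 4-bit AWQ weights
# instead of full-precision ones.
llm = LLM(model=MODEL_ID, quantization="awq")

# Example prompt, placeholder only.
prompts = ["What are the common symptoms of iron-deficiency anemia?"]
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(f"Prompt:    {output.prompt!r}")
    print(f"Generated: {output.outputs[0].text!r}")
```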