Update README.md
README.md
CHANGED
@@ -15,9 +15,8 @@ This repo contains AWQ model files for [PMC_LLaMA_13B](https://huggingface.co/ax

AWQ is an efficient, accurate and blazing-fast low-bit weight quantization method, currently supporting 4-bit quantization. Compared to GPTQ, it offers faster Transformers-based inference with equivalent or better quality than the most commonly used GPTQ settings.

-When using vLLM from Python code, again set `quantization=awq`.
-
+Example of usage with the vLLM library:

```python
from vllm import LLM, SamplingParams
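# The diff context ends at the import above; the rest of this block is a
# minimal sketch of how such a snippet typically continues, assuming the
# standard vLLM offline-inference API. The repo id "TheBloke/PMC_LLaMA-13B-AWQ"
# is a hypothetical placeholder, not taken from the diff.
prompts = ["What are the common symptoms of anemia?"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=256)

# Set quantization="awq" so vLLM loads the 4-bit AWQ weights.
llm = LLM(model="TheBloke/PMC_LLaMA-13B-AWQ", quantization="awq")

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```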