Update README.md
README.md
CHANGED
@@ -15,9 +15,8 @@ This repo contains AWQ model files for [PMC_LLaMA_13B](https://huggingface.co/ax

AWQ is an efficient, accurate and blazing-fast low-bit weight quantization method, currently supporting 4-bit quantization. Compared to GPTQ, it offers faster Transformers-based inference with equivalent or better quality than the most commonly used GPTQ settings.

-When using vLLM from Python code, again set `quantization=awq`.
-
+Example of usage with the vLLM library:

```python
from vllm import LLM, SamplingParams
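# The diff context ends at the import above; the rest of this block is a
# minimal sketch of how such a snippet typically continues, assuming the
# standard vLLM offline-inference API. The repo id "TheBloke/PMC_LLaMA-13B-AWQ"
# is a hypothetical placeholder, not taken from the diff.
prompts = ["What are the common symptoms of anemia?"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=256)

# Set quantization="awq" so vLLM loads the 4-bit AWQ weights.
llm = LLM(model="TheBloke/PMC_LLaMA-13B-AWQ", quantization="awq")

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```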