disi-unibo-nlp/pmc-llama-13b-awq

Text Generation

text-generation-inference

Inference Endpoints

4-bit precision


alecocc committed on Feb 23

Commit 9227166 • 1 Parent(s): 3ea4789

Update README.md

Files changed (1)

README.md +1 -2

README.md CHANGED

@@ -13,8 +13,7 @@ This repo contains AWQ model files for [PMC_LLaMA_13B](https://huggingface.co/ax
 ### About AWQ
-AWQ is an efficient, accurate and blazing-fast low-bit weight quantization method, currently supporting 4-bit quantization. Compared to GPTQ, it offers faster Transformers-based inference with equivalent or better quality compared to the most commonly used GPTQ settings.
+[AWQ](https://arxiv.org/abs/2306.00978) is a rapid, precise, and efficient low-bit weight quantization method, enabling 4-bit quantization with remarkable speed.
 Example of usage with vLLM library: