Update README.md
README.md CHANGED
@@ -13,7 +13,7 @@ This repo contains AWQ model files for [PMC_LLaMA_13B](https://huggingface.co/ax
 
 ### About AWQ
 
-[AWQ](https://arxiv.org/abs/2306.00978)
+[Activation-aware Weight Quantization (AWQ)](https://arxiv.org/abs/2306.00978) selectively preserves a small subset of weights that are crucial for LLM performance, instead of quantizing all weights in the model. This targeted approach minimizes quantization loss, allowing models to run at 4-bit precision without compromising performance.
 
 Example of usage with vLLM library:
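The diff above mentions usage with the vLLM library but does not show it. A minimal sketch of what such usage typically looks like is below; the model repo id is a placeholder (the actual AWQ repo name for PMC_LLaMA_13B is not given in this diff), and the prompt and sampling settings are illustrative assumptions. vLLM loads AWQ checkpoints via its `quantization="awq"` option, which requires vLLM installed and a CUDA GPU.

```python
# Sketch of loading an AWQ-quantized model with vLLM (assumptions:
# repo id, prompt, and sampling settings are illustrative placeholders).
MODEL_ID = "<your-awq-repo>/PMC_LLaMA_13B-AWQ"  # placeholder, not a real repo id

# Illustrative sampling settings for the demo generation below.
SAMPLING_KWARGS = {"temperature": 0.7, "top_p": 0.95, "max_tokens": 256}

PROMPT = "What are the common symptoms of iron-deficiency anemia?"

try:
    # vLLM exposes AWQ support through the `quantization` argument of LLM.
    from vllm import LLM, SamplingParams
    HAVE_VLLM = True
except ImportError:
    # vLLM (and a CUDA GPU) are required to actually run the generation.
    HAVE_VLLM = False

if HAVE_VLLM:
    llm = LLM(model=MODEL_ID, quantization="awq", dtype="half")
    outputs = llm.generate([PROMPT], SamplingParams(**SAMPLING_KWARGS))
    # Each result holds the prompt plus one or more completions.
    print(outputs[0].outputs[0].text)
```

The `dtype="half"` setting keeps activations in fp16, which is the usual pairing with 4-bit AWQ weights.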