metadata
license: openrail
model_creator: axiong
model_name: PMC_LLaMA_13B
PMC_LLaMA_13B - AWQ
- Model creator: axiong
- Original model: PMC_LLaMA_13B
Description
This repository contains AWQ model files for PMC_LLaMA_13B.
About AWQ
Activation-aware Weight Quantization (AWQ) uses activation statistics to identify the small fraction of weights that matter most to LLM output quality, and protects them during quantization by scaling the corresponding channels rather than treating all weights equally. This targeted approach reduces quantization error, allowing models to run with 4-bit weights with little loss in accuracy.
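To make the idea concrete, here is a minimal pure-Python sketch of 4-bit round-to-nearest quantization with per-channel scaling. It is an illustration only, not the code used to produce these files: the function names and the `channel_scales` argument are hypothetical, and the scales are passed in by hand here, whereas AWQ derives them from activation statistics.

```python
def quantize_4bit(weights):
    """Asymmetric round-to-nearest quantization of one weight group to 4 bits."""
    lo, hi = min(weights), max(weights)
    # 4 bits give 16 levels (codes 0..15); fall back to 1.0 for a constant group.
    step = (hi - lo) / 15 or 1.0
    codes = [round((w - lo) / step) for w in weights]
    return codes, step, lo


def dequantize(codes, step, lo):
    """Map 4-bit codes back to approximate floating-point weights."""
    return [c * step + lo for c in codes]


def scaled_quantized_dot(weights, activations, channel_scales):
    """Dot product with weights scaled up before quantization.

    Each weight is multiplied by its channel scale and the matching
    activation is divided by the same scale, so the product is unchanged
    before rounding. Only the rounding error is affected.
    (channel_scales is a hypothetical, hand-supplied input in this sketch.)
    """
    scaled_w = [w * s for w, s in zip(weights, channel_scales)]
    codes, step, lo = quantize_4bit(scaled_w)
    w_hat = dequantize(codes, step, lo)
    return sum(wq * (x / s) for wq, x, s in zip(w_hat, activations, channel_scales))
```

With all scales set to 1.0 this reduces to plain 4-bit quantization. Larger scales on channels with large activations give those weights finer effective resolution.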
Example usage with the vLLM library:
from vllm import LLM, SamplingParams
prompts = [
"What is the mechanism of action of antibiotics?",
"How do statins work to lower cholesterol levels?",
"Tell me about Paracetamol"
]
sampling_params = SamplingParams(temperature=0.8)
llm = LLM(model="disi-unibo-nlp/pmc-llama-13b-awq", quantization="awq", dtype="half")
outputs = llm.generate(prompts, sampling_params)
# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt}")
    print(f"Response: {generated_text}")