metadata
license: openrail
model_creator: axiong
model_name: PMC_LLaMA_13B
PMC_LLaMA_13B - AWQ
- Model creator: axiong
- Original model: PMC_LLaMA_13B
Description
This repository contains AWQ model files for PMC_LLaMA_13B.
About AWQ
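Activation-aware Weight Quantization (AWQ) does not quantize all weights uniformly. It uses activation statistics to identify the small fraction of weights that matter most for LLM performance, and it protects them by rescaling the corresponding weight channels before quantization. This targeted approach reduces quantization error, so models can run with 4-bit weights at little cost in accuracy relative to the original half-precision model.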
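The toy sketch below illustrates the idea on a single weight row. It is a simplification, not the AutoAWQ or vLLM implementation. It uses symmetric round-to-nearest 4-bit quantization and a hypothetical `salient_scales` heuristic that multiplies the weight of the channel with the largest activation by a fixed factor `s` and divides that activation by the same factor. Real AWQ instead searches for per-channel scales on calibration data. The helper names, the factor `s=2.0`, and the example numbers are assumptions chosen for illustration.

```python
def quantize_symmetric(ws, n_bits=4, group_size=128):
    """Symmetric round-to-nearest quantization per group; returns dequantized floats."""
    qmax = 2 ** (n_bits - 1) - 1  # 7 for 4-bit
    out = []
    for start in range(0, len(ws), group_size):
        group = ws[start:start + group_size]
        # One step size per group; fall back to 1.0 for an all-zero group
        step = max(abs(w) for w in group) / qmax or 1.0
        for w in group:
            q = max(-qmax - 1, min(qmax, round(w / step)))  # clamp to [-8, 7]
            out.append(q * step)
    return out


def salient_scales(act_magnitudes, top_k=1, s=2.0):
    """Hypothetical heuristic: scale up the top_k channels with the largest activations."""
    ranked = sorted(range(len(act_magnitudes)), key=lambda i: act_magnitudes[i], reverse=True)
    salient = set(ranked[:top_k])
    return [s if i in salient else 1.0 for i in range(len(act_magnitudes))]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def rtn_output(w, x):
    """Baseline: quantize the weights directly, then compute w . x."""
    return dot(quantize_symmetric(w), x)


def awq_style_output(w, x, scales):
    """Quantize w * s and divide x by s (mathematically equal to w . x before rounding)."""
    wq = quantize_symmetric([wi * si for wi, si in zip(w, scales)])
    return dot(wq, [xi / si for xi, si in zip(x, scales)])


w = [0.9, 0.19, -0.42, 0.27]
x = [0.1, 8.0, 0.2, 0.1]  # channel 1 carries a much larger activation
exact = dot(w, x)
scales = salient_scales([abs(v) for v in x])
rtn_err = abs(rtn_output(w, x) - exact)
awq_err = abs(awq_style_output(w, x, scales) - exact)
print(f"scales={scales}  rtn_err={rtn_err:.4f}  awq_err={awq_err:.4f}")
```

In this example, the salient weight 0.19 falls between two 4-bit levels, and the large activation of 8.0 amplifies its rounding error. Scaling the weight by 2 before rounding and halving its activation afterwards shrinks that error, while the other channels keep the same step size. Scaling too aggressively would raise the group maximum and coarsen every other weight, which is one reason AWQ searches for the scales rather than fixing them.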
Example usage with the vLLM library:
```python
from vllm import LLM, SamplingParams

# Alpaca-style prompt template for PMC_LLaMA_13B
prompt_input = (
    'Below is an instruction that describes a task, paired with an input that provides further context. '
    'Write a response that appropriately completes the request.\n\n'
    '### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:'
)

example = {
    "instruction": "You're a doctor, kindly address the medical queries according to the patient's account. Answer with the best option directly.",
    "input": (
        "###Question: A 23-year-old pregnant woman at 22 weeks gestation presents with burning upon urination. "
        "She states it started 1 day ago and has been worsening despite drinking more water and taking cranberry extract. "
        "She otherwise feels well and is followed by a doctor for her pregnancy. "
        "Her temperature is 97.7°F (36.5°C), blood pressure is 122/77 mmHg, pulse is 80/min, respirations are 19/min, and oxygen saturation is 98% on room air. "
        "Physical exam is notable for an absence of costovertebral angle tenderness and a gravid uterus. "
        "Which of the following is the best treatment for this patient? "
        "###Options: A. Ampicillin B. Ceftriaxone C. Doxycycline D. Nitrofurantoin"
    ),
}

# Fill the template: format_map looks up {instruction} and {input} in the dict
prompt_batch = [prompt_input.format_map(example)]

# vLLM generates at most 16 new tokens unless max_tokens is set
sampling_params = SamplingParams(temperature=0.8, max_tokens=128)

# Load the 4-bit AWQ checkpoint with float16 activations
llm = LLM(model="disi-unibo-nlp/pmc-llama-13b-awq", quantization="awq", dtype="half")
outputs = llm.generate(prompt_batch, sampling_params)

# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt}")
    print(f"Response: {generated_text}")
```
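Because the instruction asks the model to answer with the best option directly, several multiple-choice questions can be formatted with the same template, and the chosen letter can be pulled out of each completion. The sketch below is not part of vLLM or this repository. `build_prompts` and `extract_option` are hypothetical helpers, and the parsing assumes the model begins its answer with an option letter such as `D.`, `D)` or a bare `D`. Sampled outputs may not always follow that form, and a phrase like "Vitamin D." would be misread as option D.

```python
import re

# Alpaca-style template, copied from the vLLM example
PROMPT_INPUT = (
    "Below is an instruction that describes a task, paired with an input that provides further context. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:"
)


def build_prompts(template, examples):
    """Fill the template once per example dict with 'instruction' and 'input' keys."""
    return [template.format_map(ex) for ex in examples]


def extract_option(completion, options="ABCD"):
    """Hypothetical helper: return the option letter a completion picks, or None.

    Assumes answers look like "D. Nitrofurantoin", "D)" or just "D".
    """
    text = completion.strip()
    # Prefer an explicit option letter followed by "." or ")"
    match = re.search(rf"\b([{options}])[.)]", text)
    if match:
        return match.group(1)
    # Otherwise accept a completion that is exactly one option letter
    return text if len(text) == 1 and text in options else None


# With `llm` and `sampling_params` from the vLLM example and a list of example dicts `examples`:
# outputs = llm.generate(build_prompts(PROMPT_INPUT, examples), sampling_params)
# answers = [extract_option(o.outputs[0].text) for o in outputs]
print(extract_option(" D. Nitrofurantoin"))
```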