File size: 2,489 Bytes
ad2fd34
7c372d2
5df075b
 
c69c8f9
3df32f3
c69c8f9
 
 
 
 
fe7ba36
c69c8f9
 
 
3e359ce
d7a5687
5ac00d7
d7a5687
 
 
 
275d684
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
d7a5687
 
 
d5f2846
d7a5687
275d684
d7a5687
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
---
license: openrail
model_creator: axiong
model_name: PMC_LLaMA_13B
---
# PMC_LLaMA_13B - AWQ
- Model creator: [axiong](https://huggingface.co/axiong)
- Original model: [PMC_LLaMA_13B](https://huggingface.co/axiong/PMC_LLaMA_13B)

## Description

This repository contains AWQ model files for [PMC_LLaMA_13B](https://huggingface.co/axiong/PMC_LLaMA_13B).

### About AWQ

[Activation-aware Weight Quantization (AWQ)](https://arxiv.org/abs/2306.00978) selectively preserves a subset of crucial weights for LLM performance instead of quantizing all weights in a model. This targeted approach minimizes quantization loss, allowing models to operate in 4-bit precision without compromising performance.

Example of usage with vLLM library:

```python
from vllm import LLM, SamplingParams

tokenizer = AutoTokenizer.from_pretrained('axiong/PMC_LLaMA_13B')

prompt_input = (
    'Below is an instruction that describes a task, paired with an input that provides further context.'
    'Write a response that appropriately completes the request.\n\n'
    '### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:'
)

example = {
    "instruction": "You're a doctor, kindly address the medical queries according to the patient's account. Answer with the best option directly.",
    "input": (
        "###Question: A 23-year-old pregnant woman at 22 weeks gestation presents with burning upon urination. "
        "She states it started 1 day ago and has been worsening despite drinking more water and taking cranberry extract. "
        "She otherwise feels well and is followed by a doctor for her pregnancy. "
        "Her temperature is 97.7°F (36.5°C), blood pressure is 122/77 mmHg, pulse is 80/min, respirations are 19/min, and oxygen saturation is 98% on room air."
        "Physical exam is notable for an absence of costovertebral angle tenderness and a gravid uterus. "
        "Which of the following is the best treatment for this patient?"
        "###Options: A. Ampicillin B. Ceftriaxone C. Doxycycline D. Nitrofurantoin"
    )
}

prompt_batch = [prompt_input.format_map(example)]

sampling_params = SamplingParams(temperature=0.8)

llm = LLM(model="disi-unibo-nlp/pmc-llama-13b-awq", quantization="awq", dtype="half")

outputs = llm.generate(prompt_batch, sampling_params)

# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt}")
    print(f"Response: {generated_text}")
```