---
language: en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- ruslanmv
- llama
- trl
base_model: meta-llama/Meta-Llama-3-8B
datasets:
- ruslanmv/ai-medical-chatbot
---

# Medical-Llama3-8B-GPTQ

[![](future.jpg)](https://ruslanmv.com/)

This is a fine-tuned version of the Llama3 8B model, specifically designed to answer medical questions. The model was trained on the AI Medical Chatbot dataset, which can be found at [ruslanmv/ai-medical-chatbot](https://huggingface.co/datasets/ruslanmv/ai-medical-chatbot). It leverages the GPTQ technique for efficient inference with 4-bit quantization.

GPTQ compresses deep learning model weights through a 4-bit quantization process aimed at efficient GPU inference. It reduces model size by converting weights to a 4-bit representation while keeping the quantization error under control. During inference, the weights are dynamically dequantized back to float16, balancing the lower memory footprint against the extra compute this step costs.
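To make the idea concrete, here is a minimal, illustrative sketch of group-wise round-to-nearest 4-bit quantization followed by float16 dequantization. It is a simplification: actual GPTQ chooses the quantized values by minimizing layer-wise reconstruction error rather than by simple rounding, and the packed 4-bit storage is emulated here with `int8` tensors.

```python
import torch

def quantize_4bit(w: torch.Tensor, group_size: int = 128):
    """Round-to-nearest 4-bit quantization with one float scale per group."""
    groups = w.reshape(-1, group_size)
    # Symmetric int4 range is [-8, 7]; derive a per-group scale from the max magnitude
    scale = groups.abs().max(dim=1, keepdim=True).values.clamp(min=1e-8) / 7.0
    q = torch.clamp(torch.round(groups / scale), -8, 7).to(torch.int8)  # stands in for packed int4
    return q, scale

def dequantize_4bit(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Restore float16 weights on the fly, as a GPTQ kernel does at inference time."""
    return q.to(torch.float16) * scale.to(torch.float16)

w = torch.randn(4096, 128)                       # a toy weight matrix
q, scale = quantize_4bit(w)
w_hat = dequantize_4bit(q, scale).reshape(w.shape)
print((w - w_hat.float()).abs().mean())          # mean reconstruction error stays small
```

Stored as genuinely packed 4-bit integers plus per-group scales, this is roughly a 4x reduction over float16 weights, which is what makes an 8B-parameter model practical on a single GPU.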
**Model:** [ruslanmv/Medical-Llama3-8B-GPTQ](https://huggingface.co/ruslanmv/Medical-Llama3-8B-GPTQ)

- **Developed by:** ruslanmv
- **License:** apache-2.0
- **Finetuned from model:** meta-llama/Meta-Llama-3-8B

## Installation

**Prerequisites:**

- A system with CUDA support is highly recommended for optimal performance.
- Python 3.10 or later

**Installation Steps:**

1. **Install the required Python libraries:**

   ```bash
   pip install transformers==4.40.0 auto-gptq
   ```

   (`auto-gptq` provides the `AutoGPTQForCausalLM` loader used below.)

## Usage

Here's an example of how to use the Medical-Llama3-8B-GPTQ model to generate an answer to a medical question:

```python
import torch
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

device = "cuda:0" if torch.cuda.is_available() else "cpu"

repo_id = "ruslanmv/Medical-Llama3-8B-GPTQ"

# Download the quantized model from the Hugging Face Hub and load it onto the first GPU
# (falls back to CPU if CUDA is unavailable)
model = AutoGPTQForCausalLM.from_quantized(repo_id, device=device, use_safetensors=True, use_triton=False)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

def create_prompt(user_query):
    B_INST, E_INST = "[INST]", "[/INST]"
    B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"
    DEFAULT_SYSTEM_PROMPT = """\
You are an AI Medical Chatbot Assistant that aims to provide comprehensive and informative responses to inquiries. However, please note that while you strive for accuracy, your responses should not replace professional medical advice, and you should keep answers short. If a question does not make any sense, or is not factually coherent, explain why instead of answering something incorrect. If you don't know the answer to a question, please don't share false information."""
    SYSTEM_PROMPT = B_SYS + DEFAULT_SYSTEM_PROMPT + E_SYS
    instruction = f"User asks: {user_query}\n"
    prompt = B_INST + SYSTEM_PROMPT + instruction + E_INST
    return prompt.strip()

def generate_text(model, tokenizer, user_query, max_length=200, temperature=0.7, num_return_sequences=1):
    prompt = create_prompt(user_query)

    # Tokenize the prompt and move the input ids to the same device as the model
    input_ids = tokenizer.encode(prompt, return_tensors="pt").to(device)

    # Generate text
    output = model.generate(
        input_ids=input_ids,
        max_length=max_length,
        temperature=temperature,
        num_return_sequences=num_return_sequences,
        pad_token_id=tokenizer.eos_token_id,  # pad with the end-of-sequence token
        do_sample=True
    )

    # Decode the generated output
    generated_text = tokenizer.decode(output[0], skip_special_tokens=True)

    # Keep only the portion generated after the prompt
    generated_text = generated_text.split(prompt)[-1].strip()
    return generated_text
```

## Inference Example

This section shows how to use the model for inference.

**User Query:**

```python
user_query = "I'm a 35-year-old male experiencing symptoms like fatigue, increased sensitivity to cold, and dry, itchy skin. Could these be indicative of hypothyroidism?"
```

**Answer:**

```python
generated_text = generate_text(model, tokenizer, user_query)
print(generated_text)
```

You will get:

```
I understand your concern. It could be attributed to hypothyroidism. You may also have perifollicular inflammation. I suggest you to get your thyroid profile done to rule out hypothyroidism. I would also suggest you to use a mild moisturizing cream, with sunscreen, to
```

(The answer stops mid-sentence because `max_length=200` also counts the prompt tokens; see the note at the end of this card.)

## License

This model is licensed under the Apache License 2.0. You can find the full license in the LICENSE file.
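## Note: Controlling Answer Length

As mentioned above, `max_length` caps the total sequence length, prompt included, so a long system prompt leaves little budget for the reply. A drop-in variant of the `model.generate(...)` call inside `generate_text` can cap only the newly generated tokens instead, using the standard `max_new_tokens` argument from `transformers` (the value 256 here is illustrative, not tuned):

```python
# Drop-in replacement for the model.generate(...) call in generate_text:
# max_new_tokens bounds only the answer, whereas max_length bounds prompt + answer.
output = model.generate(
    input_ids=input_ids,
    max_new_tokens=256,  # illustrative budget for the answer alone
    temperature=temperature,
    num_return_sequences=num_return_sequences,
    pad_token_id=tokenizer.eos_token_id,
    do_sample=True
)
```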