---
language: en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- ruslanmv
- llama
- trl
base_model: meta-llama/Meta-Llama-3-8B
datasets:
- ruslanmv/ai-medical-chatbot
---

# Medical-Llama3-8B-GPTQ

[![](future.jpg)](https://ruslanmv.com/)

This is a fine-tuned version of the Llama3 8B model, specifically designed to answer medical questions. The model was trained on the AI Medical Chatbot dataset, which can be found at [ruslanmv/ai-medical-chatbot](https://huggingface.co/datasets/ruslanmv/ai-medical-chatbot). It leverages the GPTQ technique for efficient inference with 4-bit quantization.

GPTQ compresses deep learning model weights through a 4-bit quantization process aimed at efficient GPU inference. It reduces model size by converting weights to a 4-bit representation while keeping the quantization error under control. During inference, the weights are dynamically dequantized back to float16, balancing the lower memory footprint against the extra compute this step costs.
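To make the idea concrete, here is a minimal, illustrative sketch of group-wise round-to-nearest 4-bit quantization followed by float16 dequantization. It is a simplification: actual GPTQ chooses the quantized values by minimizing layer-wise reconstruction error rather than by simple rounding, and the packed 4-bit storage is emulated here with `int8` tensors.

```python
import torch

def quantize_4bit(w: torch.Tensor, group_size: int = 128):
    """Round-to-nearest 4-bit quantization with one float scale per group."""
    groups = w.reshape(-1, group_size)
    # Symmetric int4 range is [-8, 7]; derive a per-group scale from the max magnitude
    scale = groups.abs().max(dim=1, keepdim=True).values.clamp(min=1e-8) / 7.0
    q = torch.clamp(torch.round(groups / scale), -8, 7).to(torch.int8)  # stands in for packed int4
    return q, scale

def dequantize_4bit(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Restore float16 weights on the fly, as a GPTQ kernel does at inference time."""
    return q.to(torch.float16) * scale.to(torch.float16)

w = torch.randn(4096, 128)                       # a toy weight matrix
q, scale = quantize_4bit(w)
w_hat = dequantize_4bit(q, scale).reshape(w.shape)
print((w - w_hat.float()).abs().mean())          # mean reconstruction error stays small
```

Stored as genuinely packed 4-bit integers plus per-group scales, this is roughly a 4x reduction over float16 weights, which is what makes an 8B-parameter model practical on a single GPU.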
**Model:** [ruslanmv/Medical-Llama3-8B-GPTQ](https://huggingface.co/ruslanmv/Medical-Llama3-8B-GPTQ)

- **Developed by:** ruslanmv
- **License:** apache-2.0
- **Finetuned from model:** meta-llama/Meta-Llama-3-8B

## Installation

**Prerequisites:**

- A system with CUDA support is highly recommended for optimal performance.
- Python 3.10 or later

**Installation Steps:**

1. **Install the required Python libraries:**

   ```bash
   pip install transformers==4.40.0 auto-gptq
   ```

   (`auto-gptq` provides the `AutoGPTQForCausalLM` loader used below.)

## Usage

Here's an example of how to use the Medical-Llama3-8B-GPTQ model to generate an answer to a medical question:

```python
import torch
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

device = "cuda:0" if torch.cuda.is_available() else "cpu"

repo_id = "ruslanmv/Medical-Llama3-8B-GPTQ"

# Download the quantized model from the Hugging Face Hub and load it onto the first GPU
# (falls back to CPU if CUDA is unavailable)
model = AutoGPTQForCausalLM.from_quantized(repo_id, device=device, use_safetensors=True, use_triton=False)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

def create_prompt(user_query):
    B_INST, E_INST = "[INST]", "[/INST]"
    B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"
    DEFAULT_SYSTEM_PROMPT = """\
You are an AI Medical Chatbot Assistant that aims to provide comprehensive and informative responses to inquiries. However, please note that while you strive for accuracy, your responses should not replace professional medical advice, and you should keep answers short. If a question does not make any sense, or is not factually coherent, explain why instead of answering something incorrect. If you don't know the answer to a question, please don't share false information."""
    SYSTEM_PROMPT = B_SYS + DEFAULT_SYSTEM_PROMPT + E_SYS
    instruction = f"User asks: {user_query}\n"
    prompt = B_INST + SYSTEM_PROMPT + instruction + E_INST
    return prompt.strip()

def generate_text(model, tokenizer, user_query, max_length=200, temperature=0.7, num_return_sequences=1):
    prompt = create_prompt(user_query)

    # Tokenize the prompt and move the input ids to the same device as the model
    input_ids = tokenizer.encode(prompt, return_tensors="pt").to(device)

    # Generate text
    output = model.generate(
        input_ids=input_ids,
        max_length=max_length,
        temperature=temperature,
        num_return_sequences=num_return_sequences,
        pad_token_id=tokenizer.eos_token_id,  # pad with the end-of-sequence token
        do_sample=True
    )

    # Decode the generated output
    generated_text = tokenizer.decode(output[0], skip_special_tokens=True)

    # Keep only the portion generated after the prompt
    generated_text = generated_text.split(prompt)[-1].strip()
    return generated_text
```

## Inference Example

This section shows how to use the model for inference.

**User Query:**

```python
user_query = "I'm a 35-year-old male experiencing symptoms like fatigue, increased sensitivity to cold, and dry, itchy skin. Could these be indicative of hypothyroidism?"
```

**Answer:**

```python
generated_text = generate_text(model, tokenizer, user_query)
print(generated_text)
```

You will get:

```
I understand your concern. It could be attributed to hypothyroidism. You may also have perifollicular inflammation. I suggest you to get your thyroid profile done to rule out hypothyroidism. I would also suggest you to use a mild moisturizing cream, with sunscreen, to
```

(The answer stops mid-sentence because `max_length=200` also counts the prompt tokens; see the note at the end of this card.)

## License

This model is licensed under the Apache License 2.0. You can find the full license in the LICENSE file.
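## Note: Controlling Answer Length

As mentioned above, `max_length` caps the total sequence length, prompt included, so a long system prompt leaves little budget for the reply. A drop-in variant of the `model.generate(...)` call inside `generate_text` can cap only the newly generated tokens instead, using the standard `max_new_tokens` argument from `transformers` (the value 256 here is illustrative, not tuned):

```python
# Drop-in replacement for the model.generate(...) call in generate_text:
# max_new_tokens bounds only the answer, whereas max_length bounds prompt + answer.
output = model.generate(
    input_ids=input_ids,
    max_new_tokens=256,  # illustrative budget for the answer alone
    temperature=temperature,
    num_return_sequences=num_return_sequences,
    pad_token_id=tokenizer.eos_token_id,
    do_sample=True
)
```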