---
tags:
- gguf
- llama.cpp
- quantized
- ruslanmv/Medical-Llama3-v2
license: apache-2.0
---

# ruslanmv/Medical-Llama3-v2-Q4_K_M-GGUF

This model was converted to GGUF format from [`ruslanmv/Medical-Llama3-v2`](https://huggingface.co/ruslanmv/Medical-Llama3-v2) using llama.cpp via [Convert Model to GGUF](https://huggingface.co/spaces/ruslanmv/convert_to_gguf).

**Key Features:**

* Quantized for reduced file size (GGUF format)
* Optimized for use with llama.cpp
* Compatible with llama-server for efficient serving

Refer to the [original model card](https://huggingface.co/ruslanmv/Medical-Llama3-v2) for more details on the base model.
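
If you want a local copy of the quantized file before running anything, it can be fetched with `huggingface-cli` (from the `huggingface_hub` package); the filename below is the same one used in the llama.cpp commands later in this card.

```bash
# Optional: download the GGUF file from the Hub to the current directory
pip install -U huggingface_hub
huggingface-cli download ruslanmv/Medical-Llama3-v2-Q4_K_M-GGUF \
  Medical-Llama3-v2-Q4_K_M-GGUF-4bit.gguf --local-dir .
```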

## Usage with llama.cpp

**1. Install llama.cpp:**

```bash
brew install llama.cpp # For macOS/Linux
```
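
If Homebrew is not available, llama.cpp can also be built from source; this is a minimal sketch of the project's standard CMake workflow (see its README for platform-specific options).

```bash
# Build llama.cpp from source
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release
# The resulting binaries (llama-cli, llama-server, ...) are placed in build/bin/
```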

**2. Run Inference:**

**CLI:**

```bash
llama-cli --hf-repo ruslanmv/Medical-Llama3-v2-Q4_K_M-GGUF --hf-file Medical-Llama3-v2-Q4_K_M-GGUF-4bit.gguf -p "Your prompt here"
```
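
For example, a one-shot question with a capped response length (`-n` limits the number of generated tokens; the prompt is only illustrative):

```bash
llama-cli --hf-repo ruslanmv/Medical-Llama3-v2-Q4_K_M-GGUF \
  --hf-file Medical-Llama3-v2-Q4_K_M-GGUF-4bit.gguf \
  -p "What are the common symptoms of iron deficiency?" -n 256
```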

**Server:**

```bash
llama-server --hf-repo ruslanmv/Medical-Llama3-v2-Q4_K_M-GGUF --hf-file Medical-Llama3-v2-Q4_K_M-GGUF-4bit.gguf -c 2048
```
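
Once running, llama-server exposes an OpenAI-compatible HTTP API, by default on `http://localhost:8080`. A minimal request with `curl` (the question is only illustrative; adjust the address if you passed `--host`/`--port`):

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "user", "content": "What are the common symptoms of iron deficiency?"}
        ]
      }'
```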

For more advanced usage, refer to the [llama.cpp repository](https://github.com/ggerganov/llama.cpp).