---
tags:
- gguf
- llama.cpp
- quantized
- ruslanmv/Medical-Llama3-v2
license: apache-2.0
---

# ruslanmv/Medical-Llama3-v2-Q4_K_M-GGUF

This model was converted to GGUF format from [`ruslanmv/Medical-Llama3-v2`](https://huggingface.co/ruslanmv/Medical-Llama3-v2) using llama.cpp via [Convert Model to GGUF](https://huggingface.co/spaces/ruslanmv/convert_to_gguf).

**Key Features:**

* Quantized for reduced file size (GGUF format)
* Optimized for use with llama.cpp
* Compatible with llama-server for efficient serving

Refer to the [original model card](https://huggingface.co/ruslanmv/Medical-Llama3-v2) for more details on the base model.
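
If you want a local copy of the quantized file before running anything, it can be fetched with `huggingface-cli` (from the `huggingface_hub` package); the filename below is the same one used in the llama.cpp commands later in this card.

```bash
# Optional: download the GGUF file from the Hub to the current directory
pip install -U huggingface_hub
huggingface-cli download ruslanmv/Medical-Llama3-v2-Q4_K_M-GGUF \
  Medical-Llama3-v2-Q4_K_M-GGUF-4bit.gguf --local-dir .
```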

## Usage with llama.cpp

**1. Install llama.cpp:**

```bash
brew install llama.cpp # For macOS/Linux
```
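
If Homebrew is not available, llama.cpp can also be built from source; this is a minimal sketch of the project's standard CMake workflow (see its README for platform-specific options).

```bash
# Build llama.cpp from source
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release
# The resulting binaries (llama-cli, llama-server, ...) are placed in build/bin/
```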

**2. Run Inference:**

**CLI:**

```bash
llama-cli --hf-repo ruslanmv/Medical-Llama3-v2-Q4_K_M-GGUF --hf-file Medical-Llama3-v2-Q4_K_M-GGUF-4bit.gguf -p "Your prompt here"
```
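
For example, a one-shot question with a capped response length (`-n` limits the number of generated tokens; the prompt is only illustrative):

```bash
llama-cli --hf-repo ruslanmv/Medical-Llama3-v2-Q4_K_M-GGUF \
  --hf-file Medical-Llama3-v2-Q4_K_M-GGUF-4bit.gguf \
  -p "What are the common symptoms of iron deficiency?" -n 256
```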

**Server:**

```bash
llama-server --hf-repo ruslanmv/Medical-Llama3-v2-Q4_K_M-GGUF --hf-file Medical-Llama3-v2-Q4_K_M-GGUF-4bit.gguf -c 2048
```
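
Once running, llama-server exposes an OpenAI-compatible HTTP API, by default on `http://localhost:8080`. A minimal request with `curl` (the question is only illustrative; adjust the address if you passed `--host`/`--port`):

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "user", "content": "What are the common symptoms of iron deficiency?"}
        ]
      }'
```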

For more advanced usage, refer to the [llama.cpp repository](https://github.com/ggerganov/llama.cpp).