readme: add detailed instructions
README.md
CHANGED
@@ -23,11 +23,57 @@ Using llama.cpp fork: [https://github.com/fairydreaming/llama.cpp/tree/deepseek-

# How to use:

**Downloading the bf16:**

- Find the relevant directory
- Download all files
- Run merge.py (see the example after this list)
- The merged GGUF should appear
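
For example, using huggingface-cli (the repo id and directory name below are placeholders, not taken from this README):

```
# download every file in the bf16 directory of the repo (names are assumptions)
huggingface-cli download {repo_id} --include "bf16/*" --local-dir .
# then run the merge script; check the script itself for its exact arguments
python merge.py
```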

**Downloading the quantizations:**

- Find the relevant directory
- Download all files
- Point to the first split (most programs should load all the splits automatically now; see the example below)
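
For example, pointing llama.cpp at the first split (the quant and split file name below are assumptions based on the usual gguf-split naming scheme):

```
# loading the first split is enough; the remaining splits are found automatically
main -m DeepSeek-V2-Chat.Q4_K_M-00001-of-00005.gguf -c 4096 --color -i
```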

**Running in llama.cpp:**

To start in command-line interactive mode (text completion):
```
main -m DeepSeek-V2-Chat.{quant}.gguf -c {context_length} --color -i
```
To use the llama.cpp OpenAI-compatible server:
```
server \
  -m DeepSeek-V2-Chat.{quant}.gguf \
  -c {context_length} \
  (--color [recommended: colored output in supported terminals]) \
  (-i [note: interactive mode]) \
  (--mlock [note: avoid using swap]) \
  (--verbose) \
  (--log-disable [note: disable logging to file, may be useful for prod]) \
  (--metrics [note: Prometheus-compatible monitoring endpoint]) \
  (--api-key [string]) \
  (--port [int]) \
  (--flash-attn [note: must be fully offloaded to a supported GPU])
```
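
Once the server is running, it accepts OpenAI-style requests. A minimal curl example (the port shown is llama.cpp's default, 8080, and the prompt is a placeholder):

```
# add -H "Authorization: Bearer {api key}" if the server was started with --api-key
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello!"}]}'
```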
Making an importance matrix:
```
imatrix \
  -m DeepSeek-V2-Chat.{quant}.gguf \
  -f groups_merged.txt \
  --verbosity [0, 1, 2] \
  -ngl {GPU offloading; requires a CUDA build} \
  --ofreq {recommended: 1}
```
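
Filled in, a run might look like this (the bf16 model stands in for {quant}, and the -ngl value is a placeholder):

```
# computed on the bf16 model; -o sets the output file (imatrix.dat is also the default)
imatrix \
  -m DeepSeek-V2-Chat.bf16.gguf \
  -f groups_merged.txt \
  --verbosity 1 \
  -ngl 99 \
  --ofreq 1 \
  -o imatrix.dat
```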
Making a quant:
```
quantize \
  (--imatrix [file]) \
  DeepSeek-V2-Chat.bf16.gguf \
  DeepSeek-V2-Chat.{quant}.gguf \
  {quant}
```
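
A concrete example producing a Q4_K_M quant with the imatrix from the previous step (file names are assumptions):

```
# --imatrix is optional but improves low-bit quants
quantize \
  --imatrix imatrix.dat \
  DeepSeek-V2-Chat.bf16.gguf \
  DeepSeek-V2-Chat.Q4_K_M.gguf \
  Q4_K_M
```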

# Quants:
```
- bf16 [size: 439 GB]