leafspark committed 0ceb327 (parent: b3513b3)

readme: add detailed instructions

Files changed (1): README.md (+46, -0)

Using llama.cpp fork: [https://github.com/fairydreaming/llama.cpp/tree/deepseek-

# How to use:

**Downloading the bf16:**

- Find the relevant directory
- Download all files
- Run merge.py (see the sketch below)
- Merged GGUF should appear

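A minimal sketch of the download-and-merge step (the repo id and the merge.py invocation are assumptions; check the script for its actual arguments):
```
# hypothetical repo id; adjust to the actual repository
huggingface-cli download leafspark/DeepSeek-V2-Chat-GGUF \
  --include "*bf16*" --local-dir .
# assuming merge.py finds the downloaded splits in the current directory
python merge.py
```
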
**Downloading the quantizations:**

- Find the relevant directory
- Download all files
- Point to the first split; most programs should now load all the splits automatically (see the example below)

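For example, if the shards follow the gguf-split naming scheme (the exact file names will differ):
```
main -m DeepSeek-V2-Chat.Q8_0-00001-of-00008.gguf -c 4096 --color -i
```
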
**Running in llama.cpp:**

To start in command-line interactive mode (text completion):
```
main -m DeepSeek-V2-Chat.{quant}.gguf -c {context_length} --color -i
```
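Filled in with concrete values (the quant type and context length here are examples only):
```
main -m DeepSeek-V2-Chat.Q8_0.gguf -c 4096 --color -i
```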
To use the llama.cpp OpenAI-compatible server:
```
server \
  -m DeepSeek-V2-Chat.{quant}.gguf \
  -c {context_length} \
  (--color [recommended: colored output in supported terminals]) \
  (--mlock [note: avoid using swap]) \
  (--verbose) \
  (--log-disable [note: disable logging to file, may be useful for prod]) \
  (--metrics [note: Prometheus-compatible monitoring endpoint]) \
  (--api-key [string]) \
  (--port [int]) \
  (--flash-attn [note: must be fully offloaded to a supported GPU])
```
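Once the server is running, you can query its OpenAI-compatible endpoint (default host and port assumed here):
```
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello!"}]}'
```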
Making an importance matrix:
```
imatrix \
  -m DeepSeek-V2-Chat.{quant}.gguf \
  -f groups_merged.txt \
  --verbosity [0, 1, 2] \
  -ngl {GPU offloading; must build with CUDA} \
  --ofreq {recommended: 1}
```
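The resulting file (imatrix.dat by default) is what you pass to quantize via --imatrix in the next step.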
Making a quant:
```
quantize \
  (--imatrix [file]) \
  DeepSeek-V2-Chat.bf16.gguf \
  DeepSeek-V2-Chat.{quant}.gguf \
  {quant}
```

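A filled-in example (the quant type is illustrative; note that quantize expects options such as --imatrix before the positional model paths):
```
quantize --imatrix imatrix.dat \
  DeepSeek-V2-Chat.bf16.gguf \
  DeepSeek-V2-Chat.Q4_K_M.gguf \
  Q4_K_M
```
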
# Quants:
```
- bf16 [size: 439gb]