Quantized from [https://huggingface.co/deepseek-ai/DeepSeek-V2-Chat](https://huggingface.co/deepseek-ai/DeepSeek-V2-Chat)

Using llama.cpp fork: [https://github.com/fairydreaming/llama.cpp/tree/deepseek-v2](https://github.com/fairydreaming/llama.cpp/tree/deepseek-v2)

# Warning: This will not work unless you compile llama.cpp from the repo provided (and set the metadata KV overrides below)!

# How to use:

- Merge the split GGUF files; the merged GGUF should appear (see the sketch below)
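
A merge sketch using llama.cpp's gguf-split tool; the shard file names here are hypothetical, so point it at the first shard of the quant you actually downloaded:

```bash
# Merge split shards into a single GGUF (pass only the first shard and the output name)
./gguf-split --merge DeepSeek-V2-Chat.q4_k_m-00001-of-00009.gguf DeepSeek-V2-Chat.q4_k_m.gguf
```
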
# Quants:

- bf16 [size: 439gb]
- q8_0 (after q2_k) [estimated size: 233.27gb]
- q4_k_m [size: 132gb]
- q2_k (uploading) [size: 80gb]
- q3_k_s (generating, using importance matrix) [estimated size: 96.05gb]

# Planned Quants (using importance matrix):

- q5_k_m
- q5_k_s
- q3_k_m
- q6_k
- iq4_nl
- iq4_xs
- iq2_xxs
- iq2_xs
- iq2_s
- iq2_m
- iq1_s
- iq1_m
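
These would typically be produced with llama.cpp's quantize tool; a sketch, assuming the bf16 merge and the imatrix.dat from this repo (file names are placeholders):

```bash
# Quantize the bf16 base to one of the planned types, guided by the importance matrix
./quantize --imatrix imatrix.dat DeepSeek-V2-Chat.bf16.gguf DeepSeek-V2-Chat.q5_k_m.gguf Q5_K_M
```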

Note: the model files do not have some DeepSeek v2 specific parameters; we will look into adding them.

Please use commit `039896407afd40e54321d47c5063c46a52da3e01`; otherwise, set these metadata KV overrides:

```
deepseek2.attention.q_lora_rank=int:1536
deepseek2.attention.kv_lora_rank=int:512
deepseek2.expert_shared_count=int:2
deepseek2.expert_feed_forward_length=int:1536
deepseek2.leading_dense_block_count=int:1
```
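
These match llama.cpp's `--override-kv KEY=TYPE:VALUE` syntax, so a sketch of passing them at load time could look like this (the model path and prompt are placeholders):

```bash
# Supply the missing DeepSeek v2 metadata at load time
./main -m DeepSeek-V2-Chat.q2_k.gguf \
  --override-kv deepseek2.attention.q_lora_rank=int:1536 \
  --override-kv deepseek2.attention.kv_lora_rank=int:512 \
  --override-kv deepseek2.expert_shared_count=int:2 \
  --override-kv deepseek2.expert_feed_forward_length=int:1536 \
  --override-kv deepseek2.leading_dense_block_count=int:1 \
  -p "Hello" -n 128
```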

A precompiled AVX2 version is available as `llama.cpp-039896407afd40e54321d47c5063c46a52da3e01.zip` in the root of this repo.

# License:

- DeepSeek license for model weights
- MIT license for any repo code

# Performance:

~1.5 t/s with a Ryzen 7 3700X (96GB DDR4-3200) [Q2_K]

# iMatrix:

Find imatrix.dat in the root of this repo; it was made with a Q2_K quant (see here for info: [https://github.com/ggerganov/llama.cpp/issues/5153#issuecomment-1913185693](https://github.com/ggerganov/llama.cpp/issues/5153#issuecomment-1913185693))

It was generated using groups_merged.txt, which you can find here: [https://github.com/ggerganov/llama.cpp/discussions/5263#discussioncomment-8395384](https://github.com/ggerganov/llama.cpp/discussions/5263#discussioncomment-8395384)
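
A generation sketch with llama.cpp's imatrix tool, assuming the Q2_K quant and the calibration file above:

```bash
# Compute an importance matrix over the calibration text
./imatrix -m DeepSeek-V2-Chat.q2_k.gguf -f groups_merged.txt -o imatrix.dat
```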

# Censorship:

This model is quite censored; finetuning on toxic DPO might help.