Quantized from [https://huggingface.co/deepseek-ai/DeepSeek-V2-Chat](https://huggingface.co/deepseek-ai/DeepSeek-V2-Chat)

Using llama.cpp fork: [https://github.com/fairydreaming/llama.cpp/tree/deepseek-v2](https://github.com/fairydreaming/llama.cpp/tree/deepseek-v2)

# Warning: This will not work unless you compile llama.cpp from the repo provided (and set the metadata KV overrides below)!

# How to use:

- Merge the split GGUF files; the merged GGUF should appear (see the sketch below)
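
A merge sketch using llama.cpp's gguf-split tool; the shard file names here are hypothetical, so point it at the first shard of the quant you actually downloaded:

```bash
# Merge split shards into a single GGUF (pass only the first shard and the output name)
./gguf-split --merge DeepSeek-V2-Chat.q4_k_m-00001-of-00009.gguf DeepSeek-V2-Chat.q4_k_m.gguf
```
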
# Quants:

- bf16 [size: 439gb]
- q8_0 (after q2_k) [estimated size: 233.27gb]
- q4_k_m [size: 132gb]
- q2_k (uploading) [size: 80gb]
- q3_k_s (generating, using importance matrix) [estimated size: 96.05gb]

# Planned Quants (using importance matrix):

- q5_k_m
- q5_k_s
- q3_k_m
- q6_k
- iq4_nl
- iq4_xs
- iq2_xxs
- iq2_xs
- iq2_s
- iq2_m
- iq1_s
- iq1_m
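
These would typically be produced with llama.cpp's quantize tool; a sketch, assuming the bf16 merge and the imatrix.dat from this repo (file names are placeholders):

```bash
# Quantize the bf16 base to one of the planned types, guided by the importance matrix
./quantize --imatrix imatrix.dat DeepSeek-V2-Chat.bf16.gguf DeepSeek-V2-Chat.q5_k_m.gguf Q5_K_M
```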

Note: the model files do not have some DeepSeek v2 specific parameters; we will look into adding them.

Please use commit `039896407afd40e54321d47c5063c46a52da3e01`; otherwise, set these metadata KV overrides:

```
deepseek2.attention.q_lora_rank=int:1536
deepseek2.attention.kv_lora_rank=int:512
deepseek2.expert_shared_count=int:2
deepseek2.expert_feed_forward_length=int:1536
deepseek2.leading_dense_block_count=int:1
```
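
These match llama.cpp's `--override-kv KEY=TYPE:VALUE` syntax, so a sketch of passing them at load time could look like this (the model path and prompt are placeholders):

```bash
# Supply the missing DeepSeek v2 metadata at load time
./main -m DeepSeek-V2-Chat.q2_k.gguf \
  --override-kv deepseek2.attention.q_lora_rank=int:1536 \
  --override-kv deepseek2.attention.kv_lora_rank=int:512 \
  --override-kv deepseek2.expert_shared_count=int:2 \
  --override-kv deepseek2.expert_feed_forward_length=int:1536 \
  --override-kv deepseek2.leading_dense_block_count=int:1 \
  -p "Hello" -n 128
```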

A precompiled AVX2 version is available as `llama.cpp-039896407afd40e54321d47c5063c46a52da3e01.zip` in the root of this repo.

# License:

- DeepSeek license for model weights
- MIT license for any repo code

# Performance:

~1.5 t/s with a Ryzen 7 3700X (96GB DDR4-3200) [Q2_K]

# iMatrix:

Find imatrix.dat in the root of this repo; it was made with a Q2_K quant (see here for info: [https://github.com/ggerganov/llama.cpp/issues/5153#issuecomment-1913185693](https://github.com/ggerganov/llama.cpp/issues/5153#issuecomment-1913185693))

It was generated using groups_merged.txt, which you can find here: [https://github.com/ggerganov/llama.cpp/discussions/5263#discussioncomment-8395384](https://github.com/ggerganov/llama.cpp/discussions/5263#discussioncomment-8395384)
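
A generation sketch with llama.cpp's imatrix tool, assuming the Q2_K quant and the calibration file above:

```bash
# Compute an importance matrix over the calibration text
./imatrix -m DeepSeek-V2-Chat.q2_k.gguf -f groups_merged.txt -o imatrix.dat
```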

# Censorship:

This model is quite censored; finetuning on toxic DPO might help.