leafspark/DeepSeek-V2-Chat-GGUF


leafspark committed on May 19, 2024

Commit 29a06ae (verified) · 1 parent: b6095fa

Update README.md

Files changed (1)

README.md +16 -10


@@ -20,13 +20,19 @@ Using llama.cpp fork: [https://github.com/fairydreaming/llama.cpp/tree/deepseek-
 - Merged GGUF should appear
 # Quants:
-- bf16 (finished, currently splitting and uploading) [size: 439gb]
-- f32 (may require some time to upload, after q8_0) [estimated size: ~800gb]
-- q8_0 (after bf16) [estimated size: 233.27gb]
-- ~~q4_k_m (after q8_0) [estimated size: 133.10gb]~~
-- ~~q2_k (after q4_k_m) [estimated size: ~65gb]~~
-- ~~q3_k_s (low priority) [estimated size: 96.05gb]~~
-If quantize.exe supports it I will make RTN quants (edit: it doesn't, will try building from fork).
-Note: the bf16 GGUF does not have some DeepSeek v2 specific parameters, will look into adding them
+- bf16 (finished, uploading) [size: 439gb]
+- q8_0 (after q2_k) [estimated size: 233.27gb]
+- q4_k_m (uploading) [size: 132gb]
+- q2_k (generating) [size: ~65gb]
+- q3_k_s (low priority) [estimated size: 96.05gb]
+Note: the bf16 GGUF does not have some DeepSeek v2 specific parameters, will look into adding them
+Please use commit 039896407afd40e54321d47c5063c46a52da3e01, otherwise use these metadata KV overrides:
+```
+deepseek2.attention.q_lora_rank=int:1536
+deepseek2.attention.kv_lora_rank=int:512
+deepseek2.expert_shared_count=int:2
+deepseek2.expert_feed_forward_length=int:1536
+deepseek2.leading_dense_block_count=int:1
+```
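The metadata KV overrides in the README are written as `KEY=TYPE:VALUE` strings. The following is a minimal Python sketch that checks each line and expands it into command-line arguments. It assumes the fork's binaries expose upstream llama.cpp's repeatable `--override-kv KEY=TYPE:VALUE` option. `parse_override` and `override_args` are hypothetical helper names, not part of llama.cpp, and only the `int`, `float`, and `bool` value types are handled.

```python
from typing import List, Tuple, Union

# The overrides from the README, one KEY=TYPE:VALUE per line.
OVERRIDES = """\
deepseek2.attention.q_lora_rank=int:1536
deepseek2.attention.kv_lora_rank=int:512
deepseek2.expert_shared_count=int:2
deepseek2.expert_feed_forward_length=int:1536
deepseek2.leading_dense_block_count=int:1
"""

Value = Union[int, float, bool]


def _to_bool(raw: str) -> bool:
    # Assumption: booleans are spelled "true" / "false".
    if raw not in ("true", "false"):
        raise ValueError(f"not a bool: {raw!r}")
    return raw == "true"


# Scalar types this sketch understands; the overrides above only need int.
CASTS = {"int": int, "float": float, "bool": _to_bool}


def parse_override(line: str) -> Tuple[str, str, Value]:
    """Split 'KEY=TYPE:VALUE' into (key, type, typed value); raise ValueError on bad input."""
    key, sep, typed = line.partition("=")
    vtype, colon, raw = typed.partition(":")
    if not key or not sep or not colon or vtype not in CASTS:
        raise ValueError(f"malformed override: {line!r}")
    return key, vtype, CASTS[vtype](raw)  # int("abc") etc. also raises ValueError


def override_args(text: str) -> List[str]:
    """Expand one override per line into repeated --override-kv arguments."""
    args: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        parse_override(line)  # validate before passing the string through unchanged
        args += ["--override-kv", line]
    return args


if __name__ == "__main__":
    # Hypothetical usage: prepend the fork's binary and your model path yourself.
    print(" ".join(override_args(OVERRIDES)))
```

Checking the strings up front catches a typo such as a missing `:` before a several-hundred-GB model starts loading. The strings are passed through unchanged rather than re-serialized, so they reach the binary exactly as written in the README.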
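The file sizes in the quant list can be sanity-checked with simple arithmetic: size ≈ parameters × bits per weight / 8 bytes. The following Python sketch assumes DeepSeek-V2's published total of about 236e9 parameters and reports GiB (2**30 bytes), which appears to be the unit behind the "gb" figures. It ignores metadata and any tensors stored at a different precision, so treat its output as a rough estimate only.

```python
# Back-of-the-envelope GGUF size: n_params * bits_per_weight / 8 bytes.
# Assumption: ~236e9 total parameters (DeepSeek-V2's published count);
# metadata and mixed-precision tensors are ignored.
N_PARAMS = 236e9

# Nominal bits per weight. Q8_0 stores blocks of 32 int8 weights plus one
# fp16 scale, i.e. (32 * 8 + 16) / 32 = 8.5 bits per weight.
BITS_PER_WEIGHT = {"f32": 32.0, "bf16": 16.0, "q8_0": 8.5}


def gguf_size_gib(n_params: float, bpw: float) -> float:
    """Approximate file size in GiB (2**30 bytes)."""
    return n_params * bpw / 8 / 2**30


if __name__ == "__main__":
    for name, bpw in BITS_PER_WEIGHT.items():
        print(f"{name:>5}: ~{gguf_size_gib(N_PARAMS, bpw):.1f} GiB")
```

This gives roughly 879 GiB for f32, 440 GiB for bf16, and 234 GiB for q8_0, close to the listed figures. The k-quants (q4_k_m, q3_k_s, q2_k) mix several block formats across tensors, so no single bits-per-weight value reproduces their sizes.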