leafspark committed on
Commit
29a06ae
1 Parent(s): b6095fa

Update README.md

Files changed (1):
  1. README.md +16 -10
README.md CHANGED
@@ -20,13 +20,19 @@ Using llama.cpp fork: [https://github.com/fairydreaming/llama.cpp/tree/deepseek-
 - Merged GGUF should appear

 # Quants:
- - bf16 (finished, currently splitting and uploading) [size: 439gb]
- - f32 (may require some time to upload, after q8_0) [estimated size: ~800gb]
- - q8_0 (after bf16) [estimated size: 233.27gb]
- - ~~q4_k_m (after q8_0) [estimated size: 133.10gb]~~
- - ~~q2_k (after q4_k_m) [estimated size: ~65gb]~~
- - ~~q3_k_s (low priority) [estimated size: 96.05gb]~~
-
- If quantize.exe supports it I will make RTN quants (edit: it doesn't, will try building from fork).
-
- Note: the bf16 GGUF does not have some DeepSeek v2 specific parameters, will look into adding them
+ - bf16 (finished, uploading) [size: 439gb]
+ - q8_0 (after q2_k) [estimated size: 233.27gb]
+ - q4_k_m (uploading) [size: 132gb]
+ - q2_k (generating) [size: ~65gb]
+ - q3_k_s (low priority) [estimated size: 96.05gb]
+
+ Note: the bf16 GGUF does not have some DeepSeek v2 specific parameters, will look into adding them
+
+ Please use commit 039896407afd40e54321d47c5063c46a52da3e01, otherwise use these metadata KV overrides:
+ ```
+ deepseek2.attention.q_lora_rank=int:1536
+ deepseek2.attention.kv_lora_rank=int:512
+ deepseek2.expert_shared_count=int:2
+ deepseek2.expert_feed_forward_length=int:1536
+ deepseek2.leading_dense_block_count=int:1
+ ```
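
The `KEY=TYPE:VALUE` pairs above match the format llama.cpp accepts via its `--override-kv` flag, one flag per key. As a minimal sketch of how they might be applied at load time (the binary name and GGUF path below are placeholders, not from this repo):

```shell
# Sketch: apply the README's metadata KV overrides when loading the model.
# llama.cpp's --override-kv takes KEY=TYPE:VALUE, repeated once per override.
# Adjust the model path to wherever the merged GGUF was downloaded.
./llama-cli -m ./deepseek-v2-bf16.gguf \
  --override-kv deepseek2.attention.q_lora_rank=int:1536 \
  --override-kv deepseek2.attention.kv_lora_rank=int:512 \
  --override-kv deepseek2.expert_shared_count=int:2 \
  --override-kv deepseek2.expert_feed_forward_length=int:1536 \
  --override-kv deepseek2.leading_dense_block_count=int:1 \
  -p "Hello"
```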