---
license: unknown
library_name: transformers
pipeline_tag: text-generation
---

# Deepseek-V2-Chat-GGUF

Quantized from [https://huggingface.co/deepseek-ai/DeepSeek-V2-Chat](https://huggingface.co/deepseek-ai/DeepSeek-V2-Chat) using the llama.cpp fork at [https://github.com/fairydreaming/llama.cpp/tree/deepseek-v2](https://github.com/fairydreaming/llama.cpp/tree/deepseek-v2).

# Warning: These files will not work unless you compile llama.cpp from the fork linked above (the `deepseek-v2` branch)!

# How to use:

- Find the directory for the quant you want
- Download all files in that directory
- Run `merge.py`
- The merged GGUF file should appear
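
The exact behavior of `merge.py` is not documented here. If the uploaded parts are plain byte-level chunks of a single GGUF file (an assumption; the repo's script may work differently), merging amounts to concatenating them in numeric order. The sketch below assumes a hypothetical `<name>.gguf.partN` naming scheme; adjust `PART_RE` to match the actual file names.

```python
import re
import shutil
from pathlib import Path

# Hypothetical naming scheme (assumption): "<name>.gguf.part1", "<name>.gguf.part2", ...
PART_RE = re.compile(r"^(?P<base>.+\.gguf)\.part(?P<idx>\d+)$")


def find_parts(directory: Path) -> dict[str, list[Path]]:
    """Group split files by their target GGUF name, ordered by part index."""
    groups: dict[str, list[tuple[int, Path]]] = {}
    for path in directory.iterdir():
        m = PART_RE.match(path.name)
        if m:
            groups.setdefault(m["base"], []).append((int(m["idx"]), path))
    # Sort numerically so part10 follows part9 rather than part1.
    return {base: [p for _, p in sorted(items)] for base, items in groups.items()}


def merge_parts(parts: list[Path], output: Path, chunk_size: int = 64 * 1024 * 1024) -> None:
    """Concatenate the parts byte-for-byte into one output file, streaming in chunks."""
    with output.open("wb") as out:
        for part in parts:
            with part.open("rb") as src:
                shutil.copyfileobj(src, out, chunk_size)


if __name__ == "__main__":
    here = Path(".")
    for base, parts in find_parts(here).items():
        print(f"Merging {len(parts)} parts into {base}")
        merge_parts(parts, here / base)
```

Streaming with `shutil.copyfileobj` keeps memory use flat, which matters for files in the hundreds of GB. If the parts were instead produced by llama.cpp's `gguf-split` tool, they need that tool's `--merge` mode rather than raw concatenation.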

# Quants:

- bf16 (finished; currently being split and uploaded) [size: 439 GB]
- f32 (after q8_0; may take some time to upload) [estimated size: ~800 GB]
- q8_0 (after bf16) [estimated size: 233.27 GB]
- ~~q4_k_m (after q8_0) [estimated size: 133.10 GB]~~
- ~~q2_k (after q4_k_m) [estimated size: ~65 GB]~~
- ~~q3_k_s (low priority) [estimated size: 96.05 GB]~~
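
As a rough sanity check on the sizes above, a GGUF file's size is approximately parameter count × bits per weight ÷ 8, ignoring metadata and any tensors kept at higher precision. The sketch below assumes about 236B total parameters for DeepSeek-V2 and reads the listed sizes as GiB (2^30 bytes); both are assumptions, not figures taken from this repo. Q8_0 stores 32 int8 weights plus one fp16 scale per block, i.e. 8.5 bits per weight.

```python
# Assumption: DeepSeek-V2 has roughly 236B total parameters.
PARAMS = 236e9
GIB = 2**30

# Bits per weight for the fixed-width formats in the list above.
BITS_PER_WEIGHT = {
    "f32": 32.0,
    "bf16": 16.0,
    # Q8_0: blocks of 32 int8 weights + one fp16 scale -> (32*8 + 16) / 32 = 8.5
    "q8_0": (32 * 8 + 16) / 32,
}


def estimate_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate file size in GiB, ignoring metadata and mixed-precision tensors."""
    return n_params * bits_per_weight / 8 / GIB


def implied_bpw(size_gib: float, n_params: float = PARAMS) -> float:
    """Average bits per weight implied by a listed file size."""
    return size_gib * GIB * 8 / n_params


if __name__ == "__main__":
    for name, bpw in BITS_PER_WEIGHT.items():
        print(f"{name:>5}: ~{estimate_gib(PARAMS, bpw):.1f} GiB")
    # k-quants mix formats per tensor, so work backwards from the listed size instead.
    print(f"q4_k_m: ~{implied_bpw(133.10):.2f} bits/weight")
```

Under these assumptions the script gives about 439.6 GiB for bf16 and 233.5 GiB for q8_0, in line with the listed sizes; for f32 it gives about 879.2 GiB, so the ~800 GB estimate may be on the low side.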

If `quantize.exe` supports it, I will make RTN quants (edit: it doesn't; I will try building it from the fork).
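
Assuming "RTN" here means plain round-to-nearest quantization (no importance matrix or calibration data), the idea can be sketched in the style of llama.cpp's Q8_0 layout: weights are split into blocks of 32, each block gets one scale `d = max|x| / 127`, and every weight is stored as `round(x / d)`. This is an illustrative pure-Python sketch with hypothetical function names, not llama.cpp's implementation.

```python
BLOCK_SIZE = 32  # weights per block, as in llama.cpp's Q8_0 layout


def rtn_quantize_block(block: list[float]) -> tuple[float, list[int]]:
    """Round-to-nearest: one scale per block, int8 codes in [-127, 127]."""
    amax = max(abs(x) for x in block)
    scale = amax / 127.0
    inv = 1.0 / scale if scale else 0.0  # an all-zero block maps to all-zero codes
    # Python's round() breaks exact ties to even; C's roundf() rounds them away from zero.
    codes = [round(x * inv) for x in block]
    return scale, codes


def rtn_dequantize_block(scale: float, codes: list[int]) -> list[float]:
    """Reconstruct approximate weights from a scale and its codes."""
    return [scale * q for q in codes]


def rtn_quantize(weights: list[float]) -> list[tuple[float, list[int]]]:
    """Quantize a flat weight list block by block (the last block may be shorter)."""
    return [
        rtn_quantize_block(weights[start:start + BLOCK_SIZE])
        for start in range(0, len(weights), BLOCK_SIZE)
    ]


if __name__ == "__main__":
    ws = [((i * 37) % 101 - 50) / 10.0 for i in range(70)]
    blocks = rtn_quantize(ws)
    recon = [v for scale, codes in blocks for v in rtn_dequantize_block(scale, codes)]
    print("max abs error:", max(abs(a - b) for a, b in zip(ws, recon)))
```

Because each weight is rounded independently, the reconstruction error per weight is at most half the block's scale; calibration-based methods trade extra work for lower error on the weights that matter most.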

Note: the bf16 GGUF is missing some DeepSeek-V2-specific parameters; I will look into adding them.