---
language:
- en
base_model:
- deepseek-ai/DeepSeek-V3
pipeline_tag: text-generation
---

# DeepSeek V3 - INT4 (TensorRT-LLM)

This repository provides an INT4-quantized version of the DeepSeek V3 model, suitable for high-speed, memory-efficient inference with TensorRT-LLM.

### Model Summary:

* Base Model: DeepSeek V3 (BF16, converted from the NVIDIA FP8 checkpoint)
* Quantization: weight-only INT4 (W4A16), i.e. INT4 weights with 16-bit activations

The checkpoint was generated with TensorRT-LLM's `convert_checkpoint.py`:

```sh
# Convert the BF16 HF checkpoint to a TensorRT-LLM INT4 weight-only
# checkpoint, sharded for 4-way tensor parallelism.
python convert_checkpoint.py \
    --model_dir /home/user/hf/deepseek-v3-bf16 \
    --output_dir /home/user/hf/deepseek-v3-int4 \
    --dtype bfloat16 \
    --tp_size 4 \
    --use_weight_only \
    --weight_only_precision int4 \
    --workers 4
```
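
After conversion, the output directory should contain a `config.json` plus one weight shard per tensor-parallel rank. This follows the standard TensorRT-LLM checkpoint layout, so the exact file names below are indicative rather than guaranteed:

```sh
# Quick sanity check of the converted checkpoint (TP=4 yields four rank shards).
ls /home/user/hf/deepseek-v3-int4
# config.json  rank0.safetensors  rank1.safetensors  rank2.safetensors  rank3.safetensors
```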

### Hardware requirements:

* 4× NVIDIA H100 (80 GB) or H200 GPUs (recommended; see the quick check below)
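
A simple way to confirm the host exposes four suitable GPUs (a standard `nvidia-smi` query, not specific to this model):

```sh
# List each visible GPU with its total memory; expect four 80 GB-class devices.
nvidia-smi --query-gpu=name,memory.total --format=csv
```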

### Example usage:

Build TensorRT engines from the quantized checkpoint with `trtllm-build`:

```sh
# Build TP=4 engines; the output directory name appears to record the build
# shape (max seq len 4096, max input len 2048, max batch size 4).
trtllm-build --checkpoint_dir /DeepSeek-V3-int4-TensorRT \
    --output_dir ./trtllm_engines/deepseek_v3/int4/tp4-sel4096-isl2048-bs4 \
    ...
```
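
Once the engines are built, generation can be smoke-tested with the `run.py` example script that ships with the TensorRT-LLM repository. The paths and prompt below are illustrative; because the checkpoint was converted with `--tp_size 4`, the run must be launched across four ranks, hence `mpirun -n 4`:

```sh
# Illustrative invocation; adjust the engine and tokenizer paths to your layout.
mpirun -n 4 --allow-run-as-root \
  python3 examples/run.py \
    --engine_dir ./trtllm_engines/deepseek_v3/int4/tp4-sel4096-isl2048-bs4 \
    --tokenizer_dir /home/user/hf/deepseek-v3-bf16 \
    --input_text "Explain INT4 weight-only quantization in one sentence." \
    --max_output_len 128
```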

### Disclaimer:

This model is a quantized checkpoint intended for research and experimentation with high-performance inference. Use at your own risk and validate outputs for production use cases.