---
language:
- en
base_model:
- deepseek-ai/DeepSeek-V3
pipeline_tag: text-generation
---
# DeepSeek V3 - INT4 (TensorRT-LLM)
This repository provides an INT4-quantized version of the DeepSeek V3 model, suitable for high-speed, memory-efficient inference with TensorRT-LLM.
### Model Summary:
* Base Model: DeepSeek V3 (BF16), converted from NVIDIA's FP8 release
* Quantization: Weight-only INT4 (W4A16)

The checkpoint was generated with TensorRT-LLM's `convert_checkpoint.py`:
```sh
python convert_checkpoint.py \
--model_dir /home/user/hf/deepseek-v3-bf16 \
--output_dir /home/user/hf/deepseek-v3-int4 \
--dtype bfloat16 \
--tp_size 4 \
--use_weight_only \
--weight_only_precision int4 \
--workers 4
```
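The `--use_weight_only --weight_only_precision int4` flags above select the W4A16 scheme: weights are stored as 4-bit integers with a per-channel scale and dequantized to 16-bit floats at compute time, while activations stay in BF16. A minimal NumPy sketch of per-channel symmetric INT4 quantization (illustrative only; TensorRT-LLM's actual kernels use packed storage and fused dequantization):

```python
import numpy as np

def quantize_w4(w: np.ndarray):
    """Per-output-channel symmetric INT4 quantization of a weight matrix."""
    # Symmetric INT4 range is [-8, 7]; scale each row so its max maps to 7.
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)  # int4 values held in int8
    return q, scale

def dequantize_w4(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Recover an approximate float weight matrix from int4 values and scales."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 16)).astype(np.float32)
q, scale = quantize_w4(w)
w_hat = dequantize_w4(q, scale)
print(np.abs(w - w_hat).max())  # worst-case error is bounded by half a quantization step
```

The per-channel scale keeps the rounding error proportional to each output channel's own dynamic range, which is why weight-only INT4 preserves accuracy far better than a single global scale would.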
### Hardware requirements:
* 4× 80 GB H100 or H200 (optimal)
### Example usage:
Build a TensorRT engine from the quantized checkpoint with `trtllm-build`:
```sh
trtllm-build --checkpoint_dir /DeepSeek-V3-int4-TensorRT \
--output_dir ./trtllm_engines/deepseek_v3/int4/tp4-sel4096-isl2048-bs4 \
...
```
### Disclaimer:
This model is a quantized checkpoint intended for research and experimentation with high-performance inference. Use at your own risk and validate outputs before production use.