---
language:
- en
base_model:
- deepseek-ai/DeepSeek-V3
pipeline_tag: text-generation
---

# DeepSeek V3 - INT4 (TensorRT-LLM)

This repository provides an INT4-quantized version of the DeepSeek V3 model, suitable for high-speed, memory-efficient inference with TensorRT-LLM.

### Model Summary:

* Base Model: DeepSeek V3 in BF16 (converted from NVIDIA's FP8 release)
* Quantization: Weight-only INT4 (W4A16: 4-bit weights, 16-bit activations)

The checkpoint was produced with TensorRT-LLM's `convert_checkpoint.py`:

```sh
python convert_checkpoint.py \
  --model_dir /home/user/hf/deepseek-v3-bf16 \
  --output_dir /home/user/hf/deepseek-v3-int4 \
  --dtype bfloat16 \
  --tp_size 4 \
  --use_weight_only \
  --weight_only_precision int4 \
  --workers 4
```

### Hardware requirements:

* 4× 80 GB H100 or H200 (optimal)

### Example usage:

Build a TensorRT engine from the quantized checkpoint:

```sh
trtllm-build --checkpoint_dir /DeepSeek-V3-int4-TensorRT \
  --output_dir ./trtllm_engines/deepseek_v3/int4/tp4-sel4096-isl2048-bs4 \
  ...
```

### Disclaimer:

This model is a quantized checkpoint intended for research and experimentation with high-performance inference. Use at your own risk and validate outputs for production use cases.
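
### Running the engine:

A minimal smoke test, not part of the original card: once the engine is built, it can be driven with TensorRT-LLM's bundled `examples/run.py`. The paths and prompt below are placeholders for illustration, and because the checkpoint was converted with `--tp_size 4`, the run needs one MPI rank per GPU:

```sh
# Sketch only: adjust the engine/tokenizer paths to your layout.
# tp_size=4 requires one rank per GPU, hence mpirun -n 4.
mpirun -n 4 --allow-run-as-root \
  python examples/run.py \
    --engine_dir ./trtllm_engines/deepseek_v3/int4/tp4-sel4096-isl2048-bs4 \
    --tokenizer_dir /home/user/hf/deepseek-v3-bf16 \
    --input_text "Explain INT4 weight-only quantization in one paragraph." \
    --max_output_len 128
```

The tokenizer is read from the original BF16 checkout, since weight-only quantization does not alter the tokenizer; any DeepSeek V3 tokenizer directory should work.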