---
language:
- en
base_model:
- deepseek-ai/DeepSeek-V3
pipeline_tag: text-generation
---

# DeepSeek V3 - INT4 (TensorRT-LLM)

This repository provides an INT4-quantized version of the DeepSeek V3 model, suitable for high-speed, memory-efficient inference with TensorRT-LLM.

### Model Summary:

* Base Model: DeepSeek V3 (BF16, converted from the NVIDIA FP8 checkpoint)
* Quantization: weight-only INT4 (W4A16), i.e. INT4 weights with 16-bit activations

The checkpoint was generated with TensorRT-LLM's `convert_checkpoint.py`:

```sh
# Convert the BF16 HF checkpoint to a TensorRT-LLM INT4 weight-only
# checkpoint, sharded for 4-way tensor parallelism.
python convert_checkpoint.py \
    --model_dir /home/user/hf/deepseek-v3-bf16 \
    --output_dir /home/user/hf/deepseek-v3-int4 \
    --dtype bfloat16 \
    --tp_size 4 \
    --use_weight_only \
    --weight_only_precision int4 \
    --workers 4
```
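
After conversion, the output directory should contain a `config.json` plus one weight shard per tensor-parallel rank. This follows the standard TensorRT-LLM checkpoint layout, so the exact file names below are indicative rather than guaranteed:

```sh
# Quick sanity check of the converted checkpoint (TP=4 yields four rank shards).
ls /home/user/hf/deepseek-v3-int4
# config.json  rank0.safetensors  rank1.safetensors  rank2.safetensors  rank3.safetensors
```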

### Hardware requirements:

* 4× NVIDIA H100 (80 GB) or H200 GPUs (recommended; see the quick check below)
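
A simple way to confirm the host exposes four suitable GPUs (a standard `nvidia-smi` query, not specific to this model):

```sh
# List each visible GPU with its total memory; expect four 80 GB-class devices.
nvidia-smi --query-gpu=name,memory.total --format=csv
```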

### Example usage:

Build TensorRT engines from the quantized checkpoint with `trtllm-build`:

```sh
# Build TP=4 engines; the output directory name appears to record the build
# shape (max seq len 4096, max input len 2048, max batch size 4).
trtllm-build --checkpoint_dir /DeepSeek-V3-int4-TensorRT \
    --output_dir ./trtllm_engines/deepseek_v3/int4/tp4-sel4096-isl2048-bs4 \
    ...
```
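
Once the engines are built, generation can be smoke-tested with the `run.py` example script that ships with the TensorRT-LLM repository. The paths and prompt below are illustrative; because the checkpoint was converted with `--tp_size 4`, the run must be launched across four ranks, hence `mpirun -n 4`:

```sh
# Illustrative invocation; adjust the engine and tokenizer paths to your layout.
mpirun -n 4 --allow-run-as-root \
  python3 examples/run.py \
    --engine_dir ./trtllm_engines/deepseek_v3/int4/tp4-sel4096-isl2048-bs4 \
    --tokenizer_dir /home/user/hf/deepseek-v3-bf16 \
    --input_text "Explain INT4 weight-only quantization in one sentence." \
    --max_output_len 128
```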

### Disclaimer:

This model is a quantized checkpoint intended for research and experimentation with high-performance inference. Use at your own risk and validate outputs for production use cases.