---
language:
- en
base_model:
- deepseek-ai/DeepSeek-V3
pipeline_tag: text-generation
---
# DeepSeek V3 - INT4 (TensorRT-LLM)

This repository provides an INT4-quantized version of the DeepSeek V3 model, suitable for high-speed, memory-efficient inference with TensorRT-LLM.


### Model summary:

* **Base model:** DeepSeek V3 (BF16, converted from NVIDIA's FP8 checkpoint)
* **Quantization:** Weight-only INT4 (W4A16)
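Conceptually, W4A16 keeps activations in 16-bit while storing each weight as a 4-bit integer plus a per-group floating-point scale. Below is a minimal pure-Python sketch of a symmetric group-wise scheme to illustrate the idea; the group size, rounding, and clipping here are illustrative assumptions, not TensorRT-LLM's actual implementation (which uses fused CUDA kernels).

```python
import random


def quantize_w4_groupwise(weights, group_size=128):
    """Illustrative symmetric per-group INT4 quantization.

    Each group of `group_size` weights shares one scale; values are
    rounded to integers in the INT4 range [-8, 7].
    """
    qs, scales = [], []
    for i in range(0, len(weights), group_size):
        group = weights[i:i + group_size]
        # Scale so the largest magnitude maps to +/-7 (fallback 1.0 for all-zero groups).
        scale = max(abs(w) for w in group) / 7.0 or 1.0
        qs.append([max(-8, min(7, round(w / scale))) for w in group])
        scales.append(scale)
    return qs, scales


def dequantize(qs, scales):
    """Recover approximate float weights from INT4 values and group scales."""
    return [q * s for group, s in zip(qs, scales) for q in group]


# Toy round-trip: quantize 256 pseudo-random weights and measure the error.
random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(256)]
qs, scales = quantize_w4_groupwise(weights)
restored = dequantize(qs, scales)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

The per-group rounding error is bounded by half the group's scale, which is why small group sizes trade extra scale storage for accuracy.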

### Quantization command:

The checkpoint was quantized with TensorRT-LLM's `convert_checkpoint.py` (weight-only INT4, tensor parallelism 4):

```sh
python convert_checkpoint.py \
  --model_dir /home/user/hf/deepseek-v3-bf16 \
  --output_dir /home/user/hf/deepseek-v3-int4 \
  --dtype bfloat16 \
  --tp_size 4 \
  --use_weight_only \
  --weight_only_precision int4 \
  --workers 4
```

### Hardware requirements:

* 4× 80 GB GPUs (H100 or H200) for optimal performance


### Example usage:

```sh
trtllm-build --checkpoint_dir /DeepSeek-V3-int4-TensorRT \
  --output_dir ./trtllm_engines/deepseek_v3/int4/tp4-sel4096-isl2048-bs4 \
  ...
```
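Once the engine is built, it can be exercised with TensorRT-LLM's example runner. The command below is a sketch, not a verified invocation: the paths are hypothetical, and the tokenizer directory is assumed to be the original BF16 HF checkpoint. `mpirun -n 4` matches the `tp_size 4` used at conversion time.

```sh
# Hypothetical paths; run.py ships in TensorRT-LLM's examples/ directory.
mpirun -n 4 python examples/run.py \
  --engine_dir ./trtllm_engines/deepseek_v3/int4/tp4-sel4096-isl2048-bs4 \
  --tokenizer_dir /home/user/hf/deepseek-v3-bf16 \
  --input_text "Explain INT4 weight-only quantization in one sentence." \
  --max_output_len 128
```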


### Disclaimer:

This model is a quantized checkpoint intended for research and experimentation with high-performance inference. Use it at your own risk and validate outputs before any production use.