Example run:

docker run --rm --runtime nvidia --gpus 'all' -e 'HF_TOKEN' -v '/root/.cache/huggingface:/root/.cache/huggingface' -p 127.0.0.1:8000:8000 "vllm/vllm-openai:v0.7.3" --model 'ig1/QwQ-32B-FP8-Dynamic' --served-model-name 'QwQ-32B' --enable-reasoning --reasoning-parser deepseek_r1 --override-generation-config '{"temperature":0.6,"top_p":0.95}'
Downloads last month
0
Safetensors
Model size
32.8B params
Tensor type
BF16
·
F8_E4M3
·
Inference Providers NEW
This model is not currently available via any of the supported Inference Providers.
The model cannot be deployed to the HF Inference API: The model has no library tag.

Model tree for ig1/QwQ-32B-FP8-Dynamic

Base model

Qwen/Qwen2.5-32B
Finetuned
Qwen/QwQ-32B
Quantized
(67)
this model