sglang is not supported
#18 opened by zhangdahaodaddy
docker run --name vllm-minicpm3-4b --runtime nvidia --gpus '"device=0,1"' \
  -v /home/jszc/vllm:/root/.cache/modelscope \
  -p 11436:8080 \
  --ipc=host \
  my_vllm_updated:latest \
  --model /root/.cache/modelscope/MiniCPM3-4B \
  --port 8080 \
  --served-model-name llama3.1-8b \
  --gpu-memory-utilization 0.95 \
  --max-num-seqs 1024 \
  --max-num-batched-tokens 8192 \
  -tp 2 \
  --enable-chunked-prefill true \
  --enable-prefix-caching \
  --trust-remote-code
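For comparison, the SGLang launch one would expect to use is below. This is only a sketch, not a confirmed working command: it assumes the official lmsysorg/sglang image and SGLang's standard launch_server flags (--model-path, --tp, --mem-fraction-static, --trust-remote-code), maps the same GPUs, cache volume, and host port as the vLLM command above, and per this thread it fails because the MiniCPM3 architecture is not recognized by SGLang.

# Hypothetical SGLang equivalent of the vLLM command above.
# 30000 is SGLang's default server port; the host port 11436 is kept from the vLLM setup.
docker run --name sglang-minicpm3-4b --runtime nvidia --gpus '"device=0,1"' \
  -v /home/jszc/vllm:/root/.cache/modelscope \
  -p 11436:30000 \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path /root/.cache/modelscope/MiniCPM3-4B \
    --host 0.0.0.0 --port 30000 \
    --tp 2 \
    --mem-fraction-static 0.95 \
    --trust-remote-code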
neoz changed discussion status to closed