Problem with CPU inference on vLLM

#3
by sotarov - opened

When I try to run the model on the current version of vLLM (0.7.4.dev365+g70b808fe) built for CPU, the engine fails with an error on the first test request:

vllm serve ai-sage/GigaChat-20B-A3B-instruct-v1.5 --disable-log-requests --trust_remote_code --dtype bfloat16 --max-seq-len 8192
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai-sage/GigaChat-20B-A3B-instruct-v1.5",
    "prompt": "Who are you?",
    "max_tokens": 7,
    "temperature": 0
  }'
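
For anyone reproducing this without curl, the same request can be sent from Python. This is only a minimal sketch using the standard library; it assumes the server is listening on localhost:8000 as in the command above, and the helper name build_completion_request is made up for illustration.

```python
import json
import urllib.request

# Assumption: the server started by `vllm serve` listens on this address.
BASE_URL = "http://localhost:8000"


def build_completion_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the same POST request as the curl command (hypothetical helper)."""
    payload = {
        "model": "ai-sage/GigaChat-20B-A3B-instruct-v1.5",
        "prompt": "Who are you?",
        "max_tokens": 7,
        "temperature": 0,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Sends the request; needs the vLLM server to be running.
    with urllib.request.urlopen(build_completion_request(), timeout=60) as resp:
        print(json.loads(resp.read()))
```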

...
ERROR 03-11 15:34:34 [engine.py:141] AttributeError("'_OpNamespace' '_moe_C' object has no attribute 'topk_softmax'")
ERROR 03-11 15:34:34 [engine.py:141] Traceback (most recent call last):
ERROR 03-11 15:34:34 [engine.py:141] File "/opt/vllm/venv/lib/python3.12/site-packages/vllm-0.7.4.dev365+g70b808fe.cpu-py3.12-linux-x86_64.egg/vllm/engine/multiprocessing/engine.py", line 139, in start
...
ERROR 03-11 15:34:34 [engine.py:141] File "/opt/vllm/venv/lib/python3.12/site-packages/vllm-0.7.4.dev365+g70b808fe.cpu-py3.12-linux-x86_64.egg/vllm/model_executor/layers/fused_moe/fused_moe.py", line 1057, in fused_topk
ERROR 03-11 15:34:34 [engine.py:141] ops.topk_softmax(
ERROR 03-11 15:34:34 [engine.py:141] File "/opt/vllm/venv/lib/python3.12/site-packages/vllm-0.7.4.dev365+g70b808fe.cpu-py3.12-linux-x86_64.egg/vllm/_custom_ops.py", line 1119, in topk_softmax
ERROR 03-11 15:34:34 [engine.py:141] torch.ops._moe_C.topk_softmax(topk_weights, topk_ids,
ERROR 03-11 15:34:34 [engine.py:141] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-11 15:34:34 [engine.py:141] File "/opt/vllm/venv/lib/python3.12/site-packages/torch/_ops.py", line 1225, in __getattr__
ERROR 03-11 15:34:34 [engine.py:141] raise AttributeError(
ERROR 03-11 15:34:34 [engine.py:141] AttributeError: '_OpNamespace' '_moe_C' object has no attribute 'topk_softmax'
INFO: Shutting down
INFO: Waiting for application shutdown.
INFO: Application shutdown complete.
INFO: Finished server process [14614]

The same problem occurs with ai-sage/GigaChat-20B-A3B-instruct and ai-sage/GigaChat-20B-A3B-base.
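
To narrow this down, one thing that can be checked is whether the installed build registers the op at all. The sketch below is only a guess at a useful diagnostic: the helper op_is_registered is hypothetical, and it assumes that importing vllm._custom_ops is what loads the compiled extensions.

```python
def op_is_registered(namespace: object, op_name: str) -> bool:
    """Return True if `namespace` exposes an attribute named `op_name`.

    hasattr() returns False when the attribute lookup raises AttributeError,
    which is the exception shown in the traceback.
    """
    return hasattr(namespace, op_name)


if __name__ == "__main__":
    import torch
    # Assumption: importing this module triggers loading of vLLM's compiled ops.
    import vllm._custom_ops  # noqa: F401

    print("topk_softmax registered:",
          op_is_registered(torch.ops._moe_C, "topk_softmax"))
```

If this prints False, the op is not available in this build, which would match the AttributeError in the log.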

Does anyone have advice or ideas on how to fix this error?
