vLLM 0.7.2 starts the model normally, but there is no output when sending a request with curl; it just hangs!

#2
by JZMALi

python -m vllm.entrypoints.openai.api_server \
  --served-model-name deepseek-r1 \
  --model /root/filesystem/model_r1/DeepSeek-R1-int4-gptq-sym-inc/OPEA/DeepSeek-R1-int4-gptq-sym-inc \
  --trust-remote-code \
  --host 0.0.0.0 \
  --port 8096 \
  --max-model-len 32768 \
  --max-num-batched-tokens 32768 \
  --tensor-parallel-size 8 \
  --gpu_memory_utilization 0.9
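
For reference, this is roughly how I test the server; a minimal curl call against the OpenAI-compatible chat endpoint started above (the prompt and max_tokens are just placeholders):

curl http://localhost:8096/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "deepseek-r1",
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 64
      }'

The request is accepted but never returns any output; it just blocks.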

Open Platform for Enterprise AI org

Sorry, we don't have enough resources to run this model on vLLM. You may seek assistance in the vLLM repository. This model follows the standard GPTQ format.

I also encountered this problem; is there any solution yet? I have opened an issue at https://github.com/vllm-project/vllm/issues/16111

Open Platform for Enterprise AI org

You can try this model: https://huggingface.co/OPEA/DeepSeek-R1-int4-AutoRound-awq-asym. Due to limited resources, we have only tested the AWQ version. It also appears that vLLM currently doesn't support AWQ with symmetric quantization.
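
For example, the launch command from above should work after pointing --model at that checkpoint; this is an untested sketch, the local path is only a placeholder, and vLLM reads the quantization settings from the model's own config:

python -m vllm.entrypoints.openai.api_server \
  --served-model-name deepseek-r1 \
  --model /path/to/DeepSeek-R1-int4-AutoRound-awq-asym \
  --trust-remote-code \
  --host 0.0.0.0 \
  --port 8096 \
  --max-model-len 32768 \
  --tensor-parallel-size 8 \
  --gpu_memory_utilization 0.9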
