I am using a single NVIDIA L40 GPU (48 GB of VRAM) with the vLLM Docker container, but I am getting an OutOfMemoryError.

Deploy with docker on Linux:

docker run --runtime nvidia --gpus all \
    --name my_vllm_container \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HUGGING_FACE_HUB_TOKEN=" \
    -p 8000:8000 \
    --ipc=host \
    vllm/vllm-openai:latest \
    --model unsloth/Llama-4-Scout-17B-16E-Instruct-unsloth-bnb-4bit

Error:
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.50 GiB. GPU 0 has a total capacity of 44.31 GiB of which 728.31 MiB is free. Process 2790767 has 43.59 GiB memory in use. Of the allocated memory 43.10 GiB is allocated by PyTorch, and 5.57 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.
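
For context, here is a rough back-of-the-envelope check of whether the weights alone could fit on this card. It is only a sketch: the 109B total parameter count is an assumption taken from Meta's published description of Llama 4 Scout (17B active parameters, 16 experts), and the estimate ignores the KV cache, activations, quantization metadata, and any layers the checkpoint leaves unquantized, so the real footprint is likely higher.

```python
# Rough weight-memory estimate. All inputs are assumptions, not measurements.
GIB = 1024 ** 3


def weight_gib(num_params: float, bits_per_param: float) -> float:
    """Return the size in GiB of num_params weights stored at bits_per_param."""
    return num_params * bits_per_param / 8 / GIB


# Assumed: total parameter count of Llama 4 Scout (all 16 experts, not just the 17B active).
total_params = 109e9
# Taken from the error message above ("total capacity of 44.31 GiB").
gpu_capacity_gib = 44.31

needed = weight_gib(total_params, 4)
print(f"4-bit weights: ~{needed:.1f} GiB, GPU capacity: {gpu_capacity_gib} GiB")
print("fits" if needed < gpu_capacity_gib else "does not fit")
```

Under these assumptions the 4-bit weights come to roughly 51 GiB, which is already more than the 44.31 GiB the error reports. If that holds, the fragmentation hint about PYTORCH_CUDA_ALLOC_CONF probably would not help here, since only 5.57 MiB is reserved but unallocated.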
