Unable to run with vLLM

#24
by rdodev - opened

Running this model has been a challenge, and I have yet to be able to stand it up without errors.

  • using latest vLLM
  • using transformers v4.51.0
  • using fp8 quant
  • running on 2 x H100 GPUs (80 GB VRAM each)

It throws torch.compile errors; when I disable the compile cache, I get CUDA OOM errors instead.
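
For reference, a minimal sketch of the kind of launch I've been attempting is below. The model path, context length, and memory fraction are placeholders rather than my exact values, and `enforce_eager` is one way to sidestep the compilation path entirely:

```python
# Sketch of a vLLM launch for this setup; placeholder values are marked.
from vllm import LLM, SamplingParams

llm = LLM(
    model="<model-repo-or-local-path>",  # placeholder for the actual model
    tensor_parallel_size=2,              # split across the 2 x H100s
    quantization="fp8",                  # fp8 weight quantization
    enforce_eager=True,                  # skip CUDA graphs / compilation to dodge torch.compile errors
    gpu_memory_utilization=0.90,         # placeholder; lower values leave more headroom against OOM
    max_model_len=8192,                  # placeholder; shrinking this reduces KV-cache memory
)

out = llm.generate(["Hello"], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
```

Even with variations on these settings (smaller `max_model_len`, lower `gpu_memory_utilization`, eager mode on or off), I haven't found a combination that comes up cleanly.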

Has anyone been able to stand this model up with vLLM?
