Unable to run with vllm
#24 by rdodev - opened
Running this model has been a challenge; I have yet to stand it up without errors.
- using latest vLLM
- using transformers v4.51.0
- using fp8 quant
- running on 2 x H100 GPUs (80 GB VRAM each)
It gives torch.compile errors. When I disable compilation caching, I get CUDA OOM errors instead.
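For reference, this is roughly the setup I've been trying. The model ID is a placeholder and the exact flag values are my assumptions, not a known-good recipe:

```python
# Minimal sketch of my launch config (model ID is a placeholder for this repo;
# flag values are guesses, not a verified working configuration).
from vllm import LLM, SamplingParams

llm = LLM(
    model="<this-model-repo-id>",   # replace with the actual model ID
    tensor_parallel_size=2,         # split across the 2 x H100s
    quantization="fp8",             # fp8 quantization
    enforce_eager=True,             # eager mode, skipping CUDA graph capture
    gpu_memory_utilization=0.90,    # leave some headroom for activations
    max_model_len=8192,             # shorter context to reduce KV-cache memory
)

out = llm.generate(["Hello, world"], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
```

Even with enforce_eager and a reduced max_model_len, I still hit the OOM.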
Has anyone been able to stand this model up with vLLM?