Model usage on vLLM fails: `No available memory for the cache blocks` & `Error executing method 'determine_num_available_blocks'`
#8 opened 7 days ago by surajd

Does the benchmark test use vLLM? Is input/output=500/2000?
#6 opened 11 days ago by chuanyizjc

FP8 and FP4
#5 opened 11 days ago by whatever1983

How to reproduce the benchmark score?
#4 opened 16 days ago by lincharliesun

AWQ or GPTQ quant
#2 opened 17 days ago by getfit

"ffn_mult": null,
#1 opened 17 days ago by csabakecskemeti
