gguf?
please
Unable to run even with 4090 and 3090, running out of memory.
A 16B model running out of memory on a 24 GB 4090? That's interesting. Well, I run models in the cloud (because I only have a 6 GB 3060 locally), so I can't complain.
It's an MoE model. I can't even run the sample code provided, and both GPUs are close to maxing out on VRAM: the 4090 has less than 1 GB left and the run asks for another 1 GB.
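If you want to try squeezing it onto the 4090 + 3090 anyway, something like the sketch below might help. It assumes the standard transformers/accelerate loading path and uses a placeholder model id; the per-GPU memory caps and CPU offload are guesses, so treat it as a starting point, not the official sample code.

```python
# Rough sketch (not the official sample code): shard the model across the
# 4090 and 3090 with accelerate's device_map, capping per-GPU memory and
# offloading whatever doesn't fit to CPU RAM. "model_id" is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "org/model-name"  # placeholder: replace with the actual repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",                                    # split layers across both GPUs
    max_memory={0: "22GiB", 1: "22GiB", "cpu": "48GiB"},  # leave headroom on each card
    offload_folder="offload",                             # spill the remainder to disk if needed
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

With CPU/disk offload it will be slow, but it should at least get past the OOM.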
That's expected... the model files alone total more than 24 GB... you need a quantized version.
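While waiting for an official quantized release, a minimal 4-bit loading sketch via bitsandbytes is below. The model id is a placeholder and I haven't verified that this particular architecture is supported, so no guarantees.

```python
# Minimal sketch: load the weights in 4-bit with bitsandbytes so they fit in
# roughly 24 GB of VRAM. "model_id" is a placeholder for the actual repo id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "org/model-name"  # placeholder

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,  # quantize on load instead of full-precision weights
    device_map="auto",               # spread layers across available GPUs
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```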
My dual-4090 rig also went poooooff, OOM. Any ideas? xD