gguf?

#1 opened by AlgorithmicKing

please

Unable to run even with 4090 and 3090, running out of memory.

A 16B model running out of memory on a 24 GB 4090? That's interesting. I run models in the cloud (I only have a 6 GB 3060 locally), so I can't complain.

It's an MoE model. I can't even run the provided sample code; both GPUs are close to maxing out on VRAM. The 4090 has less than 1 GB free, and the load asks for another 1 GB.
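
For reference, this is roughly what I'm attempting, with per-GPU memory caps added to try to leave some headroom. The repo id is a placeholder:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "org/model-16b-moe"  # placeholder for the actual repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,           # bf16 weights: ~2 bytes/param, ~32 GB for 16B
    device_map="auto",                    # let accelerate shard layers across both GPUs
    max_memory={0: "22GiB", 1: "22GiB"},  # cap each card to leave room for activations
)
```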

That's expected... the model files already total more than 24 GB... you need a quantized version.
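
Until someone uploads a GGUF, an on-the-fly 4-bit load via bitsandbytes should get it under 24 GB. A minimal sketch, assuming the usual transformers setup (the repo id is a placeholder):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "org/model-16b-moe"  # placeholder for the actual repo id

# NF4 quantization stores ~0.5 bytes per weight, so a 16B model
# lands somewhere around 8-10 GB instead of ~32 GB in bf16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # place the quantized layers across available GPUs
)
```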

My dual-4090 rig also went poooooff, OOM. Any ideas? xD
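
Guess it's waiting on a GGUF then. Once one drops, something like llama-cpp-python with full GPU offload should fit on a single card (the file name here is hypothetical):

```python
from llama_cpp import Llama

# A Q4_K_M quant of a 16B model is roughly 10 GB, so one 24 GB card is plenty.
llm = Llama(
    model_path="model-16b-moe.Q4_K_M.gguf",  # hypothetical file name
    n_gpu_layers=-1,  # offload every layer to the GPU
    n_ctx=4096,
)

print(llm("Hello, world", max_tokens=32)["choices"][0]["text"])
```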
