GPU memory requirements
I have a server with 8×A6000 (48 GB each), but I still can't run this model. Does anyone know how many GPUs are needed?
700–800 GB (FP8)
Nonsense. Ignore the response above.
DeepSeek V3 is a Mixture of Experts (MoE) model. It's over 671B parameters in total, but only 37B parameters are activated per token (so although it's a very big model, it has the memory requirement of a 37B model).
As a general rule of thumb, if you wanna calculate how much memory you need to run a dense model, grab its total parameter count. But if it's a MoE model like DeepSeek V3, grab its activated parameter count instead.
Then you need to multiply it depending on the quantization you wanna run it at:
fp32 (full-precision) = x4
fp16 = x2
fp8 = x1.125
int8 = x1
fp4 = x0.5625
int4 = x0.5
Then you can simply divide it by a billion to get the result in gigabytes.
In this case, to run DeepSeek V3 at fp32, you'd need 148 GB ((37,000,000,000 × 4) / 1,000,000,000). To run it at fp4, it requires ~20.8 GB ((37,000,000,000 × 0.5625) / 1,000,000,000).
With your server, you could theoretically run it at fp8. But this method of calculation is a general rule of thumb that only estimates the memory required for inference weights (it doesn't take factors like processing overhead into account). Maybe you can't run it because the overhead is pushing you past your memory budget, so I recommend running it at fp4 (since you have more than enough headroom at that quantization level).
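Here's that rule of thumb as a quick Python sketch, so you can plug in your own numbers (the multipliers are the ones from the list above; it only estimates weight memory, not overhead like the KV cache):

```python
# Rule-of-thumb memory estimate, per the post above.
# For dense models pass the total parameter count; for MoE models
# pass the activated parameter count (per this rule of thumb).
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16": 2.0,
    "fp8":  1.125,
    "int8": 1.0,
    "fp4":  0.5625,
    "int4": 0.5,
}

def weight_memory_gb(params: float, quant: str) -> float:
    """Weight memory in (decimal) gigabytes: params * bytes-per-param / 1e9."""
    return params * BYTES_PER_PARAM[quant] / 1e9

# DeepSeek V3: 37B activated parameters
for quant in BYTES_PER_PARAM:
    print(f"{quant:>4}: {weight_memory_gb(37e9, quant):6.1f} GB")
# fp32: 148.0 GB ... fp8: 41.6 GB ... fp4: 20.8 GB
```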
To add to the conversation...
If you do a CPU-only setup, you need about 768 GB of system RAM, but it's relatively slow: https://youtu.be/v4810MVGhog
If you do a hybrid setup (system RAM + GPU), you only need one 3090 (24 GB) and about 500 GB of system RAM, and it's much faster: https://youtu.be/fI6uGPcxDbM
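For reference, those RAM numbers line up with holding the full 671B parameters in memory (in a CPU-only or hybrid setup, every expert has to live somewhere, so the full weights are what you size for). A minimal sketch reusing the multipliers from the rule of thumb above:

```python
# Memory for the FULL DeepSeek V3 weights (671B params) at a few quants,
# using the same bytes-per-parameter multipliers as earlier in the thread.
TOTAL_PARAMS = 671e9

for quant, bytes_per_param in [("fp8", 1.125), ("int8", 1.0), ("int4", 0.5)]:
    print(f"{quant:>4}: {TOTAL_PARAMS * bytes_per_param / 1e9:6.1f} GB")
# fp8 : 754.9 GB -> roughly the ~768 GB of system RAM in the CPU-only video
# int4: 335.5 GB -> fits in ~500 GB of RAM plus a 24 GB 3090
```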