---
license: mit
library_name: transformers
base_model:
- deepseek-ai/DeepSeek-V3-0324
---

# huihui-ai/DeepSeek-V3-0324-bf16

This model was converted from [deepseek-ai/DeepSeek-V3-0324](https://huggingface.co/deepseek-ai/DeepSeek-V3-0324) to BF16. Therefore, we only provide the conversion commands for Windows and information related to ollama. The Windows environment is much faster than the WSL2 environment, provided you have sufficient memory or virtual memory. The Linux environment has not been tested; if you are in a Linux or WSL environment, please refer to [huihui-ai/DeepSeek-R1-bf16](https://huggingface.co/huihui-ai/DeepSeek-R1-bf16). If needed, we can upload the BF16 version.

## FP8 to BF16

1. Download the [deepseek-ai/DeepSeek-V3-0324](https://huggingface.co/deepseek-ai/DeepSeek-V3-0324) model; this requires approximately 641 GB of space.

```
cd /d C:\Users\admin\models
huggingface-cli download deepseek-ai/DeepSeek-V3-0324 --local-dir ./deepseek-ai/DeepSeek-V3-0324
```

2. Create the environment.

```
conda create -yn DeepSeek-V3-0324 python=3.12
conda activate DeepSeek-V3-0324
pip install torch --index-url https://download.pytorch.org/whl/cu124
pip install -U triton-windows
pip install transformers==4.46.3
pip install safetensors==0.4.5
pip install sentencepiece
```

3. Convert to BF16; this requires an additional approximately 1.3 TB of space. You need to download the conversion script from the `inference` folder of [deepseek-ai/DeepSeek-V3](https://huggingface.co/deepseek-ai/DeepSeek-V3).

```
cd deepseek-ai/DeepSeek-V3/inference
python fp8_cast_bf16.py --input-fp8-hf-path C:/Users/admin/models/deepseek-ai/DeepSeek-V3-0324/ --output-bf16-hf-path C:/Users/admin/models/deepseek-ai/DeepSeek-V3-0324-bf16
```

## BF16 to f16.gguf

1. Use the [llama.cpp](https://github.com/ggerganov/llama.cpp) conversion program to convert DeepSeek-V3-0324-bf16 to GGUF format; this requires an additional approximately 1.3 TB of space.

```
python convert_hf_to_gguf.py C:/Users/admin/models/deepseek-ai/DeepSeek-V3-0324-bf16 --outfile C:/Users/admin/models/deepseek-ai/DeepSeek-V3-0324-bf16/ggml-model-f16.gguf --outtype f16
```

2. Use the [llama.cpp](https://github.com/ggerganov/llama.cpp) quantization program to quantize the model (`llama-quantize` needs to be compiled); see the other [quantization options](https://github.com/ggerganov/llama.cpp/blob/master/examples/quantize/quantize.cpp). Convert to Q2_K first; this requires an additional approximately 227 GB of space.

```
llama-quantize C:/Users/admin/models/deepseek-ai/DeepSeek-V3-0324-bf16/ggml-model-f16.gguf C:/Users/admin/models/deepseek-ai/DeepSeek-V3-0324-bf16/ggml-model-Q2_K.gguf Q2_K
```

3. Use llama-cli to test.

```
llama-cli -m C:/Users/admin/models/deepseek-ai/DeepSeek-V3-0324-bf16/ggml-model-Q2_K.gguf -n 2048
```

## Use with ollama

**Note:** this model requires [Ollama 0.5.5](https://github.com/ollama/ollama/releases/tag/v0.5.5)

### Modelfile

```
FROM deepseek-ai/DeepSeek-V3-0324-bf16/ggml-model-Q2_K.gguf
TEMPLATE """{{- range $i, $_ := .Messages }}
{{- if eq .Role "user" }}<|User|>
{{- else if eq .Role "assistant" }}<|Assistant|>
{{- end }}{{ .Content }}
{{- if eq (len (slice $.Messages $i)) 1 }}
{{- if eq .Role "user" }}<|Assistant|>
{{- end }}
{{- else if eq .Role "assistant" }}<|end▁of▁sentence|><|begin▁of▁sentence|>
{{- end }}
{{- end }}"""
PARAMETER stop <|begin▁of▁sentence|>
PARAMETER stop <|end▁of▁sentence|>
PARAMETER stop <|User|>
PARAMETER stop <|Assistant|>
PARAMETER num_gpu 1
```
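After saving the Modelfile above (for example as `Modelfile`), the model can be registered and run with the standard ollama commands. This is a minimal sketch; the model name `deepseek-v3-0324:q2_k` is an illustrative choice, not a fixed name.

```
# Register the quantized model with ollama using the Modelfile above
ollama create deepseek-v3-0324:q2_k -f Modelfile
# Run an interactive test (the model tag here is an assumed example name)
ollama run deepseek-v3-0324:q2_k "Hello"
```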
### Donation

If you like it, please click "like" and follow us for more updates.
You can follow [x.com/support_huihui](https://x.com/support_huihui) to get the latest model information from huihui.ai.

##### Your donation helps us continue our development and improvement; a cup of coffee can do it.

- bitcoin:
```
bc1qqnkhuchxw0zqjh2ku3lu4hq45hc6gy84uk70ge
```