Update README.md

```python
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

### ZhiLight

You can easily start a service using [ZhiLight](https://github.com/zhihu/ZhiLight):

```bash
docker run -it --net=host --gpus='"device=0"' -v /path/to/model:/mnt/models --entrypoint="" ghcr.io/zhihu/zhilight/zhilight:0.4.17-cu124 python -m zhilight.server.openai.entrypoints.api_server --model-path /mnt/models --port 8000 --enable-reasoning --reasoning-parser deepseek-r1 --served-model-name Zhi-writing-dsr1-14b

# the prompt asks for an article introducing West Lake vinegar fish (西湖醋鱼) in Lu Xun's voice
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Zhi-writing-dsr1-14b",
        "prompt": "请你以鲁迅的口吻,写一篇介绍西湖醋鱼的文章",
        "max_tokens": 4096,
        "temperature": 0.6,
        "top_p": 0.95
    }'
```
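
Since the server is OpenAI-compatible, a chat-style request should also work. The following is a sketch, not a command from this README: the `/v1/chat/completions` route is the standard OpenAI-compatible endpoint, and the exact shape of ZhiLight's reasoning output may vary by version.

```bash
# chat-style request against the same server (standard OpenAI-compatible route)
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Zhi-writing-dsr1-14b",
        "messages": [{"role": "user", "content": "请你以鲁迅的口吻,写一篇介绍西湖醋鱼的文章"}],
        "max_tokens": 4096,
        "temperature": 0.6,
        "top_p": 0.95
    }'
```

With `--enable-reasoning --reasoning-parser deepseek-r1`, the server should separate the model's reasoning trace from the final answer in its response, mirroring DeepSeek-R1-style output.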

### vLLM

Alternatively, you can easily start a service using [vLLM](https://github.com/vllm-project/vllm):

```bash
# install vllm (quotes keep the shell from treating >= as a redirect)
pip install "vllm>=0.6.4.post1"
```
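
As a minimal sketch of serving and querying the model, assuming the weights live at `/path/to/Zhi-writing-dsr1-14b` and reusing the request from the ZhiLight example (the serve flags here are assumptions, not the project's exact command):

```bash
# start vLLM's OpenAI-compatible server; the model path is a placeholder
vllm serve /path/to/Zhi-writing-dsr1-14b \
    --served-model-name Zhi-writing-dsr1-14b \
    --port 8000

# query the completions endpoint once the server is up
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Zhi-writing-dsr1-14b",
        "prompt": "请你以鲁迅的口吻,写一篇介绍西湖醋鱼的文章",
        "max_tokens": 4096,
        "temperature": 0.6,
        "top_p": 0.95
    }'
```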

### SGLang

You can also easily start a service using [SGLang](https://github.com/sgl-project/sglang):

```bash
# install SGLang
pip install "sglang[all]>=0.4.5" --find-links https://flashinfer.ai/whl/cu124/torch2.5/flashinfer-python
```
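
As with vLLM, here is a minimal sketch of launching and querying the server (the launch flags and model path are assumptions, not the README's exact command):

```bash
# launch SGLang's OpenAI-compatible server; the model path is a placeholder
python -m sglang.launch_server \
    --model-path /path/to/Zhi-writing-dsr1-14b \
    --port 8000

curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Zhi-writing-dsr1-14b",
        "prompt": "请你以鲁迅的口吻,写一篇介绍西湖醋鱼的文章",
        "max_tokens": 4096,
        "temperature": 0.6,
        "top_p": 0.95
    }'
```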

### Ollama

You can download Ollama using [this link](https://ollama.com/download/).

* quantization: Q4_K_M

```bash
ollama run zhihu/zhi-writing-dsr1-14b
```

* bf16

```bash
ollama run zhihu/zhi-writing-dsr1-14b:bf16
```
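
Once a model is running, you can also call Ollama's local REST API. This is a generic Ollama sketch rather than a command from this README; the default port 11434 and the `/api/generate` endpoint are standard Ollama behavior, not specific to this model:

```bash
# non-streaming generation request against the local Ollama server
curl http://localhost:11434/api/generate -d '{
    "model": "zhihu/zhi-writing-dsr1-14b",
    "prompt": "请你以鲁迅的口吻,写一篇介绍西湖醋鱼的文章",
    "stream": false
}'
```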