Update README.md

```python
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

### ZhiLight

You can easily start a service using [ZhiLight](https://github.com/zhihu/ZhiLight):

```bash
docker run -it --net=host --gpus='"device=0"' -v /path/to/model:/mnt/models --entrypoint="" ghcr.io/zhihu/zhilight/zhilight:0.4.17-cu124 python -m zhilight.server.openai.entrypoints.api_server --model-path /mnt/models --port 8000 --enable-reasoning --reasoning-parser deepseek-r1 --served-model-name Zhi-writing-dsr1-14b

# the prompt asks for an article introducing West Lake vinegar fish (西湖醋鱼) in Lu Xun's voice
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Zhi-writing-dsr1-14b",
        "prompt": "请你以鲁迅的口吻,写一篇介绍西湖醋鱼的文章",
        "max_tokens": 4096,
        "temperature": 0.6,
        "top_p": 0.95
    }'
```
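
Since the server is OpenAI-compatible, a chat-style request should also work. The following is a sketch, not a command from this README: the `/v1/chat/completions` route is the standard OpenAI-compatible endpoint, and the exact shape of ZhiLight's reasoning output may vary by version.

```bash
# chat-style request against the same server (standard OpenAI-compatible route)
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Zhi-writing-dsr1-14b",
        "messages": [{"role": "user", "content": "请你以鲁迅的口吻,写一篇介绍西湖醋鱼的文章"}],
        "max_tokens": 4096,
        "temperature": 0.6,
        "top_p": 0.95
    }'
```

With `--enable-reasoning --reasoning-parser deepseek-r1`, the server should separate the model's reasoning trace from the final answer in its response, mirroring DeepSeek-R1-style output.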

### vLLM

Alternatively, you can easily start a service using [vLLM](https://github.com/vllm-project/vllm):

```bash
# install vllm (quotes keep the shell from treating >= as a redirect)
pip install "vllm>=0.6.4.post1"
```
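
As a minimal sketch of serving and querying the model, assuming the weights live at `/path/to/Zhi-writing-dsr1-14b` and reusing the request from the ZhiLight example (the serve flags here are assumptions, not the project's exact command):

```bash
# start vLLM's OpenAI-compatible server; the model path is a placeholder
vllm serve /path/to/Zhi-writing-dsr1-14b \
    --served-model-name Zhi-writing-dsr1-14b \
    --port 8000

# query the completions endpoint once the server is up
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Zhi-writing-dsr1-14b",
        "prompt": "请你以鲁迅的口吻,写一篇介绍西湖醋鱼的文章",
        "max_tokens": 4096,
        "temperature": 0.6,
        "top_p": 0.95
    }'
```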

### SGLang

You can also easily start a service using [SGLang](https://github.com/sgl-project/sglang):

```bash
# install SGLang
pip install "sglang[all]>=0.4.5" --find-links https://flashinfer.ai/whl/cu124/torch2.5/flashinfer-python
```
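
As with vLLM, here is a minimal sketch of launching and querying the server (the launch flags and model path are assumptions, not the README's exact command):

```bash
# launch SGLang's OpenAI-compatible server; the model path is a placeholder
python -m sglang.launch_server \
    --model-path /path/to/Zhi-writing-dsr1-14b \
    --port 8000

curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Zhi-writing-dsr1-14b",
        "prompt": "请你以鲁迅的口吻,写一篇介绍西湖醋鱼的文章",
        "max_tokens": 4096,
        "temperature": 0.6,
        "top_p": 0.95
    }'
```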

### Ollama

You can download Ollama using [this link](https://ollama.com/download/).

* quantization: Q4_K_M

```bash
ollama run zhihu/zhi-writing-dsr1-14b
```

* bf16

```bash
ollama run zhihu/zhi-writing-dsr1-14b:bf16
```
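
Once a model is running, you can also call Ollama's local REST API. This is a generic Ollama sketch rather than a command from this README; the default port 11434 and the `/api/generate` endpoint are standard Ollama behavior, not specific to this model:

```bash
# non-streaming generation request against the local Ollama server
curl http://localhost:11434/api/generate -d '{
    "model": "zhihu/zhi-writing-dsr1-14b",
    "prompt": "请你以鲁迅的口吻,写一篇介绍西湖醋鱼的文章",
    "stream": false
}'
```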