Create README.md
Browse files
README.md
ADDED
@@ -0,0 +1,14 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
---
|
2 |
+
base_model:
|
3 |
+
- werty1248/Qwen2.5-32B-s1.1-Ko-Native
|
4 |
+
---
|
5 |
+
|
6 |
+
### vllm
|
7 |
+
|
8 |
+
- For 24GB VRAM
|
9 |
+
- max-model-len: <4096 (marlin_awq) - not available
|
10 |
+
- max-model-len: 10240 (2048 + 8192) (awq)
|
11 |
+
|
12 |
+
```
|
13 |
+
vllm serve werty1248/Qwen2.5-32B-s1.1-Ko-Native-AWQ --max-model-len 10240 --quantization awq --dtype half --port 8000 --gpu-memory-utilization 0.99 --enforce_eager
|
14 |
+
```
|