werty1248 committed on
Commit
49f41a2
·
verified ·
1 Parent(s): 26c6fef

Create README.md

Files changed (1)
  1. README.md +14 -0
README.md ADDED
@@ -0,0 +1,14 @@
+ ---
+ base_model:
+ - werty1248/Qwen2.5-32B-s1.1-Ko-Native
+ ---
+
+ ### vLLM
+
+ - For 24GB VRAM
+   - max-model-len: <4096 (awq_marlin) - not available
+   - max-model-len: 10240 (2048 + 8192) (awq)
+
+ ```
+ vllm serve werty1248/Qwen2.5-32B-s1.1-Ko-Native-AWQ --max-model-len 10240 --quantization awq --dtype half --port 8000 --gpu-memory-utilization 0.99 --enforce-eager
+ ```
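
The `--max-model-len 10240` setting above is given as 2048 + 8192. The README does not say what the two terms stand for. A minimal sketch, assuming they mean 2048 prompt tokens plus 8192 generated tokens, shows how a client might cap its completion length so that prompt and completion together stay within the context window. The constant and function names here are hypothetical and not part of vLLM.

```python
# Assumption: 10240 = 2048 prompt tokens + 8192 generated tokens.
# The README only states "2048 + 8192"; the split is our reading of it.
MAX_MODEL_LEN = 10240      # value passed to --max-model-len
PROMPT_BUDGET = 2048       # assumed prompt share of the window
GENERATION_BUDGET = 8192   # assumed completion share of the window


def completion_budget(
    prompt_tokens: int,
    max_model_len: int = MAX_MODEL_LEN,
    requested: int = GENERATION_BUDGET,
) -> int:
    """Return the largest completion length that fits next to the prompt."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    if prompt_tokens >= max_model_len:
        raise ValueError("prompt alone fills the context window")
    # Never ask for more than the remaining room in the window.
    return min(requested, max_model_len - prompt_tokens)


if __name__ == "__main__":
    for n in (100, 2048, 3000):
        print(n, completion_budget(n))
```

The returned value could then be sent as `max_tokens` in a request to the OpenAI-compatible endpoint that `vllm serve` exposes on port 8000.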