Example for evaluating the DeepSeek-V3-0324 API performance
#33, opened by wangxingjun778
An example of evaluating the performance of the DeepSeek-V3-0324 API service with EvalScope.
This approach is especially useful for performance evaluation of independently deployed model services, including services that host distilled versions of V3.
Reference:
https://github.com/modelscope/evalscope
https://evalscope.readthedocs.io/en/latest/user_guides/stress_test/quick_start.html
Installation:
pip install "evalscope[perf]" -U
from evalscope.perf.main import run_perf_benchmark

if __name__ == '__main__':
    # DeepSeek API key.
    # Not required if you deploy DeepSeek-V3-0324 on your own server.
    ds_api_key = 'sk-xxx-xxx'

    task_cfg = {
        "url": "https://api.deepseek.com/v1/chat/completions",
        "api_key": ds_api_key,  # not required for a self-hosted deployment
        "parallel": 10,  # number of parallel requests
        "model": "deepseek-chat",
        "number": 50,  # total number of requests
        "api": "openai",
        "dataset": "openqa",
        "stream": True,
        # "wandb_api_key": "xxx-xxx",  # if wandb is required
    }

    run_perf_benchmark(task_cfg)
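For a self-hosted deployment, the same configuration can be built without the API key. The sketch below is only an illustration: `build_task_cfg`, the local URL, and the served model name are hypothetical placeholders, and it uses only the config keys that already appear in the example above.

```python
from typing import Optional


def build_task_cfg(url: str, model: str, api_key: Optional[str] = None,
                   parallel: int = 10, number: int = 50) -> dict:
    """Build a perf-benchmark config dict (hypothetical helper, not part of evalscope)."""
    cfg = {
        "url": url,
        "model": model,
        "parallel": parallel,  # number of parallel requests
        "number": number,      # total number of requests
        "api": "openai",
        "dataset": "openqa",
        "stream": True,
    }
    # Only include the key when one is supplied (e.g. for the hosted API).
    if api_key:
        cfg["api_key"] = api_key
    return cfg


if __name__ == "__main__":
    # Assumed local endpoint and model name; replace with your own deployment.
    local_cfg = build_task_cfg(
        url="http://127.0.0.1:8000/v1/chat/completions",
        model="DeepSeek-V3-0324",
    )
    print(local_cfg)
    # To run the benchmark (needs evalscope[perf] and a reachable server):
    # from evalscope.perf.main import run_perf_benchmark
    # run_perf_benchmark(local_cfg)
```

Keeping the config in a small helper makes it easy to run the same test against the hosted API and a local server and compare the two summaries.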
Token usage: approximately 30k tokens consumed for 50 requests
Benchmarking summary:
+-----------------------------------+-----------------------------------------------------------+
| Key | Value |
+===================================+===========================================================+
| Time taken for tests (s) | 186.815 |
+-----------------------------------+-----------------------------------------------------------+
| Number of concurrency | 10 |
+-----------------------------------+-----------------------------------------------------------+
| Total requests | 50 |
+-----------------------------------+-----------------------------------------------------------+
| Succeed requests | 50 |
+-----------------------------------+-----------------------------------------------------------+
| Failed requests | 0 |
+-----------------------------------+-----------------------------------------------------------+
| Throughput(average tokens/s) | 139.389 |
+-----------------------------------+-----------------------------------------------------------+
| Average QPS | 0.268 |
+-----------------------------------+-----------------------------------------------------------+
| Average latency (s) | 34.556 |
+-----------------------------------+-----------------------------------------------------------+
| Average time to first token (s) | 14.912 |
+-----------------------------------+-----------------------------------------------------------+
| Average time per output token (s) | 0.072 |
+-----------------------------------+-----------------------------------------------------------+
| Average input tokens per request | 23.06 |
+-----------------------------------+-----------------------------------------------------------+
| Average output tokens per request | 520.8 |
+-----------------------------------+-----------------------------------------------------------+
| Average package latency (s) | 0.038 |
+-----------------------------------+-----------------------------------------------------------+
| Average package per request | 519.78 |
+-----------------------------------+-----------------------------------------------------------+
| Expected number of requests | 50 |
+-----------------------------------+-----------------------------------------------------------+
| Result DB path | ./outputs/20250326_123713/deepseek-chat/benchmark_data.db |
+-----------------------------------+-----------------------------------------------------------+
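The headline figures in the summary can be cross-checked against each other. The sketch below assumes that throughput means output tokens divided by wall-clock time and that QPS means requests divided by wall-clock time; the reported numbers are consistent with those definitions, but that the tool computes them exactly this way is an assumption.

```python
# Values copied from the benchmarking summary above.
total_time_s = 186.815
num_requests = 50
avg_input_tokens = 23.06
avg_output_tokens = 520.8
avg_latency_s = 34.556

# Total tokens consumed across all requests (input + output).
total_tokens = num_requests * (avg_input_tokens + avg_output_tokens)

# Assumed definitions: output tokens per second and requests per second.
throughput = num_requests * avg_output_tokens / total_time_s
qps = num_requests / total_time_s

# Little's law: average requests in flight = arrival rate * average latency.
avg_in_flight = qps * avg_latency_s

print(f"total tokens:  {total_tokens:.0f}")
print(f"throughput:    {throughput:.3f} tokens/s")
print(f"QPS:           {qps:.3f}")
print(f"avg in flight: {avg_in_flight:.2f}")
```

The total comes to roughly 27k tokens, in line with the approximately 30k reported. The average number of requests in flight comes out slightly below the configured concurrency of 10, which is expected because fewer requests are active as the run winds down.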
Percentile results:
+------------+----------+----------+-------------+--------------+---------------+----------------------+
| Percentile | TTFT (s) | TPOT (s) | Latency (s) | Input tokens | Output tokens | Throughput(tokens/s) |
+------------+----------+----------+-------------+--------------+---------------+----------------------+
| 10% | 3.4742 | 0.0 | 19.2338 | 14 | 259 | 7.7374 |
| 25% | 3.6814 | 0.0 | 22.4804 | 18 | 454 | 12.4354 |
| 50% | 5.1942 | 0.0563 | 27.4725 | 23 | 525 | 21.0367 |
| 66% | 9.006 | 0.0612 | 30.4141 | 26 | 566 | 22.1485 |
| 75% | 11.5398 | 0.0624 | 43.9004 | 27 | 640 | 23.0458 |
| 80% | 27.6334 | 0.0634 | 52.27 | 31 | 687 | 23.6128 |
| 90% | 50.5659 | 0.067 | 70.3294 | 33 | 710 | 25.0699 |
| 95% | 56.786 | 0.0792 | 76.4323 | 37 | 737 | 25.5895 |
| 98% | 69.6539 | 0.0982 | 92.403 | 39 | 1008 | 26.1793 |
| 99% | 69.6539 | 0.1263 | 92.403 | 39 | 1008 | 26.1793 |
+------------+----------+----------+-------------+--------------+---------------+----------------------+
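Most columns in the 98% and 99% rows are identical, which is likely because, with only 50 requests, both percentiles land on the same sample. The sketch below illustrates the effect with a simple floor-index percentile over synthetic data; the `percentile` helper is hypothetical, and whether evalscope uses this exact convention is an assumption.

```python
from typing import List, Sequence


def percentile(values: Sequence[float], pct: int) -> float:
    """Return the value at an integer percentile using floor indexing."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    # Integer arithmetic avoids floating-point rounding in the index.
    idx = min(len(ordered) - 1, len(ordered) * pct // 100)
    return ordered[idx]


if __name__ == "__main__":
    # Synthetic latencies in seconds, NOT the benchmark's data: 50 samples.
    latencies: List[float] = [float(i) for i in range(1, 51)]
    for pct in (10, 25, 50, 66, 75, 80, 90, 95, 98, 99):
        print(f"{pct}%: {percentile(latencies, pct)}")
```

With 50 samples, both 98% and 99% map to the last element under this convention, so tail percentiles are effectively the maximum. A larger `number` in the config gives more meaningful tail estimates.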