An example for evaluating the DeepSeek-V3-0324 API service

This stress-testing feature is especially useful for benchmarking the performance of independently deployed model services, including services that host distilled versions of V3.

Reference:

evalscope: https://github.com/modelscope/evalscope

https://evalscope.readthedocs.io/en/latest/user_guides/stress_test/quick_start.html

Installation:

pip install 'evalscope[perf]' -U

from evalscope.perf.main import run_perf_benchmark

if __name__ == '__main__':

    # DeepSeek API key
    # If you deploy DeepSeek-V3-0324 on your own server, you can ignore this key
    ds_api_key = 'sk-xxx-xxx'

    task_cfg = {"url": "https://api.deepseek.com/v1/chat/completions",
                "api_key": ds_api_key,  # not required if you deploy DeepSeek-V3-0324 on your own server
                "parallel": 10,         # number of parallel requests
                "model": "deepseek-chat",
                "number": 50,  # number of requests
                "api": "openai",
                "dataset": "openqa",
                "stream": True,
                # "wandb_api_key": "xxx-xxx",   # If wandb is required
                }

    run_perf_benchmark(task_cfg)
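
For a self-hosted deployment, the same configuration can be built without the API key. The following is a minimal sketch, not part of evalscope: the helper name build_perf_config, the local URL, and the served model name are all hypothetical placeholders, and it assumes your server exposes an OpenAI-compatible chat completions endpoint. The config keys themselves are the ones used in the example above.

```python
from typing import Optional


def build_perf_config(url: str, model: str, api_key: Optional[str] = None,
                      parallel: int = 10, number: int = 50) -> dict:
    """Build a perf config dict with the same keys as the example above.

    The "api_key" entry is only included when a key is actually supplied,
    which matches the note that it is not needed for a self-hosted server.
    """
    cfg = {
        "url": url,
        "parallel": parallel,   # number of parallel requests
        "model": model,
        "number": number,       # total number of requests
        "api": "openai",
        "dataset": "openqa",
        "stream": True,
    }
    if api_key:
        cfg["api_key"] = api_key
    return cfg


# Hypothetical local endpoint and model name; replace with your own values.
local_cfg = build_perf_config(
    url="http://127.0.0.1:8000/v1/chat/completions",
    model="my-local-deepseek-v3",
)
print(local_cfg)

# To run the benchmark (requires evalscope[perf] and a reachable server):
#   from evalscope.perf.main import run_perf_benchmark
#   run_perf_benchmark(local_cfg)
```

Keeping the key optional means the same helper can be used for both the hosted DeepSeek API and a local server.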

Token usage: approximately 30k tokens consumed for 50 requests

Benchmarking summary:
+-----------------------------------+-----------------------------------------------------------+
| Key                               | Value                                                     |
+===================================+===========================================================+
| Time taken for tests (s)          | 186.815                                                   |
+-----------------------------------+-----------------------------------------------------------+
| Number of concurrency             | 10                                                        |
+-----------------------------------+-----------------------------------------------------------+
| Total requests                    | 50                                                        |
+-----------------------------------+-----------------------------------------------------------+
| Succeed requests                  | 50                                                        |
+-----------------------------------+-----------------------------------------------------------+
| Failed requests                   | 0                                                         |
+-----------------------------------+-----------------------------------------------------------+
| Throughput(average tokens/s)      | 139.389                                                   |
+-----------------------------------+-----------------------------------------------------------+
| Average QPS                       | 0.268                                                     |
+-----------------------------------+-----------------------------------------------------------+
| Average latency (s)               | 34.556                                                    |
+-----------------------------------+-----------------------------------------------------------+
| Average time to first token (s)   | 14.912                                                    |
+-----------------------------------+-----------------------------------------------------------+
| Average time per output token (s) | 0.072                                                     |
+-----------------------------------+-----------------------------------------------------------+
| Average input tokens per request  | 23.06                                                     |
+-----------------------------------+-----------------------------------------------------------+
| Average output tokens per request | 520.8                                                     |
+-----------------------------------+-----------------------------------------------------------+
| Average package latency (s)       | 0.038                                                     |
+-----------------------------------+-----------------------------------------------------------+
| Average package per request       | 519.78                                                    |
+-----------------------------------+-----------------------------------------------------------+
| Expected number of requests       | 50                                                        |
+-----------------------------------+-----------------------------------------------------------+
| Result DB path                    | ./outputs/20250326_123713/deepseek-chat/benchmark_data.db |
+-----------------------------------+-----------------------------------------------------------+
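
As a sanity check, the headline numbers in the summary can be re-derived from each other. The sketch below only does arithmetic on values copied from the table. It assumes that average QPS is total requests divided by wall-clock time and that throughput counts output tokens only, which is what the numbers appear to be consistent with; it is not taken from the evalscope source.

```python
# Values copied from the benchmarking summary above.
time_taken_s = 186.815
total_requests = 50
avg_input_tokens = 23.06
avg_output_tokens = 520.8

# Requests per second over the whole run.
qps = total_requests / time_taken_s

# Output tokens per second over the whole run (assumed definition).
throughput = total_requests * avg_output_tokens / time_taken_s

# Total tokens billed for the run: input plus output.
total_tokens = total_requests * (avg_input_tokens + avg_output_tokens)

print(f"QPS: {qps:.3f}")
print(f"Throughput (tokens/s): {throughput:.3f}")
print(f"Total tokens: {total_tokens:.0f}")
```

The recomputed QPS and throughput round to the reported 0.268 and 139.389, and the total of roughly 27k tokens is in line with the "approximately 30k" token usage noted above.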

Percentile results:
+------------+----------+----------+-------------+--------------+---------------+----------------------+
| Percentile | TTFT (s) | TPOT (s) | Latency (s) | Input tokens | Output tokens | Throughput(tokens/s) |
+------------+----------+----------+-------------+--------------+---------------+----------------------+
|    10%     |  3.4742  |   0.0    |   19.2338   |      14      |      259      |        7.7374        |
|    25%     |  3.6814  |   0.0    |   22.4804   |      18      |      454      |       12.4354        |
|    50%     |  5.1942  |  0.0563  |   27.4725   |      23      |      525      |       21.0367        |
|    66%     |  9.006   |  0.0612  |   30.4141   |      26      |      566      |       22.1485        |
|    75%     | 11.5398  |  0.0624  |   43.9004   |      27      |      640      |       23.0458        |
|    80%     | 27.6334  |  0.0634  |    52.27    |      31      |      687      |       23.6128        |
|    90%     | 50.5659  |  0.067   |   70.3294   |      33      |      710      |       25.0699        |
|    95%     |  56.786  |  0.0792  |   76.4323   |      37      |      737      |       25.5895        |
|    98%     | 69.6539  |  0.0982  |   92.403    |      39      |     1008      |       26.1793        |
|    99%     | 69.6539  |  0.1263  |   92.403    |      39      |     1008      |       26.1793        |
+------------+----------+----------+-------------+--------------+---------------+----------------------+
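
The percentile table also shows that time to first token is heavily skewed: most requests start streaming within a few seconds, while a small tail waits much longer. A rough way to quantify this, using only values copied from the two tables (the choice of ratios is just one illustrative option):

```python
# TTFT percentiles (seconds) copied from the percentile table above.
ttft = {"p50": 5.1942, "p90": 50.5659, "p99": 69.6539}

# "Average time to first token (s)" from the benchmarking summary.
mean_ttft = 14.912

# How much slower the slowest requests are than the typical one.
tail_ratio = ttft["p99"] / ttft["p50"]

# A mean far above the median indicates a long right tail.
mean_over_median = mean_ttft / ttft["p50"]

print(f"p99 / p50 TTFT: {tail_ratio:.2f}")
print(f"mean / median TTFT: {mean_over_median:.2f}")
```

With the p99 more than ten times the median, the average TTFT of 14.912 s says little about the typical request, so the median and the tail percentiles are worth reporting alongside it.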
