Concurrent load testing of vLLM models with hey
docker run --rm --network=knowledge_network \
  registry.cn-shanghai.aliyuncs.com/zhph-server/hey:latest \
  -n 200 -c 200 -m POST \
  -H "Content-Type: application/json" \
  -H "Authorization: xxx" \
  -d '{
        "model": "codechat",
        "messages": [
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user", "content": "Hello!"}
        ],
        "stream": false,
        "max_tokens": 100,
        "temperature": 0.0
      }' \
  http://vllm-openai:80/v1/chat/completions
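Here -n 200 is the total number of requests and -c 200 the number of concurrent workers, so all 200 chat requests are fired at once against /v1/chat/completions. Before launching a full run, it can be worth sanity-checking the endpoint with a single request; a minimal curl sketch, reusing the same payload and the placeholder Authorization value from above:

curl -s http://vllm-openai:80/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: xxx" \
  -d '{"model": "codechat", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 100}'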
docker run --rm --network=knowledge_network \
  registry.cn-shanghai.aliyuncs.com/zhph-server/hey:latest \
  -n 200 -c 200 -m POST \
  -H "Content-Type: application/json" \
  -H "Authorization: xxx" \
  -d '{
        "model": "codebase",
        "prompt": "# write a python code to print hello world",
        "stream": false,
        "max_tokens": 100,
        "temperature": 0.5
      }' \
  http://vllm-openai:80/v1/completions
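This second command runs the same test against the plain /v1/completions endpoint, this time with the "codebase" model and a code prompt. To see how latency degrades as load grows, the concurrency level can be swept in a loop; a minimal shell sketch, assuming the same image, network, and endpoint:

for c in 10 50 100 200; do
  echo "== concurrency: $c =="
  docker run --rm --network=knowledge_network \
    registry.cn-shanghai.aliyuncs.com/zhph-server/hey:latest \
    -n 200 -c "$c" -m POST \
    -H "Content-Type: application/json" \
    -H "Authorization: xxx" \
    -d '{"model": "codebase", "prompt": "# write a python code to print hello world", "stream": false, "max_tokens": 100, "temperature": 0.5}' \
    http://vllm-openai:80/v1/completions
done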
Results
Summary:
  Total:        2.2220 secs
  Slowest:      1.3603 secs
  Fastest:      0.7641 secs
  Average:      1.0815 secs
  Requests/sec: 43.2034

  Total data:   28992 bytes
  Size/request: 302 bytes

Response time histogram:
  0.764 [1]  |■
  0.824 [5]  |■■■■■■■
  0.883 [4]  |■■■■■■
  0.943 [7]  |■■■■■■■■■■
  1.003 [11] |■■■■■■■■■■■■■■■■
  1.062 [7]  |■■■■■■■■■■
  1.122 [28] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  1.181 [7]  |■■■■■■■■■■
  1.241 [9]  |■■■■■■■■■■■■■
  1.301 [9]  |■■■■■■■■■■■■■
  1.360 [8]  |■■■■■■■■■■■

Latency distribution:
  10% in 0.9175 secs
  25% in 0.9570 secs
  50% in 1.0721 secs
  75% in 1.2131 secs
  90% in 1.2790 secs
  95% in 1.3599 secs
  0% in 0.0000 secs

Details (average, fastest, slowest):
  DNS+dialup: 0.0036 secs, 0.7641 secs, 1.3603 secs
  DNS-lookup: 0.0013 secs, 0.0000 secs, 0.0075 secs
  req write:  0.0003 secs, 0.0000 secs, 0.0051 secs
  resp wait:  1.0774 secs, 0.7640 secs, 1.3533 secs
  resp read:  0.0001 secs, 0.0000 secs, 0.0002 secs

Status code distribution:
  [200] 96 responses
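Two things stand out in this output. First, a rough upper bound on generation throughput: at 43.2034 requests/sec with max_tokens set to 100, the server emitted at most about 43.2 × 100 ≈ 4,320 generated tokens per second (assuming every response used its full 100-token budget; real responses may stop earlier). Second, the status code distribution records only 96 responses against 200 issued requests; hey reports failed requests in a separate "Error distribution" section, so the fate of the remaining requests is worth checking before trusting the latency numbers.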