Why does a sysbench test show an OceanBase cluster performing worse than a single node instead of better?

【Environment】Testing on borrowed production hardware
【OB or other components】

Test client machine

32 cores, 64 GB RAM

Single-node environment

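System configuration: 32 cores, 64 GB RAM, one 200 GB disk shared by program files, logs and data
Test tenant: 26 cores, 45 GB memory
Version: OB 4.2.1.1
Test method: direct connection to the observer on port 2881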

Cluster environment

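System configuration: 6 machines, each with 48 cores, 384 GB RAM, a 300 GB system disk, a 1.2 TB log disk and a 2 TB data disk
Test tenant: unit spec 36 cores / 200 GB memory × 2 units per zone, 3 zones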


Version: OB 4.2.1.0
Test method: connecting through obproxy on port 2883
Cluster host information:

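【Problem description】Testing with sysbench, the OceanBase cluster shows no clear improvement over the single node, and in the write test it is even slower. Is something wrong with my test method? What optimizations would make the cluster genuinely outperform a single node?
【Steps to reproduce】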

Test parameters

--mysql-db=sbtest 
--table_size=1000000 
--tables=10 
--threads=1000
--report-interval=10 
--time=60
--db-ps-mode=disable
--rand-type=uniform
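With --tables=10 and --table_size=1000000 the whole data set is small for this hardware. The rough estimate below counts only the raw row payload, assuming the standard sysbench sbtest schema (id INT, k INT, c CHAR(120), pad CHAR(60)) and deliberately ignoring indexes, per-row overhead and OceanBase compression, so it is an order-of-magnitude sketch rather than an on-disk or in-memory size.

```python
# Rough raw-size estimate for the sysbench data set used in this test.
# Assumption: the standard sbtest schema (id INT, k INT, c CHAR(120), pad CHAR(60)).
# Indexes, per-row overhead and OceanBase compression are deliberately ignored.

ROW_BYTES = 4 + 4 + 120 + 60  # id + k + c + pad


def raw_bytes(tables: int, table_size: int) -> int:
    """Raw payload bytes across all sbtest tables."""
    return tables * table_size * ROW_BYTES


def to_gib(n: int) -> float:
    """Convert bytes to GiB."""
    return n / 2**30


size = raw_bytes(tables=10, table_size=1_000_000)
print(f"{size} bytes ~= {to_gib(size):.2f} GiB")
```

Even allowing several times this for indexes and overhead, the data fits easily in the 45 GB single-node tenant, let alone the 200 GB cluster units, so both setups likely serve the read test from memory and the benchmark mainly exercises per-request CPU and latency rather than capacity.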

oltp_read_only

OB cluster: QPS 184600

[root@mysqltest ~]# sysbench /usr/share/sysbench/oltp_read_only.lua --mysql-host=xx.xx.xx.33 --mysql-port=2883 --mysql-db=sbtest --mysql-user=root@xx#xxx --mysql-password=xxxxxx --table_size=1000000 --tables=10 --threads=1000 --report-interval=10 --rand-type=uniform --db-ps-mode=disable --time=60 run
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 1000
Report intermediate results every 10 second(s)
Initializing random number generator from current time


Initializing worker threads...

Threads started!

[ 10s ] thds: 1000 tps: 7503.66 qps: 120812.73 (r/w/o: 105705.44/0.00/15107.29) lat (ms,95%): 272.27 err/s: 0.00 reconn/s: 0.00
[ 20s ] thds: 1000 tps: 13884.55 qps: 222171.59 (r/w/o: 194403.69/0.00/27767.90) lat (ms,95%): 147.61 err/s: 0.00 reconn/s: 0.00
[ 30s ] thds: 1000 tps: 14682.57 qps: 234942.99 (r/w/o: 205577.15/0.00/29365.84) lat (ms,95%): 139.85 err/s: 0.00 reconn/s: 0.00
[ 40s ] thds: 1000 tps: 15657.86 qps: 250544.69 (r/w/o: 219229.67/0.00/31315.02) lat (ms,95%): 130.13 err/s: 0.00 reconn/s: 0.00
[ 50s ] thds: 1000 tps: 12014.31 qps: 192250.00 (r/w/o: 168220.17/0.00/24029.82) lat (ms,95%): 176.73 err/s: 0.00 reconn/s: 0.00
[ 60s ] thds: 1000 tps: 6340.40 qps: 101440.74 (r/w/o: 88759.94/0.00/12680.81) lat (ms,95%): 580.02 err/s: 0.00 reconn/s: 0.00
SQL statistics:
    queries performed:
        read:                            9825844
        write:                           0
        other:                           1403692
        total:                           11229536
    transactions:                        701846 (11537.50 per sec.)
    queries:                             11229536 (184600.03 per sec.)
    ignored errors:                      0      (0.00 per sec.)
    reconnects:                          0      (0.00 per sec.)

General statistics:
    total time:                          60.8298s
    total number of events:              701846

Latency (ms):
         min:                                    5.04
         avg:                                   86.33
         max:                                 6209.30
         95th percentile:                      186.54
         sum:                             60588514.38

Threads fairness:
    events (avg/stddev):           701.8460/22.09
    execution time (avg/stddev):   60.5885/0.10
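The cluster's 184600 QPS is an average over a run whose 10-second QPS climbs from about 121k to 251k and then falls to about 101k, while the single-node read_only run below stays between roughly 170k and 191k. A small parser for the interval lines makes that spread measurable. It assumes the sysbench 1.0.x --report-interval line format shown in this post; the sample keeps only the leading fields of the six cluster interval lines above.

```python
import re
from statistics import mean, pstdev

# Matches the start of sysbench 1.0.x interval lines such as
# "[ 10s ] thds: 1000 tps: 7503.66 qps: 120812.73 (r/w/o: ...) ..."
INTERVAL = re.compile(r"\[\s*(\d+)s \] thds: \d+ tps: ([\d.]+) qps: ([\d.]+)")


def interval_qps(log: str) -> list:
    """Extract per-interval QPS values from sysbench --report-interval output."""
    return [float(m.group(3)) for m in INTERVAL.finditer(log)]


def summarize(values: list) -> dict:
    """Min, max, mean and coefficient of variation (population stddev / mean)."""
    avg = mean(values)
    return {"min": min(values), "max": max(values), "mean": avg, "cv": pstdev(values) / avg}


# Leading fields of the cluster oltp_read_only interval lines above.
cluster_log = """
[ 10s ] thds: 1000 tps: 7503.66 qps: 120812.73
[ 20s ] thds: 1000 tps: 13884.55 qps: 222171.59
[ 30s ] thds: 1000 tps: 14682.57 qps: 234942.99
[ 40s ] thds: 1000 tps: 15657.86 qps: 250544.69
[ 50s ] thds: 1000 tps: 12014.31 qps: 192250.00
[ 60s ] thds: 1000 tps: 6340.40 qps: 101440.74
"""

print(summarize(interval_qps(cluster_log)))
```

Applied to the single-node read_only intervals, the coefficient of variation comes out roughly 0.04 against about 0.3 for the cluster. One possible reading is that a 60-second cluster run never reached a steady state, which is part of why a longer test duration is worth trying.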

Single-node OB: QPS 183153, slightly lower than the cluster, but the cluster shows no clear advantage

[root@mysqltest ~]# sysbench /usr/share/sysbench/oltp_read_only.lua --mysql-host=xx.xx.xx.53 --mysql-port=2881 --mysql-db=sbtest --mysql-user=root#xx --mysql-password=xxxxxx --table_size=1000000 --tables=10 --threads=1000 --report-interval=10 --rand-type=uniform --db-ps-mode=disable --time=60 run
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 1000
Report intermediate results every 10 second(s)
Initializing random number generator from current time


Initializing worker threads...

Threads started!

[ 10s ] thds: 1000 tps: 10558.54 qps: 169706.74 (r/w/o: 148495.90/0.00/21210.84) lat (ms,95%): 147.61 err/s: 0.00 reconn/s: 0.00
[ 20s ] thds: 1000 tps: 11399.25 qps: 182388.97 (r/w/o: 159589.97/0.00/22799.00) lat (ms,95%): 127.81 err/s: 0.00 reconn/s: 0.00
[ 30s ] thds: 1000 tps: 11683.47 qps: 186937.30 (r/w/o: 163571.06/0.00/23366.24) lat (ms,95%): 125.52 err/s: 0.00 reconn/s: 0.00
[ 40s ] thds: 1000 tps: 11931.91 qps: 190942.36 (r/w/o: 167076.34/0.00/23866.02) lat (ms,95%): 123.28 err/s: 0.00 reconn/s: 0.00
[ 50s ] thds: 1000 tps: 11546.53 qps: 184664.58 (r/w/o: 161574.02/0.00/23090.56) lat (ms,95%): 132.49 err/s: 0.00 reconn/s: 0.00
[ 60s ] thds: 1000 tps: 11639.36 qps: 186288.23 (r/w/o: 163010.00/0.00/23278.23) lat (ms,95%): 125.52 err/s: 0.00 reconn/s: 0.00
SQL statistics:
    queries performed:
        read:                            9640568
        write:                           0
        other:                           1377224
        total:                           11017792
    transactions:                        688612 (11447.11 per sec.)
    queries:                             11017792 (183153.79 per sec.)
    ignored errors:                      0      (0.00 per sec.)
    reconnects:                          0      (0.00 per sec.)

General statistics:
    total time:                          60.1540s
    total number of events:              688612

Latency (ms):
         min:                                    4.21
         avg:                                   87.17
         max:                                  600.04
         95th percentile:                      130.13
         sum:                             60027854.99

Threads fairness:
    events (avg/stddev):           688.6120/32.15
    execution time (avg/stddev):   60.0279/0.02
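Since the cluster tenant is allocated far more CPU than the single-node tenant, roughly equal read_only QPS means much lower throughput per allocated core. The calculation below assumes "36 cores × 2, 3 zones" means two 36-core units in each of three zones. It also shows the case where the tenant's leaders are concentrated in one zone, since strong-consistency reads are served by leader replicas; spreading leaders is what setting PRIMARY_ZONE to RANDOM in reply 4 below aims at.

```python
# Resource-normalized view of the two oltp_read_only runs above.
# Assumption: "36 cores x 2, 3 zones" means 2 units per zone in each of 3 zones.


def tenant_cores(unit_cpu: int, units_per_zone: int, zones: int) -> int:
    """Total CPU cores allocated to a tenant."""
    return unit_cpu * units_per_zone * zones


single_cores = 26                       # single-node test tenant
cluster_cores = tenant_cores(36, 2, 3)  # all units in all zones
leader_cores = tenant_cores(36, 2, 1)   # if leaders sit in a single zone

single_qps = 183153.79   # single-node read_only, overall QPS
cluster_qps = 184600.03  # cluster read_only, overall QPS

print(f"single node: {single_qps / single_cores:.0f} QPS per tenant core")
print(f"cluster, all zones: {cluster_qps / cluster_cores:.0f} QPS per tenant core")
print(f"cluster, leader zone only: {cluster_qps / leader_cores:.0f} QPS per core")
```

Either way the cluster is doing far less work per core than the single node, which points at the client side, the proxy path or leader placement rather than a lack of server resources.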

oltp_write_only

OB cluster: QPS 69691

[root@mysqltest ~]# sysbench /usr/share/sysbench/oltp_write_only.lua --mysql-host=xx.xx.xx.33 --mysql-port=2883 --mysql-db=sbtest --mysql-user=root@xx#xxx --mysql-password=xxxxxx --table_size=1000000 --tables=10 --threads=1000 --report-interval=10 --rand-type=uniform --db-ps-mode=disable --time=60 run
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 1000
Report intermediate results every 10 second(s)
Initializing random number generator from current time


Initializing worker threads...

Threads started!

[ 10s ] thds: 1000 tps: 12705.83 qps: 76679.75 (r/w/o: 0.00/51170.43/25509.33) lat (ms,95%): 183.21 err/s: 0.00 reconn/s: 0.00
[ 20s ] thds: 1000 tps: 13627.66 qps: 81745.49 (r/w/o: 0.00/54487.86/27257.63) lat (ms,95%): 196.89 err/s: 0.00 reconn/s: 0.00
[ 30s ] thds: 1000 tps: 13834.52 qps: 82863.83 (r/w/o: 0.00/55194.78/27669.04) lat (ms,95%): 150.29 err/s: 0.00 reconn/s: 0.00
[ 40s ] thds: 1000 tps: 9523.14 qps: 57307.34 (r/w/o: 0.00/38261.46/19045.88) lat (ms,95%): 308.84 err/s: 0.00 reconn/s: 0.00
[ 50s ] thds: 1000 tps: 11836.36 qps: 70969.18 (r/w/o: 0.00/47296.96/23672.23) lat (ms,95%): 240.02 err/s: 0.00 reconn/s: 0.00
[ 60s ] thds: 1000 tps: 8228.67 qps: 49351.15 (r/w/o: 0.00/32892.90/16458.25) lat (ms,95%): 350.33 err/s: 0.00 reconn/s: 0.00
SQL statistics:
    queries performed:
        read:                            0
        write:                           2794352
        other:                           1397176
        total:                           4191528
    transactions:                        698588 (11615.25 per sec.)
    queries:                             4191528 (69691.48 per sec.)
    ignored errors:                      0      (0.00 per sec.)
    reconnects:                          0      (0.00 per sec.)

General statistics:
    total time:                          60.1420s
    total number of events:              698588

Latency (ms):
         min:                                    6.89
         avg:                                   85.93
         max:                                 1252.89
         95th percentile:                      231.53
         sum:                             60030402.01

Threads fairness:
    events (avg/stddev):           698.5880/7.15
    execution time (avg/stddev):   60.0304/0.02

Single-node OB: QPS 136988, far higher than the cluster

[root@mysqltest ~]# sysbench /usr/share/sysbench/oltp_write_only.lua --mysql-host=xx.xx.xx.53 --mysql-port=2881 --mysql-db=sbtest --mysql-user=root#xx --mysql-password=xxxxxx --table_size=1000000 --tables=10 --threads=1000 --report-interval=10 --rand-type=uniform --db-ps-mode=disable --time=60 run
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 1000
Report intermediate results every 10 second(s)
Initializing random number generator from current time


Initializing worker threads...

Threads started!

[ 10s ] thds: 1000 tps: 23155.01 qps: 139333.94 (r/w/o: 0.00/92923.95/46409.99) lat (ms,95%): 68.05 err/s: 0.00 reconn/s: 0.00
[ 20s ] thds: 1000 tps: 24285.00 qps: 145696.10 (r/w/o: 0.00/97126.30/48569.80) lat (ms,95%): 62.19 err/s: 0.00 reconn/s: 0.00
[ 30s ] thds: 1000 tps: 23514.38 qps: 140995.56 (r/w/o: 0.00/93970.70/47024.85) lat (ms,95%): 74.46 err/s: 0.00 reconn/s: 0.00
[ 40s ] thds: 1000 tps: 23734.71 qps: 142608.86 (r/w/o: 0.00/95135.34/47473.52) lat (ms,95%): 74.46 err/s: 0.00 reconn/s: 0.00
[ 50s ] thds: 1000 tps: 24459.11 qps: 146437.86 (r/w/o: 0.00/97545.04/48892.82) lat (ms,95%): 69.29 err/s: 0.00 reconn/s: 0.00
[ 60s ] thds: 1000 tps: 18072.70 qps: 108522.31 (r/w/o: 0.00/72370.51/36151.80) lat (ms,95%): 92.42 err/s: 0.00 reconn/s: 0.00
SQL statistics:
    queries performed:
        read:                            0
        write:                           5493040
        other:                           2746520
        total:                           8239560
    transactions:                        1373260 (22831.37 per sec.)
    queries:                             8239560 (136988.22 per sec.)
    ignored errors:                      0      (0.00 per sec.)
    reconnects:                          0      (0.00 per sec.)

General statistics:
    total time:                          60.1460s
    total number of events:              1373260

Latency (ms):
         min:                                    8.94
         avg:                                   43.70
         max:                                  293.05
         95th percentile:                       74.46
         sum:                             60017181.24

Threads fairness:
    events (avg/stddev):           1373.2600/21.98
    execution time (avg/stddev):   60.0172/0.01
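Both write_only runs used a fixed 1000 sysbench threads, and each thread issues its next transaction only after the previous one finishes. In such a closed loop, Little's law gives throughput ≈ threads / average latency, and the check below shows this matches the reported TPS within about 0.2%. The cluster's halved TPS therefore follows directly from its roughly doubled per-transaction latency (85.93 ms vs 43.70 ms) rather than from the cluster running out of capacity. These numbers alone do not say where the extra latency comes from (the obproxy hop, multi-replica log synchronization across zones, network quality).

```python
# Closed-loop check (Little's law): with a fixed number of sysbench threads,
# each waiting for its transaction before sending the next one,
# throughput is approximately threads / average latency.


def expected_tps(threads: int, avg_latency_ms: float) -> float:
    """Throughput implied by a closed loop of `threads` clients at the given mean latency."""
    return threads / (avg_latency_ms / 1000.0)


# (threads, average latency in ms, TPS reported by sysbench) from the logs above
runs = {
    "cluster write_only": (1000, 85.93, 11615.25),
    "single write_only": (1000, 43.70, 22831.37),
}

for name, (threads, lat_ms, reported) in runs.items():
    est = expected_tps(threads, lat_ms)
    print(f"{name}: threads/latency = {est:.0f} tps, sysbench reported {reported:.0f} tps")
```

Under this model, raising cluster throughput needs either lower per-transaction latency or more concurrent clients spread across entry points, not just more server cores.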

【Symptoms and impact】

【Attachments】

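  1. Both the number of tables and the rows per table can be larger, to do justice to such a lavish hardware configuration.
  2. The test can also run longer.
  3. Start three sysbench sessions, each connected to a different obproxy.
  4. Set the PRIMARY_ZONE of the three-zone tenant to RANDOM.
  5. Try both --skip_trx=on and --skip_trx=off.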
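Suggestions 2 and 3 can be combined into one run plan: several sysbench processes started at the same time, each pointed at a different obproxy, running longer, with their overall QPS summed afterwards. The sketch below only builds the command lines and sums the results. The proxy addresses are hypothetical placeholders, splitting a total of 1000 threads evenly is an assumption (each session could instead keep its own thread count), and the --mysql-db, --mysql-user, --mysql-password, --tables and --table_size options used above would be passed through extra_args.

```python
import re
import shlex

# Hypothetical obproxy addresses; replace with the real ones.
PROXIES = ["10.0.0.31", "10.0.0.32", "10.0.0.33"]


def sysbench_commands(proxies, total_threads, extra_args):
    """Build one sysbench command per obproxy, splitting threads as evenly as possible."""
    base, rest = divmod(total_threads, len(proxies))
    commands = []
    for i, host in enumerate(proxies):
        threads = base + (1 if i < rest else 0)  # first `rest` sessions get one extra thread
        args = [
            "sysbench", "/usr/share/sysbench/oltp_read_only.lua",
            f"--mysql-host={host}", "--mysql-port=2883",
            f"--threads={threads}", *extra_args, "run",
        ]
        commands.append(shlex.join(args))
    return commands


# Matches the summary line "queries:   11229536 (184600.03 per sec.)".
QPS_LINE = re.compile(r"queries:\s+\d+\s+\(([\d.]+) per sec\.\)")


def total_qps(outputs):
    """Sum the overall QPS reported by several concurrent sysbench runs."""
    return sum(float(QPS_LINE.search(out).group(1)) for out in outputs)


for cmd in sysbench_commands(PROXIES, 1000, ["--time=600", "--report-interval=10"]):
    print(cmd)
```

Starting the printed commands together (for example from separate shells, or from more than one client machine if the 32-core client becomes the bottleneck) and passing their outputs to total_qps gives the aggregate cluster throughput to compare with the single node.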

Performance tuning reference: "OceanBase Sysbench High-Performance Deployment and Problem Analysis" in the OceanBase documentation center.


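It may be worth checking communication between the cluster nodes: if the network between them is poor, the cluster can end up slower than a single node.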
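One rough first look at network quality is to time TCP handshakes from each node to the observer SQL port (2881) on the other nodes and compare same-zone with cross-zone numbers. TCP connect time only approximates round-trip latency and says nothing about bandwidth or packet loss, so this is a quick sanity check under that assumption, not a substitute for proper network diagnostics; the host list in the usage comment is hypothetical.

```python
import socket
import statistics
import time


def tcp_connect_ms(host: str, port: int, samples: int = 5, timeout: float = 2.0) -> dict:
    """Time `samples` TCP handshakes to host:port; a rough proxy for round-trip time."""
    times = []
    for _ in range(samples):
        start = time.perf_counter()
        # create_connection returns once the TCP handshake completes (or raises on failure)
        with socket.create_connection((host, port), timeout=timeout):
            pass
        times.append((time.perf_counter() - start) * 1000.0)
    return {"min": min(times), "median": statistics.median(times), "max": max(times)}


# Usage (hypothetical observer addresses):
# for host in ["10.0.0.1", "10.0.0.2", "10.0.0.3"]:
#     print(host, tcp_connect_ms(host, 2881))
```

Consistently higher or more erratic times toward one zone would be a reason to look at that link before tuning the database itself.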

Before running sysbench, you can also use the inspection tool's sysbench_free mode to check whether the system or cluster configuration has any problems.