OB high-availability verification: question about OBProxy high availability

OB high-availability verification: OBProxy failover anomaly

Environment

  • observer version:
obclient -u root -p -h 127.0.0.1 -P 2883
Enter password:
Welcome to the OceanBase.  Commands end with ; or \g.
Your OceanBase connection id is 1998
Server version: 5.6.25 OceanBase 3.1.4 (r10000092022071511-b4bfa011ceaef428782dcb65ae89190c40b78c2f) (Built Jul 15 2022 11:45:12)

Copyright (c) 2000, 2022, OceanBase and/or its affiliates. All rights reserved.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

obclient [(none)]> select version();
+--------------------+
| version()          |
+--------------------+
| 3.1.4-OceanBase CE |
+--------------------+
1 row in set (0.004 sec)
  • obproxy version:
/root/obproxy/bin/obproxy --version
obproxy (OceanBase 3.2.3 2.el8)
REVISION: 6-local-99faebfc7130b70ad0f56330a28cab6a32ec9a33
BUILD_TIME: Mar 30 2022 01:53:20
  • Cluster architecture:
    1-1-1 (three nodes, one per zone) + OBProxy

Test steps

  1. Run a sustained sysbench load
time sysbench --test=./oltp_read_only.lua --mysql-host=127.0.0.1 --mysql-port=2883 --mysql-db=sysbenchdb --mysql-user="ob_cluster:tenant_2:u_sysbench" --mysql-password=123456 --tables=16 --table_size=5000000 --threads=32 --time=3000 --report-interval=1 --db-driver=mysql --db-ps-mode=disable --skip-trx=on --mysql-ignore-errors=6002,6004,4012,2013,4016,1062 run

Traffic was observed on all three nodes.

  2. Kill the running observer process on one node

ps -ef|grep observer
root      783095       1 99 10:21 ?        00:07:28 /mydata/observer1/bin/observer -r 10.140.114.12:2882:2881;10.140.60.14:2882:2881;10.140.118.7:2882:2881 -o __min_full_resource_pool_memory=268435456,memory_limit=14G,system_memory=4G,stack_size=512K,cpu_count=16,cache_wash_threshold=1G,workers_per_cpu_quota=10,schema_history_expire_time=1d,net_thread_count=4,major_freeze_duty_time=Disable,minor_freeze_times=10,enable_separate_sys_clog=0,enable_merge_by_turn=False,datafile_disk_percentage=20,enable_syslog_recycle=True,max_syslog_file_count=4 -z zone2 -p 2881 -P 2882 -n ob_cluster -c 1 -d /mydata/observer1/store -i eth0 -l DEBUG
root      783936  779377  0 10:25 pts/0    00:00:00 grep --color=auto observer
[root@dhy02 ~]# kill -9 783095

  3. Observe the results

  • All leaders switched to the other two nodes
obclient [oceanbase]> SELECT tenant.tenant_name,
       meta.table_id,
       tab.table_name,
       partition_id,
       ZONE,
       concat(svr_ip, ':', svr_port) observer,
       CASE
           WHEN ROLE=1 THEN 'leader'
           WHEN ROLE=2 THEN 'follower'
           ELSE NULL
       END AS ROLE,
       tab.primary_zone
FROM __all_virtual_meta_table meta
INNER JOIN __all_tenant tenant ON meta.tenant_id=tenant.tenant_id
INNER JOIN __all_virtual_table tab ON meta.tenant_id=tab.tenant_id AND meta.table_id=tab.table_id
WHERE tenant.tenant_id='1001' and role = 1
ORDER BY tenant.tenant_name,
         TABLE_NAME,
         partition_id,
         ZONE;
+-------------+------------------+------------+--------------+-------+--------------------+--------+--------------+
| tenant_name | table_id         | table_name | partition_id | ZONE  | observer           | ROLE   | primary_zone |
+-------------+------------------+------------+--------------+-------+--------------------+--------+--------------+
| tenant_2    | 1100611139453890 | sbtest1    |            0 | zone3 | 10.140.118.7:2882  | leader |              |
| tenant_2    | 1100611139453900 | sbtest10   |            0 | zone3 | 10.140.118.7:2882  | leader |              |
| tenant_2    | 1100611139453915 | sbtest11   |            0 | zone1 | 10.140.114.12:2882 | leader |              |
| tenant_2    | 1100611139453904 | sbtest12   |            0 | zone3 | 10.140.118.7:2882  | leader |              |
| tenant_2    | 1100611139453917 | sbtest13   |            0 | zone3 | 10.140.118.7:2882  | leader |              |
| tenant_2    | 1100611139453906 | sbtest14   |            0 | zone1 | 10.140.114.12:2882 | leader |              |
| tenant_2    | 1100611139453919 | sbtest15   |            0 | zone1 | 10.140.114.12:2882 | leader |              |
| tenant_2    | 1100611139453910 | sbtest16   |            0 | zone3 | 10.140.118.7:2882  | leader |              |
| tenant_2    | 1100611139453889 | sbtest2    |            0 | zone3 | 10.140.118.7:2882  | leader |              |
| tenant_2    | 1100611139453896 | sbtest3    |            0 | zone3 | 10.140.118.7:2882  | leader |              |
| tenant_2    | 1100611139453892 | sbtest4    |            0 | zone3 | 10.140.118.7:2882  | leader |              |
| tenant_2    | 1100611139453902 | sbtest5    |            0 | zone3 | 10.140.118.7:2882  | leader |              |
| tenant_2    | 1100611139453894 | sbtest6    |            0 | zone3 | 10.140.118.7:2882  | leader |              |
| tenant_2    | 1100611139453908 | sbtest7    |            0 | zone1 | 10.140.114.12:2882 | leader |              |
| tenant_2    | 1100611139453898 | sbtest8    |            0 | zone3 | 10.140.118.7:2882  | leader |              |
| tenant_2    | 1100611139453913 | sbtest9    |            0 | zone1 | 10.140.114.12:2882 | leader |              |
+-------------+------------------+------------+--------------+-------+--------------------+--------+--------------+
16 rows in set (0.060 sec)
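
The leader placement in the query result can be tallied per observer to confirm that the killed node no longer holds any leader. Below is a minimal Python sketch, assuming the obclient table output is available as pasted text. The function name `leaders_per_observer` is illustrative, the three sample rows are copied from the output above, and the killed node's address `10.140.60.14:2882` is an inference from the `-r` list in the `ps` output (the post itself only says the killed process was in zone2).

```python
from collections import Counter


def leaders_per_observer(table_text):
    """Count rows whose ROLE cell is 'leader', grouped by the observer cell."""
    counts = Counter()
    for line in table_text.splitlines():
        line = line.strip()
        # Header and data rows start and end with '|'; border rows start with '+'.
        if not (line.startswith("|") and line.endswith("|")):
            continue
        cells = [cell.strip() for cell in line[1:-1].split("|")]
        # Expected columns: tenant_name, table_id, table_name, partition_id,
        # ZONE, observer, ROLE, primary_zone.
        if len(cells) != 8 or cells[0] == "tenant_name":
            continue
        if cells[6] == "leader":
            counts[cells[5]] += 1
    return counts


# Three rows copied from the query output above.
SAMPLE = """\
| tenant_2    | 1100611139453890 | sbtest1    |            0 | zone3 | 10.140.118.7:2882  | leader |              |
| tenant_2    | 1100611139453900 | sbtest10   |            0 | zone3 | 10.140.118.7:2882  | leader |              |
| tenant_2    | 1100611139453915 | sbtest11   |            0 | zone1 | 10.140.114.12:2882 | leader |              |
"""

# Assumption: address of the killed zone2 observer, inferred from the -r list.
KILLED = "10.140.60.14:2882"

counts = leaders_per_observer(SAMPLE)
print(dict(counts))
print("leaders left on killed node:", counts[KILLED])
```

Running the same function over the full 16-row output gives the complete distribution across the two surviving nodes.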

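Judging from the observed behavior, OBProxy did not pull the latest location information after the backend OB node failed over, so session connections were not kept uninterrupted.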
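
Because the sysbench command runs with `--report-interval=1`, the interruption can be quantified from its per-second report lines instead of being judged by eye. The sketch below is a rough illustration, assuming the sysbench 1.0 interval line layout (`[ Ns ] thds: … tps: … err/s: … reconn/s: …`). The sample lines are made up for the example and are not taken from this test run, and `parse_intervals` / `disrupted_seconds` are hypothetical helper names.

```python
import re

# Assumed sysbench 1.0 interval report layout; adjust if your version differs.
INTERVAL = re.compile(
    r"\[\s*(?P<sec>\d+)s\s*\]\s+thds:\s*(?P<thds>\d+)\s+tps:\s*(?P<tps>[\d.]+)"
    r".*?err/s:\s*(?P<err>[\d.]+)\s+reconn/s:\s*(?P<reconn>[\d.]+)"
)


def parse_intervals(lines):
    """Extract second, tps, err/s and reconn/s from sysbench interval lines."""
    rows = []
    for line in lines:
        match = INTERVAL.search(line)
        if match:
            rows.append({
                "sec": int(match.group("sec")),
                "tps": float(match.group("tps")),
                "err": float(match.group("err")),
                "reconn": float(match.group("reconn")),
            })
    return rows


def disrupted_seconds(rows):
    """Seconds with zero throughput, reported errors, or reconnects."""
    return [r["sec"] for r in rows
            if r["tps"] == 0 or r["err"] > 0 or r["reconn"] > 0]


# Made-up lines in the assumed format, for illustration only.
SAMPLE_OUTPUT = """\
[ 1s ] thds: 32 tps: 1500.00 qps: 1500.00 (r/w/o: 1500.00/0.00/0.00) lat (ms,95%): 25.00 err/s: 0.00 reconn/s: 0.00
[ 2s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 32.00
[ 3s ] thds: 32 tps: 1450.00 qps: 1450.00 (r/w/o: 1450.00/0.00/0.00) lat (ms,95%): 27.00 err/s: 0.00 reconn/s: 0.00
"""

rows = parse_intervals(SAMPLE_OUTPUT.splitlines())
print(disrupted_seconds(rows))
```

Feeding the saved sysbench output through this gives the list of seconds in which sessions were dropped or reconnected, which is easier to compare across OBProxy versions than a verbal description.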


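You can test with the latest obproxy version (3.2.3.5), which fixes an issue where OBProxy still considered the connection established after the server had gone down.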

https://github.com/oceanbase/obproxy/releases

I switched to the latest version, and the same problem still occurs on retest:

/root/obproxy/bin/obproxy --version
obproxy (OceanBase 3.2.3.5 2.el8)
REVISION: 1-local-4cc2f2e1f696a76e0b5831f6e88e76e0a6831255
BUILD_TIME: Sep  5 2022 19:48:50

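Please provide the following:
1. The obproxy.log, as an attachment, covering the period after the observer was killed while running obproxy 3.2.3.5.
2. The observer.log from the healthy nodes for the 2 minutes following the kill.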
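
To cut a large log down to the requested 2-minute window, something like the following could be used. This is only a sketch: it assumes each log entry starts with a bracketed timestamp of the form `[YYYY-MM-DD HH:MM:SS.ffffff]`, the sample lines are placeholders rather than real observer output, the date is hypothetical, and `lines_in_window` is an illustrative name.

```python
import re
from datetime import datetime, timedelta

# Assumed timestamp prefix at the start of each log entry.
TIMESTAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})(?:\.\d+)?\]")


def lines_in_window(lines, start, minutes=2):
    """Yield lines whose timestamp falls in [start, start + minutes).

    Lines without a timestamp (continuations) follow the decision made
    for the most recent timestamped line.
    """
    end = start + timedelta(minutes=minutes)
    keep = False
    for line in lines:
        match = TIMESTAMP.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            keep = start <= stamp < end
        if keep:
            yield line


# Placeholder entries; the date and message text are hypothetical.
SAMPLE_LOG = [
    "[2022-09-06 17:55:59.000001] placeholder: before the kill",
    "[2022-09-06 17:56:03.000002] placeholder: inside the window",
    "    continuation line without a timestamp",
    "[2022-09-06 17:58:00.000003] placeholder: after the window",
]

kill_time = datetime(2022, 9, 6, 17, 56, 0)  # hypothetical kill time
kept = list(lines_in_window(SAMPLE_LOG, kill_time))
for line in kept:
    print(line)
```

For a real file, pass an open file object instead of the list and write the yielded lines to a smaller file before attaching it.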

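The files are fairly large, so I uploaded them to a cloud drive. Look from 17:56 yesterday onward; those are the logs after the operation.
Link: https://pan.baidu.com/s/19QakFkkPeelM2tmEBOTGAw Extraction code: ty3e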

In the same scenario, we do not seem to have reproduced this behavior on the Enterprise Edition.

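That said, we were not using sysbench; it was real business traffic. We tested killing an observer in a cluster of 5 zones with 6 nodes per zone. Throughput dropped by about 60% and recovered after ten-odd seconds.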
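
Figures such as "dropped by about 60%, recovered after ten-odd seconds" can be derived from a per-second throughput series so that different runs are comparable. A minimal sketch, assuming the series and the index of the kill second are known; `dip_summary`, the 90% recovery threshold, and the sample numbers are all illustrative choices, not measurements from either cluster.

```python
def dip_summary(samples, kill_index, recovered_ratio=0.9):
    """Return (drop_pct, recovery_seconds) for a per-second throughput series.

    samples          per-second throughput values
    kill_index       index of the second in which the observer was killed
    recovered_ratio  fraction of the pre-kill baseline that counts as recovered
    recovery_seconds is None if the series never recovers.
    """
    if kill_index <= 0 or kill_index >= len(samples):
        raise ValueError("kill_index must leave samples before and after the kill")
    baseline = sum(samples[:kill_index]) / kill_index
    after = samples[kill_index:]
    lowest = min(after)
    drop_pct = (baseline - lowest) / baseline * 100
    recovery_seconds = None
    # Search from the lowest point onward for the first recovered sample.
    for offset in range(after.index(lowest), len(after)):
        if after[offset] >= recovered_ratio * baseline:
            recovery_seconds = offset
            break
    return drop_pct, recovery_seconds


# Illustrative series only: steady at 100, dips to 40 for three seconds.
example = [100, 100, 100, 100, 40, 40, 40, 95, 100]
drop_pct, recovery_seconds = dip_summary(example, kill_index=4)
print(f"drop: {drop_pct:.1f}%, recovered after {recovery_seconds}s")
```

The baseline here is simply the mean of all samples before the kill; a shorter trailing window may be more appropriate when the load is not steady.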

What is the hardware configuration of your machines?

64 cores, 512 GB memory