Added an OBProxy node to an existing OBProxy cluster via OCP, but the OB cluster cannot be accessed through the new node

【Environment】Production
【OB or other component】OBProxy
【Version】OBProxy 4.3.2.0-42 CE
【Problem description】Added an OBProxy node to an existing OBProxy cluster via OCP, but the OB cluster cannot be accessed through the newly added node. Error: ERROR 2013 (HY000): Lost connection to MySQL server at 'reading authorization packet', system error: 0
【Reproduction path】Operations performed around the time the problem appeared
【Attachments and logs】It is recommended to collect diagnostic information with obdiag, OceanBase's agile diagnostics tool; see the documentation for details:
Environment details:
OBServer version: OceanBase_CE 4.3.5.1
OCP: 4.3.3-20241219140415 CE


The cluster cannot be accessed through the newly added OBProxy node, while the existing OBProxy nodes work normally.
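
For anyone reproducing this, a minimal way to isolate which proxy is at fault is to connect through each ODP node explicitly (a sketch: the host placeholders are hypothetical, 2883 is assumed to be the default OBProxy listen port, and sgoceanbase is the cluster name that appears in the logs below):

# Through an existing ODP node -- succeeds
mysql -h <old_odp_ip> -P 2883 -u root@sys#sgoceanbase -p -e 'SELECT 1;'
# Through the new ODP node -- fails with ERROR 2013 during the handshake
mysql -h <new_odp_ip> -P 2883 -u root@sys#sgoceanbase -p -e 'SELECT 1;'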

An excerpt of the WDIAG entries in log/obproxy.log:

WDIAG [PROXY.SM] set_client_abort (ob_mysql_sm.cpp:8691) [64481][Y0-00007FBB16C02DF0] [lt=0] [dc=0] client will abort soon(sm_id=11, cs_id=11, proxy_sessid=0, ss_id=0, server_sessid=0, client_ip={127.0.0.1:40550}, server_ip={*Not IP address [0]*:0}, cluster_name=, tenant_name=, user_name=, db=, event="VC_EVENT_EOS", request_cmd="Sleep", sql_cmd="Handshake", sql=OB_MYSQL_COM_HANDSHAKE)

The relevant logs are attached below (some sensitive information has been redacted):
log_0527.tar.gz (463.2 KB)

Could something have been left unconfigured somewhere?

set_client_abort means the disconnect event came from the upstream component (the client side); it was not caused by ODP or OB itself.

Log in to ODP and run show proxyconfig like '%root%';
Please check whether the observers specified in rootservice_list are up and available.
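
As a complementary check (a sketch, assuming direct access to any observer's 2881 SQL port as the sys tenant), the server status can also be confirmed from the cluster side:

# All three observers should report STATUS = ACTIVE
mysql -h <observer_ip> -P 2881 -u root@sys -p \
  -e 'SELECT svr_ip, svr_port, status FROM oceanbase.DBA_OB_SERVERS;'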

The observers can be connected to normally through the original ODP nodes but not through the new one; the observers are confirmed to be up and available.
The query results are below; both nodes return identical output:

MySQL [oceanbase]> show proxyconfig like '%root%'\G
*************************** 1. row ***************************
         name: rootservice_list
        value: 
         info: a list of servers against which election candidate is checked for validation, format ip1:sql_port1;ip2:sql_port2
  need_reboot: true
visible_level: SYS
        range: 
 config_level: LEVEL_GLOBAL
*************************** 2. row ***************************
         name: rootservice_cluster_name
        value: undefined
         info: default cluster name for rootservice_list
  need_reboot: true
visible_level: SYS
        range: 
 config_level: LEVEL_VIP
2 rows in set (0.00 sec)

Is the network fully open between the newly added ODP node and the cluster, ports included?
Also check OB's connection whitelist:

SHOW PARAMETERS LIKE 'ob_tcp_invited_nodes';

The network and ports of the newly added ODP node should be fine.

MySQL [oceanbase]> SHOW PARAMETERS LIKE 'ob_tcp_invited_nodes';
Empty set (0.11 sec)
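
The empty result here is expected: ob_tcp_invited_nodes is a tenant-level system variable rather than a cluster parameter, so SHOW PARAMETERS does not list it. A sketch of how to inspect it instead (the tenant name is a placeholder):

# Run from inside the tenant whose whitelist is being checked
mysql -h <observer_ip> -P 2881 -u root@<tenant_name> -p \
  -e "SHOW VARIABLES LIKE 'ob_tcp_invited_nodes';"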

Please check whether the proxyro@sys user can log in to these three observers:
47.xx.xx.106:2881
218.xx.xx.149:2881
47.xx.xx.107:2881
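
A quick way to test this from the new ODP host (a sketch; the proxyro password placeholder stands for whatever was configured when the cluster was registered in OCP):

for ob in 47.xx.xx.106 218.xx.xx.149 47.xx.xx.107; do
  mysql -h "$ob" -P 2881 -u proxyro@sys -p'<proxyro_password>' -e 'SELECT 1;' \
    && echo "$ob OK" || echo "$ob FAILED"
done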

[2025-05-27 13:43:06.597189] INFO  [PROXY.CS] check_update_ldc (ob_mysql_client_session.cpp:1844) [64481][Y0-00007FBB16C02DF0] [lt=0] [dc=0] base servers has not added, treat all tenant server as ok(tenant_server={this:0x7fbb16af72f0, is_inited:true, server_count:3, replica_count:3, partition_count:1, next_partition_idx:0, server_array:0x7fbb16aff2a0, server_array_:[[0]{server:"47.xx.xx.106:2881", rpc_server:"47.xx.xx.106:2882", is_dup_replica:false, role:"FOLLOWER", type:"FULL"}, [1]{server:"218.xx.xx.149:2881", rpc_server:"218.xx.xx.149:2882", is_dup_replica:false, role:"FOLLOWER", type:"FULL"}, [2]{server:"47.xx.xx.107:2881", rpc_server:"47.xx.xx.107:2882", is_dup_replica:false, role:"FOLLOWER", type:"FULL"}]}, ret=0)
[2025-05-27 13:43:11.526997] INFO  [PROXY] main_handler (ob_resource_pool_processor.cpp:1261) [64481][Y0-00007FBB16C02DF0] [lt=0] [dc=0] ObClusterResourceCreateCont::main_handler(event="CLUSTER_RESOURCE_CREATE_TIMEOUT_EVENT", init_status=2, cluster_name=sgoceanbase, cluster_id=0, data=0x7fbb1e9baab0)
[2025-05-27 13:43:11.527038] INFO  [PROXY] handle_timeout (ob_resource_pool_processor.cpp:1042) [64481][Y0-00007FBB16C02DF0] [lt=0] [dc=0] handle timeout(created_cr=0x7fbb27db3880, is_rslist_from_local=false)

Thanks for the pointer.
Found that one of them cannot be logged in to: 218.xx.xx.149:2881
The sys tenant can log in through 218.xx.xx.149:2881 normally, but business tenants get ERROR 1227 (42501): Access denied when logging in through that node.

Can the proxyro@sys user log in to the 149 node?

It could not log in before, but it can now. The problem was that the intranet could not reach port 2881 on the 149 node; after opening that port, the new ODP node works normally.
The business tenant's ob_tcp_invited_nodes already contained the 149 node, yet logging in via 149:2881 still returned the Access denied error; only after setting the parameter to % could we log in through that node.
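
For the record, a sketch of the two usual ways to adjust this whitelist (the tenant name is a placeholder; note that '%' opens the tenant to all source IPs, so an explicit IP list is preferable in production):

# From the sys tenant
mysql -h <observer_ip> -P 2881 -u root@sys -p \
  -e "ALTER TENANT <tenant_name> SET VARIABLES ob_tcp_invited_nodes = '%';"
# Or from inside the business tenant itself
mysql -h <observer_ip> -P 2881 -u root@<tenant_name> -p \
  -e "SET GLOBAL ob_tcp_invited_nodes = '%';"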

When the cluster resource was being created, the login timed out, and this kind of error cannot be retried.