Error when setting up primary/standby in OB

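【Environment】Test environment
【OB or other component】observer
【Version】4.3.1.0
【Problem description】I created a standby tenant with BACKUP DATABASE PLUS ARCHIVELOG and then set the log restore source with: ALTER SYSTEM SET LOG_RESTORE_SOURCE = 'SERVICE=X.X.X.X:2881;X.X.X.X:2881 USER=rep_user@tenant PASSWORD=123456'. The statement failed with: ERROR 1210 (HY000): Incorrect arguments to get primary connection, all servers are unreachable
【Steps to reproduce】1-1-1 architecture. Because the node in one zone is damaged, only two zones are actually serving. A telnet to port 2882 connects, but after a short wait the session exits on its own; the cause is unclear.
【Attachments and logs】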

[2024-10-12 15:53:40.560920] WDIAG [RPC] send (ob_poc_rpc_proxy.h:173) [10248][T1004_L0_G0][T1004][YB420B7BF3C7-00062431EDB7E770-0-0] [lt=33][errcode=-4002] execute rpc fail(addr="X.X.X.X:2882", pcode=648, ret=-4002, timeout=24999852)
[2024-10-12 15:53:40.560939] WDIAG log_user_error_and_warn (ob_poc_rpc_proxy.cpp:244) [10248][T1004_L0_G0][T1004][YB420B7BF3C7-00062431EDB7E770-0-0] [lt=17][errcode=-4002] Incorrect arguments to get primary connection, all servers are unreachable
[2024-10-12 15:53:40.560948] WDIAG [SQL.ENG] execute (ob_alter_system_executor.cpp:1241) [10248][T1004_L0_G0][T1004][YB420B7BF3C7-00062431EDB7E770-0-0] [lt=4][errcode=-4002] set config rpc failed(ret=-4002, rpc_arg={items:[{name:"log_restore_source", value:"SERVICE=X.X.X.X:2881;X.X.X.X:2881 USER=rep_user@tenant PASSWORD=123456", comment:"", zone:"", server:"0.0.0.0:0", tenant_name:"", exec_tenant_id:1004, tenant_ids:[], want_to_set_tenant_config:false}], is_inner:false})
[2024-10-12 15:53:40.560994] INFO [SHARE] add_event (ob_event_history_table_operator.h:261) [10248][T1004_L0_G0][T1004][YB420B7BF3C7-00062431EDB7E770-0-0] [lt=17] event table add task(ret=0, event_table_name="_all_server_event_history", sql=INSERT INTO all_server_event_history (gmt_create, module, event, name1, value1, name2, value2, name3, value3, name4, value4, value5, value6, svr_ip, svr_port) VALUES (usec_to_time(1728719620560966), 'sql', 'execute_cmd', 'cmd_type', 177, 'sql_text', X'2A2A2A', 'return_code', -4002, 'tenant_id', 1004, '', '', 'X.X.X.X', 2882))
[2024-10-12 15:53:40.561007] WDIAG [SQL] open_cmd (ob_result_set.cpp:102) [10248][T1004_L0_G0][T1004][YB420B7BF3C7-00062431EDB7E770-0-0] [lt=12][errcode=-4002] execute cmd failed(ret=-4002)
[2024-10-12 15:53:40.561013] WDIAG [SQL] open (ob_result_set.cpp:161) [10248][T1004_L0_G0][T1004][YB420B7BF3C7-00062431EDB7E770-0-0] [lt=5][errcode=-4002] execute plan failed(ret=-4002)
[2024-10-12 15:53:40.561019] WDIAG [SERVER] response_result (ob_sync_cmd_driver.cpp:143) [10248][T1004_L0_G0][T1004][YB420B7BF3C7-00062431EDB7E770-0-0] [lt=4][errcode=-4002] close result set fail(cret=-4002)
[2024-10-12 15:53:40.561024] WDIAG [SERVER] after_func (ob_query_retry_ctrl.cpp:986) [10248][T1004_L0_G0][T1004][YB420B7BF3C7-00062431EDB7E770-0-0] [lt=4][errcode=-4002] [RETRY] check if need retry(v={force_local_retry:false, stmt_retry_times:0, local_retry_times:0, err:-4002, err:"OB_INVALID_ARGUMENT", retry_type:0, client_ret:-4002}, need_retry=false)
[2024-10-12 15:53:40.561038] WDIAG [SERVER] response_result (ob_sync_cmd_driver.cpp:149) [10248][T1004_L0_G0][T1004][YB420B7BF3C7-00062431EDB7E770-0-0] [lt=14][errcode=-4002] result set open failed, check if need retry(ret=-4002, cli_ret=-4002, retry_ctrl.need_retry()=0)
[2024-10-12 15:53:40.561052] INFO [SERVER] send_error_packet (obmp_packet_sender.cpp:378) [10248][T1004_L0_G0][T1004][YB420B7BF3C7-00062431EDB7E770-0-0] [lt=6] sending error packet(ob_error=-4002, client error=1210, extra_err_info=NULL, lbt()="0x17a30451 0xb6da170 0xb681d45 0x5ea8f77 0x5c37bcb 0x5c2b6ce 0x18b90701 0x5c24fb0 0x5c1f72a 0xb3c3824 0x17b1814e 0x7f662da081cf 0x7f662d639dd3")
[2024-10-12 15:53:40.561083] WDIAG [SERVER] do_process (obmp_query.cpp:818) [10248][T1004_L0_G0][T1004][YB420B7BF3C7-00062431EDB7E770-0-0] [lt=11][errcode=-4002] execute query fail(ret=-4002, timeout_timestamp=1728719645549188)
[2024-10-12 15:53:40.561118] WDIAG [SERVER.OMT] process_one (ob_worker_processor.cpp:89) [10248][T1004_L0_G0][T1004][YB420B7BF3C7-00062431EDB7E770-0-0] [lt=10][errcode=-4002] process request fail(ret=-4002)
[2024-10-12 15:53:40.561129] WDIAG [SERVER.OMT] process (ob_worker_processor.cpp:157) [10248][T1004_L0_G0][T1004][YB420B7BF3C7-00062431EDB7E770-0-0] [lt=9][errcode=-4002] process request fail(ret=-4002)

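Log in to the business tenant and run the enable-archiving operation.

1) Enable trace output
SET ob_enable_show_trace='ON';

2) Execute the SQL statement.

3) Get the trace ID of the previous command
select last_trace_id();

4) Find the node that handled that trace
select query_sql,svr_ip from gv$ob_sql_audit where trace_id='<trace ID from step 3>';

5) On the svr_ip node from step 4, filter the logs
grep "<trace ID from step 3>" observer.log*
grep "<trace ID from step 3>" rootservice.log*

6) Post the resulting log output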
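
As a complement to the grep in step 5, here is a minimal Python sketch that filters log lines by trace ID and pulls out the level, module, source location, and errcode of each hit. The regular expressions are an assumption based only on the line layout visible in the log excerpt in this thread; the function name `filter_by_trace` and the file name in the usage part are hypothetical.

```python
import re

# Assumed layout, taken from the excerpt in this thread:
# [timestamp] LEVEL [MODULE] func (file:line) [...][trace_id] [lt=N][errcode=N] message
# The [MODULE] part is optional (some lines omit it).
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z]+)\s+"
    r"(?:\[(?P<module>[^\]]+)\]\s+)?"
    r"(?P<func>\S+)\s+\((?P<loc>[^)]+)\)"
)
ERRCODE_RE = re.compile(r"\[errcode=(-?\d+)\]")


def filter_by_trace(lines, trace_id):
    """Return one dict per log line that contains trace_id."""
    hits = []
    for line in lines:
        if trace_id not in line:
            continue
        head = LINE_RE.match(line)
        err = ERRCODE_RE.search(line)
        hits.append({
            "level": head.group("level") if head else None,
            "module": head.group("module") if head else None,
            "location": head.group("loc") if head else None,
            "errcode": int(err.group(1)) if err else None,
            "raw": line.rstrip("\n"),
        })
    return hits


if __name__ == "__main__":
    # Hypothetical usage: the log path and trace ID are placeholders.
    with open("observer.log", encoding="utf-8", errors="replace") as fh:
        for hit in filter_by_trace(fh, "YB420B7BF3C7-00062431EDB7E770-0-0"):
            print(hit["level"], hit["module"], hit["location"], hit["errcode"])
```

Lines that do not match the assumed layout are still returned, with the parsed fields set to None, so nothing is silently dropped.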

The query in step 4 returned nothing.

A fuzzy search does not find the corresponding SQL either:

obclient [oceanbase]> select query_sql from gv$ob_sql_audit where query_sql like "%LOG_RESTORE_SOURCE%";
+------------------------------------------------------------------------------------------+
| query_sql                                                                                |
+------------------------------------------------------------------------------------------+
| select * from DBA_OB_LOG_RESTORE_SOURCE                                                  |
| select * from DBA_OB_LOG_RESTORE_SOURCE                                                  |
| select * from DBA_OB_LOG_RESTORE_SOURCE                                                  |
| select * from DBA_OB_LOG_RESTORE_SOURCE                                                  |
| SELECT * FROM oceanbase.DBA_OB_LOG_RESTORE_SOURCE                                        |
| SELECT * FROM oceanbase.DBA_OB_LOG_RESTORE_SOURCE                                        |
| select query_sql,svr_ip from gv$ob_sql_audit where query_sql like "%LOG_RESTORE_SOURCE%" |
+------------------------------------------------------------------------------------------+

In step 2, did you actually execute your ALTER statement for setting up primary/standby?

It did not execute successfully. It failed with: ERROR 1210 (HY000): Incorrect arguments to get primary connection, all servers are unreachable

all servers are unreachable
This means that some nodes cannot be reached. Check whether a firewall or network problem is the cause.
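
One way to run that connectivity check from the standby side is a short Python script that tries a TCP connection to each primary endpoint. This is only a sketch: the host values are placeholders, the ports 2881 and 2882 are the ones mentioned in this thread, and `check_endpoints` is a hypothetical helper name. A successful TCP connect only shows that the port accepts connections; it does not prove that the OceanBase RPC layer itself is healthy.

```python
import socket


def check_endpoints(endpoints, timeout=3.0):
    """Try a TCP connect to each (host, port).

    Returns a dict mapping (host, port) to None when the connect
    succeeded, or to the error text when it failed.
    """
    results = {}
    for host, port in endpoints:
        try:
            # create_connection resolves the host and connects with a timeout.
            with socket.create_connection((host, port), timeout=timeout):
                results[(host, port)] = None
        except OSError as exc:  # refused, timed out, unresolvable, ...
            results[(host, port)] = str(exc)
    return results


if __name__ == "__main__":
    # Placeholder addresses: replace with the primary tenant's real servers.
    hosts = ["X.X.X.X", "X.X.X.X"]
    targets = [(h, p) for h in hosts for p in (2881, 2882)]
    for (host, port), err in check_endpoints(targets).items():
        print(f"{host}:{port}", "reachable" if err is None else f"FAILED: {err}")
```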

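Check what state zone3 is in, then try either repairing zone3 or removing it completely before retrying. We have not heard of a case where primary/standby was set up after a node in the cluster had failed.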

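The nodes came from SELECT * FROM oceanbase.DBA_OB_ACCESS_POINT, which lists three IPs and ports, for example: A:2881, B:2881, C:2881

A:2881 has already been killed. Does it still fail if I remove A:2881 from the SERVICE parameter? For example, with the following SQL:
ALTER SYSTEM SET LOG_RESTORE_SOURCE = 'SERVICE=B:2881;B:2881 USER=rep_user@tenant PASSWORD=123456'

Or does setting up primary/standby require all three nodes A/B/C of the primary tenant to be healthy?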
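
For testing this, a small Python sketch can assemble the statement from the access-point list while skipping known-dead servers and accidental duplicates. It only mirrors the value format shown in this thread (SERVICE=host:port;host:port USER=... PASSWORD=...); whether OceanBase accepts a list that omits a failed node is exactly the open question here, and `build_log_restore_source` is a hypothetical helper, not an OceanBase API.

```python
def build_log_restore_source(access_points, user, password, exclude=()):
    """Build the ALTER SYSTEM statement in the format used in this thread.

    access_points: iterable of "host:port" strings, e.g. taken from
                   DBA_OB_ACCESS_POINT on the primary.
    exclude:       "host:port" entries to leave out (e.g. a killed server).
    """
    service = []
    for ap in access_points:
        if ap in exclude or ap in service:
            continue  # skip dead servers and duplicate entries
        service.append(ap)
    if not service:
        raise ValueError("no access points left after exclusion")

    value = f"SERVICE={';'.join(service)} USER={user} PASSWORD={password}"
    if "'" in value:
        # Assumption: this sketch does not escape quotes, so reject them.
        raise ValueError("single quotes are not supported by this sketch")
    return f"ALTER SYSTEM SET LOG_RESTORE_SOURCE = '{value}';"


if __name__ == "__main__":
    print(build_log_restore_source(
        ["A:2881", "B:2881", "C:2881"],
        user="rep_user@tenant",
        password="123456",
        exclude={"A:2881"},
    ))
```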

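We have not come across a similar case so far. You can test it yourself, or permanently take the broken node offline first and then try setting up primary/standby again.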