Adding a node times out: add observer timeout

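【Environment】Test environment
【OB】v4.3.1
【OCP】Version: 4.3.1-20240805192406
【Problem description】The OB cluster was created with OCP. When I click to add an OBServer node in OCP, the task reaches the add observer step, fails with a timeout error, and the node cannot be added.
【Steps to reproduce】Add an OBServer in OCP
【Attachments and logs】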

2024-09-11 14:21:41.410 INFO 26467 --- [manual-subtask-executor15,a4cfea886a10a90f,21451a6148f5eef6] c.o.ocp.obsdk.connector.ConnectTemplate : Last Trace Info:[YB420A15081A-000621C87DEFEAEE-0-0]
2024-09-11 14:21:41.411 INFO 26467 --- [manual-subtask-executor15,a4cfea886a10a90f,21451a6148f5eef6] c.o.ocp.obsdk.connector.ConnectTemplate : [obsdk] slow query, durationMillis=599957, sql=alter system add server ? zone ?
2024-09-11 14:21:41.412 INFO 26467 --- [manual-subtask-executor15,a4cfea886a10a90f,21451a6148f5eef6] c.o.ocp.obsdk.connector.ConnectTemplate : [obsdk] sql: set ob_query_timeout = ?, args: [10000000]
2024-09-11 14:21:41.414 ERROR 26467 --- [manual-subtask-executor15,a4cfea886a10a90f,21451a6148f5eef6] c.o.o.c.t.e.c.w.subtask.SubtaskExecutor : Timeout
java.sql.SQLException: Timeout
at com.oceanbase.jdbc.internal.protocol.AbstractQueryProtocol.readErrorPacket(AbstractQueryProtocol.java:2347)
at com.oceanbase.jdbc.internal.protocol.AbstractQueryProtocol.readPacket(AbstractQueryProtocol.java:2210)
at com.oceanbase.jdbc.internal.protocol.AbstractQueryProtocol.getResult(AbstractQueryProtocol.java:2098)
at com.oceanbase.jdbc.internal.protocol.AbstractQueryProtocol.executeQuery(AbstractQueryProtocol.java:399)
at com.oceanbase.jdbc.JDBC4PreparedStatement.executeInternal(JDBC4PreparedStatement.java:247)
at com.oceanbase.jdbc.JDBC4PreparedStatement.execute(JDBC4PreparedStatement.java:170)
at com.oceanbase.jdbc.JDBC4PreparedStatement.executeUpdate(JDBC4PreparedStatement.java:204)
at com.alibaba.druid.pool.DruidPooledPreparedStatement.executeUpdate(DruidPooledPreparedStatement.java:255)
at org.springframework.jdbc.core.JdbcTemplate.lambda$update$2(JdbcTemplate.java:967)
at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:650)
at org.springframework.jdbc.core.JdbcTemplate.update(JdbcTemplate.java:962)
at org.springframework.jdbc.core.JdbcTemplate.update(JdbcTemplate.java:1017)
at org.springframework.jdbc.core.JdbcTemplate.update(JdbcTemplate.java:1027)
at com.oceanbase.ocp.obsdk.connector.ConnectTemplate.updateInner(ConnectTemplate.java:293)
at com.oceanbase.ocp.obsdk.connector.ConnectTemplate.update(ConnectTemplate.java:282)
at com.oceanbase.ocp.obsdk.operator.cluster.MysqlClusterOperator.addServer(MysqlClusterOperator.java:613)
at com.oceanbase.ocp.service.task.business.server.ObServerTaskHandler.addObServer(ObServerTaskHandler.java:196)
at com.oceanbase.ocp.service.task.business.server.AddObServerTask.run(AddObServerTask.java:53)
at com.oceanbase.ocp.core.task.engine.runner.JavaSubtaskRunner.execute(JavaSubtaskRunner.java:64)
at com.oceanbase.ocp.core.task.engine.runner.JavaSubtaskRunner.doRun(JavaSubtaskRunner.java:32)
at com.oceanbase.ocp.core.task.engine.runner.JavaSubtaskRunner.run(JavaSubtaskRunner.java:26)
at com.oceanbase.ocp.core.task.engine.runner.RunnerFactory.doRun(RunnerFactory.java:76)
at com.oceanbase.ocp.core.task.engine.coordinator.worker.subtask.SubtaskExecutor.doRun(SubtaskExecutor.java:206)
at com.oceanbase.ocp.core.task.engine.coordinator.worker.subtask.SubtaskExecutor.redirectConsoleOutput(SubtaskExecutor.java:200)
at com.oceanbase.ocp.core.task.engine.coordinator.worker.subtask.SubtaskExecutor.lambda$submit$2(SubtaskExecutor.java:137)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Set state for subtask: 2008969, operation:EXECUTE, state: FAILED
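The log above mixes units, which makes the timeout easy to misread. durationMillis is in milliseconds, so 599957 means the alter system add server statement ran for just under 600 s (about 10 minutes) before failing with Timeout; ob_query_timeout is an OceanBase system variable expressed in microseconds, so the 10000000 set on the following line is 10 s. A minimal Java sketch for pulling these values out of the log text and putting them in the same unit; ObLogUnits and its methods are hypothetical helpers, not part of OCP or obsdk.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: extracts and converts the numbers seen in the OCP log.
 * durationMillis is in milliseconds; ob_query_timeout is in microseconds.
 */
final class ObLogUnits {
    private static final Pattern DURATION = Pattern.compile("durationMillis=(\\d+)");

    /** Returns the durationMillis value from an "[obsdk] slow query" line, or -1 if absent. */
    static long durationMillis(String logLine) {
        Matcher m = DURATION.matcher(logLine);
        return m.find() ? Long.parseLong(m.group(1)) : -1L;
    }

    /** Converts an ob_query_timeout value (microseconds) to milliseconds. */
    static long queryTimeoutMicrosToMillis(long micros) {
        return micros / 1_000L;
    }

    public static void main(String[] args) {
        String slow = "[obsdk] slow query, durationMillis=599957, sql=alter system add server ? zone ?";
        long ms = durationMillis(slow);
        System.out.printf("add server ran for %.3f s (%.1f min)%n", ms / 1000.0, ms / 60000.0);
        System.out.printf("ob_query_timeout 10000000 us = %d ms%n", queryTimeoutMicrosToMillis(10_000_000L));
    }
}
```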



rootservice.log (6.4 MB)

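This node has been put on the blacklist, so there is probably a communication problem. Check the firewall and look for port conflicts.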
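A port conflict can be ruled out on the node being added. A minimal Java sketch, assuming it is run on that node while its observer process is stopped (for example after rolling back the task); once observer is running, its own ports will of course report as in use. It simply tries to bind the port; 2882 is OceanBase's default RPC port, and the class name PortConflictCheck is hypothetical.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;

/** Hypothetical helper: checks whether a local TCP port can still be bound. */
final class PortConflictCheck {

    /** Returns true if this process can bind the port on all local addresses. */
    static boolean isPortFree(int port) {
        try (ServerSocket socket = new ServerSocket(port)) {
            return true; // bind succeeded; try-with-resources closes the socket again
        } catch (BindException e) {
            return false; // bind refused, usually because another socket already holds the port
        } catch (IOException e) {
            throw new IllegalStateException("could not probe port " + port, e);
        }
    }

    public static void main(String[] args) {
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 2882;
        System.out.println("port " + port + (isPortFree(port) ? " is free" : " is already in use"));
    }
}
```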

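Was it blacklisted by OCP or by the OB cluster? How should I handle it? These nodes are all on the same internal subnet, the firewall is disabled, and there are no network restrictions.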

The OB cluster cannot reach 10.21.8.23:2882. Test it with telnet 10.21.8.23 2882.
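Where telnet is not installed, the same check (can a TCP connection to host:port be opened?) can be sketched in Java. It is most informative when run from each existing OBServer node toward the new node. As a rough guide, a refused connection usually means nothing is listening on the port, while a timeout more often points to packets being dropped on the way. The 3 s timeout and the class name TcpReachability are assumptions for illustration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper: the TCP-level check that "telnet host port" performs. */
final class TcpReachability {

    /** Returns true if a TCP connection to host:port completes within timeoutMillis. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true; // TCP handshake completed
        } catch (IOException e) {
            // refused, timed out, no route to host, unknown host, ...
            return false;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "10.21.8.23";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 2882;
        boolean ok = canConnect(host, port, 3000);
        System.out.println(host + ":" + port + (ok ? " is reachable" : " is NOT reachable"));
    }
}
```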

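I rolled back the task, and port 2882 is no longer open. The nodes are on the same LAN subnet, all ports are open, and there are no restrictions.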

Please download the complete task log and post it.

log_task_2006662.zip (29.0 KB)

subtask_2008969.log (27.1 KB)

I just ran add observer again and reproduced the problem. The environment is on the same internal subnet, and the network is fine.

Could this be a version issue? The current OceanBase build is 4.3.1.0-100000032024051615. Do I need to upgrade the OB cluster to 4.3.2? This problem did not occur on version 4.2.

Normally this version should not have this problem either. We will look into it.

Please run grep -F "[KeepAliveC" observer.log and post the output.
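The search string starts with [, which grep treats as the opening of a bracket expression, so it has to be matched as a fixed string (grep -F) or escaped. For a machine without grep, a minimal Java equivalent of a fixed-string search is sketched below; String.contains needs no escaping. The class name FixedStringGrep is hypothetical, and decoding the file as ISO-8859-1 is an assumption chosen so that odd bytes in the log never abort the read (non-ASCII text in matched lines may print garbled).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: prints log lines that contain a literal string, like grep -F. */
final class FixedStringGrep {

    /** Returns every line of the file that contains needle as a literal substring. */
    static List<String> grepFixed(Path file, String needle) throws IOException {
        // ISO-8859-1 maps every byte to a char, so decoding never fails on malformed input
        try (Stream<String> lines = Files.lines(file, StandardCharsets.ISO_8859_1)) {
            return lines.filter(line -> line.contains(needle)).collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path log = Paths.get(args.length > 0 ? args[0] : "observer.log");
        for (String line : grepFixed(log, "[KeepAliveC")) {
            System.out.println(line);
        }
    }
}
```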

After investigation, this has been confirmed as an OCP bug. It will be fixed in OCP 4.3.2.