OCP: adding a zone and server fails at "Active agent modules" with a failure to connect to OCP-Agent

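[Environment] POC environment
[OB or other components] OCP: 4.3.5-20250319105844, OB: 4.3.5.1
[Version] 4.3.5-20250319105844
[Problem description]
1. Symptom: when using OCP to scale the cluster out from 1-1-1 to 1-1-1-1-1, adding the last zone (zone5) fails, as shown in the screenshot below.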


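2. Attempted fixes
2.1 Restarted the OCP agent from OCP and reinstalled the agent; the problem was not resolved.
2.2 From the OCP server, telnet to the target host and its ports works normally.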
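A note on what the telnet check in 2.2 does and does not prove: a successful telnet only shows that the TCP handshake completes, while the error in the logs below is a read timeout that happens after the connection is already established. The following is a minimal sketch of a scripted version of the same check. The helper name `probe_tcp` is hypothetical, and the ports mentioned in the usage note (62888 for the agent, 2881 for the observer) are taken from the logs in this thread, not from any OCP documentation.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP connection attempt.

    Returns 'open' if the handshake completes, 'refused' if nothing is
    listening on the port, 'timeout' if no answer arrives in time, and
    'error: ...' for any other socket-level failure.
    """
    try:
        # create_connection performs the full TCP handshake, like telnet does.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"
```

For example, calling `probe_tcp(target_host, 62888)` and `probe_tcp(target_host, 2881)` from the OCP server and from the target host itself would show whether the agent port answers while the observer port refuses connections. An "open" result for the agent port is still compatible with the HTTP request timing out later.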


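3. No firewall or related policy is enabled on the target host.
4. (Note) The target host worked normally when it was tested as a member of a 2-2-2 cluster.
5. See the attachment for the detailed error log; IP addresses have been masked.
Key log excerpt: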
2025-07-15 11:45:56.147 ERROR 13042 — [manual-subtask-executor15,9a43896116634b93,844cc57420481178] c.o.o.c.t.e.c.w.subtask.SubtaskExecutor : Read timed out

java.net.SocketTimeoutException: Read timed out

at com.oceanbase.ocp.executor.internal.template.HttpTemplate.lambda$doPost$2(HttpTemplate.java:288)
at com.oceanbase.ocp.executor.internal.template.HttpTemplate.safeExecute(HttpTemplate.java:492)
at com.oceanbase.ocp.executor.internal.template.HttpTemplate.doPost(HttpTemplate.java:288)
at com.oceanbase.ocp.executor.internal.template.HttpTemplate.post(HttpTemplate.java:117)
at com.oceanbase.ocp.executor.executor.AgentExecutor.updateAgentConfig(AgentExecutor.java:196)
at com.oceanbase.ocp.compute.agent.manager.HostAgentConfigManagerImpl.configAgent(HostAgentConfigManagerImpl.java:745)
at com.oceanbase.ocp.compute.agent.manager.HostAgentConfigManagerImpl.configAgent(HostAgentConfigManagerImpl.java:753)
at com.oceanbase.ocp.compute.agent.manager.HostAgentConfigManagerImpl.activeObModule(HostAgentConfigManagerImpl.java:675)
at com.oceanbase.ocp.compute.agent.manager.HostAgentConfigManagerImpl$$FastClassBySpringCGLIB$$798eb4cd.invoke()
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)
at org.springframework.aop.framework.CglibAopProxy.invokeMethod(CglibAopProxy.java:386)

Detailed log:
2025-07-15 11:26:40.697 INFO 13042 — [manual-subtask-executor15,2b588150686c9c43,7e75037a8c8be5db] c.o.o.e.internal.template.HttpTemplate : POST request to agent, url:http://10.xx.xx.x:62888/api/v1/module/config/update, request body:UpdateAgentConfigRequest(agentHome=null, configs=[(monagent.pipeline.ob.status=active), (monagent.pipeline.session.status=active), (monagent.pipeline.plan.monitor.status=active), (monagent.pipeline.plan.monitor.status=active), (monagent.pipeline.transaction.status=active), (monagent.pipeline.sql.audit.status=active), (monagent.pipeline.sql.plan.status=active), (monagent.pipeline.slow.sql.status=active), (ob.logcleaner.enabled=true), (ob.logtailer.enabled=true)]), params:null

2025-07-15 11:27:10.767 ERROR 13042 — [manual-subtask-executor15,2b588150686c9c43,7e75037a8c8be5db] c.o.o.c.t.e.c.w.subtask.SubtaskExecutor : Read timed out

java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:280)
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:157)
at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:72)
at org.glassfish.jersey.apache.connector.ApacheConnector.apply(ApacheConnector.java:471)
at org.glassfish.jersey.client.ClientRuntime.invoke(ClientRuntime.java:297)
at org.glassfish.jersey.client.JerseyInvocation.lambda$invoke$2(JerseyInvocation.java:687)
at org.glassfish.jersey.client.JerseyInvocation.call(JerseyInvocation.java:697)
at org.glassfish.jersey.client.JerseyInvocation.lambda$runInScope$3(JerseyInvocation.java:691)
at org.glassfish.jersey.internal.Errors.process(Errors.java:292)
at org.glassfish.jersey.internal.Errors.process(Errors.java:274)
at org.glassfish.jersey.internal.Errors.process(Errors.java:205)
at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:390)
at org.glassfish.jersey.client.JerseyInvocation.runInScope(JerseyInvocation.java:691)
at org.glassfish.jersey.client.JerseyInvocation.invoke(JerseyInvocation.java:686)
at org.glassfish.jersey.client.JerseyInvocation$Builder.method(JerseyInvocation.java:461)
at org.glassfish.jersey.client.JerseyInvocation$Builder.post(JerseyInvocation.java:357)
at com.oceanbase.ocp.executor.internal.template.HttpTemplate.lambda$doPost$2(HttpTemplate.java:288)
at com.oceanbase.ocp.executor.internal.template.HttpTemplate.safeExecute(HttpTemplate.java:492)
at com.oceanbase.ocp.executor.internal.template.HttpTemplate.doPost(HttpTemplate.java:288)
at com.oceanbase.ocp.executor.internal.template.HttpTemplate.post(HttpTemplate.java:117)
at com.oceanbase.ocp.executor.executor.AgentExecutor.updateAgentConfig(AgentExecutor.java:196)
at com.oceanbase.ocp.compute.agent.manager.HostAgentConfigManagerImpl.configAgent(HostAgentConfigManagerImpl.java:745)
at com.oceanbase.ocp.compute.agent.manager.HostAgentConfigManagerImpl.configAgent(HostAgentConfigManagerImpl.java:753)
at com.oceanbase.ocp.compute.agent.manager.HostAgentConfigManagerImpl.activeObModule(HostAgentConfigManagerImpl.java:675)
at com.oceanbase.ocp.compute.agent.manager.HostAgentConfigManagerImpl$$FastClassBySpringCGLIB$$798eb4cd.invoke()
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)
at org.springframework.aop.framework.CglibAopProxy.invokeMethod(CglibAopProxy.java:386)
at org.springframework.aop.framework.CglibAopProxy.access$000(CglibAopProxy.java:85)
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:703)
at com.oceanbase.ocp.compute.agent.manager.HostAgentConfigManagerImpl$$EnhancerBySpringCGLIB$$51b06abe.activeObModule()
at com.oceanbase.ocp.compute.agent.HostAgentServiceImpl.activeObModule(HostAgentServiceImpl.java:812)
at com.oceanbase.ocp.compute.agent.HostAgentServiceImpl.activeAgentModule(HostAgentServiceImpl.java:429)
at com.oceanbase.ocp.compute.agent.HostAgentServiceImpl$$FastClassBySpringCGLIB$$e7947c56.invoke()
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)
at org.springframework.aop.framework.CglibAopProxy.invokeMethod(CglibAopProxy.java:386)
at org.springframework.aop.framework.CglibAopProxy.access$000(CglibAopProxy.java:85)
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:703)
at com.oceanbase.ocp.compute.agent.HostAgentServiceImpl$$EnhancerBySpringCGLIB$$6c523ef9.activeAgentModule()
at com.oceanbase.ocp.service.task.business.obagent.ActiveAgentModuleTask.run(ActiveAgentModuleTask.java:42)
at com.oceanbase.ocp.core.task.engine.runner.JavaSubtaskRunner.execute(JavaSubtaskRunner.java:64)
at com.oceanbase.ocp.core.task.engine.runner.JavaSubtaskRunner.doRun(JavaSubtaskRunner.java:32)
at com.oceanbase.ocp.core.task.engine.runner.JavaSubtaskRunner.run(JavaSubtaskRunner.java:26)
at com.oceanbase.ocp.core.task.engine.runner.RunnerFactory.doRun(RunnerFactory.java:76)
at com.oceanbase.ocp.core.task.engine.coordinator.worker.subtask.SubtaskExecutor.doRun(SubtaskExecutor.java:207)
at com.oceanbase.ocp.core.task.engine.coordinator.worker.subtask.SubtaskExecutor.redirectConsoleOutput(SubtaskExecutor.java:201)
at com.oceanbase.ocp.core.task.engine.coordinator.worker.subtask.SubtaskExecutor.lambda$submit$2(SubtaskExecutor.java:137)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)

Set state for subtask: 1003797, operation:EXECUTE, state: FAILED
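The two timestamps in the detailed log above are worth comparing: the POST to the agent is logged at 11:26:40.697 and the "Read timed out" error at 11:27:10.767. The small sketch below computes the gap. The helper name `elapsed_seconds` is hypothetical, and reading the result as a client-side read timeout of roughly 30 seconds is an inference from the log, not a documented OCP setting.

```python
from datetime import datetime

# Timestamp layout used by the OCP server log lines quoted in this thread.
FMT = "%Y-%m-%d %H:%M:%S.%f"


def elapsed_seconds(start: str, end: str) -> float:
    """Return the number of seconds between two log timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()


# POST request logged vs. "Read timed out" logged (both copied from the log).
gap = elapsed_seconds("2025-07-15 11:26:40.697", "2025-07-15 11:27:10.767")
print(f"{gap:.3f} s")  # 30.070 s
```

A gap this close to a round 30 seconds suggests the agent accepted the connection but did not send a response in time, rather than the request being rejected outright.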

Please also send the monagent.log and mgragent.log from this node.

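Analysis suggests that OB on this node did not start successfully, so ocp-agent kept failing to connect to it, which in turn caused the connection to ocp-agent to time out. Finding out why OB failed to start would require analyzing observer.log, but the logs were not kept on site. The second attempt to add the zone succeeded and the problem could not be reproduced, so the root cause can no longer be determined.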

2025-07-15T11:51:26.48202+08:00 ERROR [84144,] caller=common/observer.go:183:RefreshTask: refresh basic info failed get observer version: scan observer version: dial tcp xx.xx.xx.xx:2881: connect: connection refused fields:, error="get observer version: scan observer version: dial tcp xx.xx.xx.xx:2881: connect: connection refused"
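The agent log line above is what ties the timeout to the observer: the agent's connection to port 2881 is refused, meaning nothing is listening there. A minimal sketch for scanning an agent log for such lines follows. The function name `summarize_dial_failures` is hypothetical, the pattern is based only on the single line quoted above, and the list of failure reasons is an assumption about what else might appear.

```python
import re
from collections import Counter

# Matches the "dial tcp <host>:<port>: connect: <reason>" fragment seen in the
# agent log line quoted in this thread. Reasons other than "connection refused"
# are assumed here for illustration.
DIAL_FAILURE = re.compile(
    r"dial tcp (?P<host>[^\s:]+):(?P<port>\d+): connect: "
    r"(?P<reason>connection refused|connection timed out|no route to host)"
)


def summarize_dial_failures(lines):
    """Count failed dials per (host, port, reason), one hit per log line."""
    counts = Counter()
    for line in lines:
        # search() takes the first occurrence only; the same error text is
        # repeated in the error= field of the same line.
        match = DIAL_FAILURE.search(line)
        if match:
            counts[(match["host"], int(match["port"]), match["reason"])] += 1
    return counts


sample = [
    "2025-07-15T11:51:26.48202+08:00 ERROR [84144,] "
    "caller=common/observer.go:183:RefreshTask: refresh basic info failed "
    "get observer version: scan observer version: "
    "dial tcp xx.xx.xx.xx:2881: connect: connection refused fields:, "
    'error="get observer version: scan observer version: '
    'dial tcp xx.xx.xx.xx:2881: connect: connection refused"',
]
print(summarize_dial_failures(sample))
```

Running this over the agent logs from the failing node would show how long and how often port 2881 was refusing connections, which helps narrow down when the observer was down.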

Thanks to 旭辉 for locating and analyzing the issue.