Latest community-edition OCP cannot take over a cluster deployed with the latest OBD: adding a host fails and OCP Agent cannot be installed

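【Environment】Production
【OB or other component】
【Version】OCP: ocp-all-in-one-4.2.1-20231208144448
OB: latest
【Problem description】Adding a host fails: the task gets stuck at the Install ocp agent step, so the OBD cluster cannot be taken over.
【Steps to reproduce】Operations performed before and after the problem appeared
【Attachments and logs】We recommend collecting diagnostic information with obdiag, the OceanBase agile diagnostic tool. See the following link for details: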

【SOP Series 22】: Troubleshooting Step One (System Inspection and Diagnostic Information Collection)

Prepare host 1021362, target:

100.65.2.71

Type: Host  Failed  5/8  admin  2023-12-25 09:33:32  2023-12-25 09:33:45

2023-12-25 09:33:44.517 INFO 6067 --- [pool-manual-subtask-executor15,d786ee2c54ac4ba6,9dc81543d79a] c.o.o.c.t.e.runner.JavaSubtaskRunner : Run subtask, id=1021430, context=Context{parallelIdx=-1, stringMap={host_ip=100.65.2.71, ssh_port=22, task_instance_id=1021362, task_operation=execute, host_credential_id=17, latest_execution_start_time=2023-12-25T09:33:44.477+08:00, sub_task_instance_name=Install ocp agent, sub_task_instance_id=1021430, host_id=1000004}, listMap={}}, executor=100.65.2.70

2023-12-25 09:33:44.520 INFO 6067 --- [pool-manual-subtask-executor15,d786ee2c54ac4ba6,9dc81543d79a] c.o.o.s.t.b.host.InstallOcpAgentTask : [InstallOcpAgentTask] started

2023-12-25 09:33:44.524 INFO 6067 --- [pool-manual-subtask-executor15,d786ee2c54ac4ba6,9dc81543d79a] c.o.o.c.agent.HostAgentServiceImpl : Finding OCP agent: hostId=1000004

2023-12-25 09:33:44.533 WARN 6067 --- [pool-manual-subtask-executor15,d786ee2c54ac4ba6,9dc81543d79a] c.o.o.c.agent.HostAgentServiceImpl : OCP agent not found: hostId=1000004

2023-12-25 09:33:44.537 INFO 6067 --- [pool-manual-subtask-executor15,d786ee2c54ac4ba6,9dc81543d79a] c.o.o.c.agent.HostAgentServiceImpl : Finding OCP agent: hostId=1000004

2023-12-25 09:33:44.542 WARN 6067 --- [pool-manual-subtask-executor15,d786ee2c54ac4ba6,9dc81543d79a] c.o.o.c.agent.HostAgentServiceImpl : OCP agent not found: hostId=1000004

2023-12-25 09:33:44.546 INFO 6067 --- [pool-manual-subtask-executor15,d786ee2c54ac4ba6,9dc81543d79a] c.o.o.service.task.util.AgentTaskUtils : Create ssh executor from task context

2023-12-25 09:33:44.550 INFO 6067 --- [pool-manual-subtask-executor15,d786ee2c54ac4ba6,9dc81543d79a] c.o.o.service.task.util.AgentTaskUtils : CredentialId in context, use specific credential

2023-12-25 09:33:44.565 INFO 6067 --- [pool-manual-subtask-executor15,d786ee2c54ac4ba6,9dc81543d79a] com.oceanbase.ocp.common.ssh.SshUtils : SSH executeCommand begin: echo 1 on 100.65.2.71

2023-12-25 09:33:44.591 INFO 6067 --- [pool-manual-subtask-executor15,d786ee2c54ac4ba6,9dc81543d79a] com.oceanbase.ocp.common.ssh.SshUtils : SSH executeCommand end: echo 1 on 100.65.2.71, result: SshResult(host=100.65.2.71, username=admin, command=echo 1, out=1, err=, extOut=null, exitStatus=0)

2023-12-25 09:33:44.641 INFO 6067 --- [pool-manual-subtask-executor15,d786ee2c54ac4ba6,9dc81543d79a] com.oceanbase.ocp.common.ssh.SshUtils : SSH executeCommand begin: echo 1 on 100.65.2.71

2023-12-25 09:33:44.671 INFO 6067 --- [pool-manual-subtask-executor15,d786ee2c54ac4ba6,9dc81543d79a] com.oceanbase.ocp.common.ssh.SshUtils : SSH executeCommand end: echo 1 on 100.65.2.71, result: SshResult(host=100.65.2.71, username=admin, command=echo 1, out=1, err=, extOut=null, exitStatus=0)

2023-12-25 09:33:44.712 INFO 6067 --- [pool-manual-subtask-executor15,d786ee2c54ac4ba6,9dc81543d79a] com.oceanbase.ocp.common.ssh.SshUtils : SSH executeCommand begin: sudo bash -c 'netstat -ant| awk '"'"'$4~":62888$" { print $4"\t"$6 }'"'"'' on 100.65.2.71

2023-12-25 09:33:44.755 INFO 6067 --- [pool-manual-subtask-executor15,d786ee2c54ac4ba6,9dc81543d79a] com.oceanbase.ocp.common.ssh.SshUtils : SSH executeCommand end: sudo bash -c 'netstat -ant| awk '"'"'$4~":62888$" { print $4"\t"$6 }'"'"'' on 100.65.2.71, result: SshResult(host=100.65.2.71, username=admin, command=sudo bash -c 'netstat -ant| awk '"'"'$4~":62888$" { print $4"\t"$6 }'"'"'', out=bash: netstat: command not found, err=, extOut=null, exitStatus=0)

2023-12-25 09:33:44.769 INFO 6067 --- [pool-manual-subtask-executor15,d786ee2c54ac4ba6,9dc81543d79a] c.o.o.e.internal.template.SshTemplate : SSH execute end: sudo bash -c 'netstat -ant| awk '"'"'$4~":62888$" { print $4"\t"$6 }'"'"'' on 100.65.2.71,result:SshResult(host=100.65.2.71, username=admin, command=sudo bash -c 'netstat -ant| awk '"'"'$4~":62888$" { print $4"\t"$6 }'"'"'', out=bash: netstat: command not found, err=, extOut=null, exitStatus=0)

2023-12-25 09:33:44.773 ERROR 6067 --- [pool-manual-subtask-executor15,d786ee2c54ac4ba6,9dc81543d79a] c.o.ocp.core.util.ExceptionUtils : Checked Exception: com.oceanbase.ocp.core.exception.UnexpectedException occurred with code error.common.unexpected, and args [port is occupied, 62888]

2023-12-25 09:33:44.777 ERROR 6067 --- [pool-manual-subtask-executor15,d786ee2c54ac4ba6,9dc81543d79a] c.o.o.c.t.e.c.w.subtask.SubtaskExecutor : An unknown error has occurred. Cause: port is occupied. Error message: 62,888. Contact the administrator.

com.oceanbase.ocp.core.exception.UnexpectedException: [OCP UnexpectedException]: status=500 INTERNAL_SERVER_ERROR, errorCode=COMMON_UNEXPECTED, args=port is occupied,62888
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at com.oceanbase.ocp.core.util.ExceptionUtils.newException(ExceptionUtils.java:96)
at com.oceanbase.ocp.core.util.ExceptionUtils.throwException(ExceptionUtils.java:90)
at com.oceanbase.ocp.core.util.ExceptionUtils.unExpected(ExceptionUtils.java:77)
at com.oceanbase.ocp.service.task.business.host.InstallOcpAgentTask.checkAgentPortsAvailable(InstallOcpAgentTask.java:107)
at com.oceanbase.ocp.service.task.business.host.InstallOcpAgentTask.run(InstallOcpAgentTask.java:61)
at com.oceanbase.ocp.core.task.engine.runner.JavaSubtaskRunner.execute(JavaSubtaskRunner.java:59)
at com.oceanbase.ocp.core.task.engine.runner.JavaSubtaskRunner.doRun(JavaSubtaskRunner.java:31)
at com.oceanbase.ocp.core.task.engine.runner.JavaSubtaskRunner.run(JavaSubtaskRunner.java:25)
at com.oceanbase.ocp.core.task.engine.runner.RunnerFactory.doRun(RunnerFactory.java:76)
at com.oceanbase.ocp.core.task.engine.coordinator.worker.subtask.SubtaskExecutor.doRun(SubtaskExecutor.java:193)
at com.oceanbase.ocp.core.task.engine.coordinator.worker.subtask.SubtaskExecutor.redirectConsoleOutput(SubtaskExecutor.java:187)
at com.oceanbase.ocp.core.task.engine.coordinator.worker.subtask.SubtaskExecutor.lambda$submit$2(SubtaskExecutor.java:124)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)

Set state for subtask: 1021430, operation:EXECUTE, state: FAILED
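The log shows the sequence: OCP runs a netstat/awk pipeline over SSH to see whether port 62888 is in use; the output is "bash: netstat: command not found" with exitStatus=0 (in bash without pipefail, a pipeline's exit status is that of its last command, here awk), and the very next line raises "port is occupied, 62888". This suggests the check treats any non-empty output as an occupied port. The Python sketch below contrasts that inferred behaviour with a more defensive parser. Both functions (naive_port_occupied, parse_port_check) and the TOOL_MISSING_MARKERS list are hypothetical illustrations, not OCP's actual InstallOcpAgentTask code.

```python
from typing import List, Optional

# Hypothetical markers meaning "the tool itself could not run" (not taken from OCP).
TOOL_MISSING_MARKERS = ("command not found", "No such file or directory")


def naive_port_occupied(stdout: str) -> bool:
    """Inferred behaviour: any non-empty output means the port is in use.

    With netstat missing, the error text is non-empty, so this returns True
    even though nothing is listening on the port.
    """
    return stdout.strip() != ""


def parse_port_check(stdout: str, port: int) -> Optional[List[str]]:
    """Parse output of: netstat -ant | awk '$4~":PORT$" { print $4"\\t"$6 }'.

    Returns the TCP states found for `port` (an empty list means the port is
    free), or None when the output shows the check itself failed to run.
    """
    text = stdout.strip()
    if any(marker in text for marker in TOOL_MISSING_MARKERS):
        return None  # unknown: report "netstat missing", not "port occupied"
    states = []
    for line in text.splitlines():
        # Each matching line is "<local address>\t<state>", e.g. "0.0.0.0:62888\tLISTEN".
        local_addr, sep, state = line.partition("\t")
        if sep and local_addr.endswith(f":{port}"):
            states.append(state.strip())
    return states


if __name__ == "__main__":
    missing = "bash: netstat: command not found"
    print(naive_port_occupied(missing))                      # True: the false positive
    print(parse_port_check(missing, 62888))                  # None: check could not run
    print(parse_port_check("", 62888))                       # []: port is free
    print(parse_port_check("0.0.0.0:62888\tLISTEN", 62888))  # ['LISTEN']
```

Returning None instead of a boolean forces the caller to handle "could not check" separately, which would have turned this misleading "port is occupied" error into a "netstat not found" error.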

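The error says the port is occupied. I tried stopping the cluster being taken over and rebooting all three hosts, but when I added the host again, the OCP Agent installation still got stuck at this step.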

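Solved. A closer look at the log showed that net-tools was not installed on the host. The port check runs netstat, which is part of net-tools; with netstat missing, the command printed "bash: netstat: command not found", and OCP evidently took that output as a sign that port 62888 was in use. Installing net-tools on the host (for example, yum install -y net-tools on RHEL/CentOS-family systems) resolved the problem.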
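Before retrying Add host, a quick pre-flight check on each target host can catch this early. The following is a minimal Python sketch, assuming Python 3 is available on the host: it reports which commands are missing from PATH and whether TCP port 62888 (the agent port seen in the log) can be bound. The names REQUIRED_COMMANDS, OCP_AGENT_PORTS, missing_commands and port_is_free are hypothetical, and the command and port lists are assumptions drawn from this log, not OCP's documented requirements.

```python
import shutil
import socket
from typing import Iterable, List

# Assumption: the agent port from the log above; extend if your OCP uses others.
OCP_AGENT_PORTS = (62888,)
# Assumption: commands the port check relies on (netstat comes from net-tools).
REQUIRED_COMMANDS = ("netstat", "awk")


def missing_commands(commands: Iterable[str]) -> List[str]:
    """Return the commands that are not found on PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]


def port_is_free(port: int) -> bool:
    """Try to bind the TCP port on all IPv4 interfaces.

    A failed bind means another socket holds the port. SO_REUSEADDR is
    deliberately not set, so lingering TIME_WAIT sockets also count as
    "in use", which errs on the conservative side.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind(("", port))
        except OSError:
            return False
    return True


if __name__ == "__main__":
    for cmd in missing_commands(REQUIRED_COMMANDS):
        print(f"missing command: {cmd} (netstat is provided by the net-tools package)")
    for port in OCP_AGENT_PORTS:
        print(f"port {port}: {'free' if port_is_free(port) else 'in use'}")
```

Binding the port directly answers "is it free?" without depending on netstat at all, so a missing tool cannot be mistaken for an occupied port.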