OCP cluster takeover failed; after rolling back the task I can neither take over the cluster again nor delete it

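【Environment】 Test environment
【OB or other component】 OCP
【Version】 ocp-server-ce-4.3.2-20240925174740.el7
【Problem description】 I installed OCP through the OBD web UI, choosing to create a new OceanBase cluster as OCP's meta database, and named that cluster ocp_test. After OCP was installed, I changed the password of the proxyro user myself. OCP then failed to take over the ocp_test cluster. After I clicked Rollback in the task list, the ocp_test cluster page no longer lets me move the cluster out of OCP, and I cannot take it over again either. Screenshots below: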
Task information:


Cluster page:

Takeover error:


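A few questions:
1. Where did you change the proxyro password?
2. Please download the log of the takeover task and post it.
3. Please download the log of the rollback task and post it.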


1. I changed it from a client by running: ALTER USER proxyro IDENTIFIED BY '***';
2. Execution log:
exe.log (10.6 KB)

(The execution log I downloaded earlier was broken, so I re-uploaded it; please take another look.)

3. Rollback log:
exe.log (24.5 KB)

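It looks like the host credential that was entered is wrong. Update the host credential under Credential Management, then retry the task.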

2024-12-04 14:35:48.328  WARN 19895 --- [manual-subtask-executor2,29f814937ee4fd47,3d8320591db8ff0c] com.oceanbase.ocp.common.ssh.SshChannel  : Failed to init ssh client auth by PASSWORD

net.schmizz.sshj.userauth.UserAuthException: Exhausted available authentication methods
	at net.schmizz.sshj.SSHClient.auth(SSHClient.java:230)
	at net.schmizz.sshj.SSHClient.auth(SSHClient.java:205)
	at net.schmizz.sshj.SSHClient.authPassword(SSHClient.java:291)
	at net.schmizz.sshj.SSHClient.authPassword(SSHClient.java:261)
	at net.schmizz.sshj.SSHClient.authPassword(SSHClient.java:245)
	at com.oceanbase.ocp.common.ssh.SshChannel.initSshClient(SshChannel.java:98)
	at com.oceanbase.ocp.common.ssh.SshChannel.connect(SshChannel.java:75)
	at com.oceanbase.ocp.executor.internal.connector.impl.DefaultSshConnector.init(DefaultSshConnector.java:55)
	at com.oceanbase.ocp.executor.internal.connector.Connectors.getConnector(Connectors.java:91)
	at com.oceanbase.ocp.executor.internal.connector.Connectors.getSshConnector(Connectors.java:70)
	at com.oceanbase.ocp.executor.internal.template.SshTemplate.<init>(SshTemplate.java:40)
	at com.oceanbase.ocp.executor.executor.SshExecutor.<init>(SshExecutor.java:84)
	at com.oceanbase.ocp.compute.host.executor.RemoteExecutorFactoryImpl.createSshExecutorWithPassword(RemoteExecutorFactoryImpl.java:171)
	at com.oceanbase.ocp.compute.host.executor.RemoteExecutorFactoryImpl.createSshExecutor(RemoteExecutorFactoryImpl.java:123)
	at com.oceanbase.ocp.service.task.util.AgentTaskUtils.createSshExecutor(AgentTaskUtils.java:38)
	at com.oceanbase.ocp.service.task.business.host.PreCreateHostCheckTask.run(PreCreateHostCheckTask.java:58)
	at com.oceanbase.ocp.core.task.engine.runner.JavaSubtaskRunner.execute(JavaSubtaskRunner.java:64)
	at com.oceanbase.ocp.core.task.engine.runner.JavaSubtaskRunner.doRun(JavaSubtaskRunner.java:32)
	at com.oceanbase.ocp.core.task.engine.runner.JavaSubtaskRunner.run(JavaSubtaskRunner.java:26)
	at com.oceanbase.ocp.core.task.engine.runner.RunnerFactory.doRun(RunnerFactory.java:76)
	at com.oceanbase.ocp.core.task.engine.coordinator.worker.subtask.SubtaskExecutor.doRun(SubtaskExecutor.java:206)
	at com.oceanbase.ocp.core.task.engine.coordinator.worker.subtask.SubtaskExecutor.redirectConsoleOutput(SubtaskExecutor.java:200)
	at com.oceanbase.ocp.core.task.engine.coordinator.worker.subtask.SubtaskExecutor.lambda$submit$2(SubtaskExecutor.java:137)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)


2024-12-04 14:35:48.398  INFO 19895 --- [manual-subtask-executor2,29f814937ee4fd47,3d8320591db8ff0c] n.schmizz.sshj.transport.TransportImpl   : Disconnected - BY_APPLICATION
2024-12-04 14:35:48.404 ERROR 19895 --- [manual-subtask-executor2,29f814937ee4fd47,3d8320591db8ff0c] c.o.o.e.i.c.impl.DefaultSshConnector     : [DefaultSshConnector]:failed to init ssh connector, connect properties:ConnectProperties(hostAddress=10.66.231.86, httpPort=null, monitorPort=null, sshPort=22, authentication=Authentication(httpAuth=null, sshAuth=SshAuthentication(authType=password, passwordAuthConfig=PasswordAuthConfig(username=obadmin), privateKeyAuthConfig=null)), proxy=null, extHttpHeaders=null), errorMsg:net.schmizz.sshj.userauth.UserAuthException: Exhausted available authentication methods, cause:{}
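The key line is the last one: OCP tried password-based SSH as obadmin to 10.66.231.86 on port 22, and the server rejected it ("Exhausted available authentication methods"). That points at the host credential rather than at the proxyro password change. When a log is long, a small script can pull out which host, port, user and auth type OCP actually used. This is a minimal sketch, assuming the ConnectProperties text keeps the `field=value` layout shown above; `extract_ssh_target` is a hypothetical helper, not part of OCP.

```python
import re

# Sample line copied from the rollback task log above (trimmed to the relevant part).
LOG_LINE = (
    "[DefaultSshConnector]:failed to init ssh connector, connect properties:"
    "ConnectProperties(hostAddress=10.66.231.86, httpPort=null, monitorPort=null, "
    "sshPort=22, authentication=Authentication(httpAuth=null, "
    "sshAuth=SshAuthentication(authType=password, "
    "passwordAuthConfig=PasswordAuthConfig(username=obadmin), "
    "privateKeyAuthConfig=null)), proxy=null, extHttpHeaders=null), "
    "errorMsg:net.schmizz.sshj.userauth.UserAuthException: "
    "Exhausted available authentication methods, cause:{}"
)

# Field names as they appear in the ConnectProperties text (assumed stable).
FIELDS = ("hostAddress", "sshPort", "authType", "username")


def extract_ssh_target(line: str) -> dict:
    """Return the SSH target fields found in a DefaultSshConnector error line."""
    result = {}
    for field in FIELDS:
        # Match "field=value", where the value ends at the next ',' or ')'.
        match = re.search(rf"\b{field}=([^,)]+)", line)
        if match:
            result[field] = match.group(1)
    return result


if __name__ == "__main__":
    print(extract_ssh_target(LOG_LINE))
```

Comparing the extracted username and host with what is saved under Credential Management shows quickly whether OCP is logging in with the account you expect.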

My task page no longer has a Retry button.

The status shows the rollback succeeded. Is this the error you get when you try to take over the cluster again?

Yes, that's exactly the error.

Log in to the ocp_meta tenant and check whether the ob_cluster table in the meta_database database still contains this cluster:

obclient -hxx.xx.xx.xx -P2881 -uroot@ocp_meta -p'xxx' -Dmeta_database -A

select * from ob_cluster\G
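If you want to run this check from a script instead of an interactive session, the same login can be expressed as an argument list. This is a minimal sketch, assuming obclient is on PATH and accepts the MySQL-style `-e` option to execute one statement and exit; `build_obclient_cmd` is a hypothetical helper, and the host and password are placeholders.

```python
import shlex
from typing import List


def build_obclient_cmd(host: str, port: int, user: str, tenant: str,
                       password: str, database: str, sql: str) -> List[str]:
    """Build an argv list for a non-interactive obclient query.

    Mirrors the interactive command above, plus -e to run one statement.
    Passing a list (not a shell string) avoids quoting problems with the password.
    """
    return [
        "obclient",
        f"-h{host}",
        f"-P{port}",
        f"-u{user}@{tenant}",  # OceanBase login format: user@tenant
        f"-p{password}",
        f"-D{database}",
        "-A",                  # skip table-name completion for a faster start
        "-e", sql,             # execute the statement and exit (assumed option)
    ]


if __name__ == "__main__":
    # Placeholder values; replace with the meta tenant's real address and password.
    cmd = build_obclient_cmd("xx.xx.xx.xx", 2881, "root", "ocp_meta",
                             "xxx", "meta_database", r"select * from ob_cluster\G")
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```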

Yes, it is there. The result:

Delete that record and try the takeover again.

After deleting it, the takeover still fails with an error.

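Please post the ocp-server.log covering the time of the error, or reproduce the problem once more and post ocp-server.log.

Please also post ocp.log; it is usually at
/home/admin/ocp-server/log/ocp.log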

Sorry, after I shut the cluster down it would not start again. I tried many things without success, so I simply redeployed.

OK. If you have questions or run into problems while using it, feel free to post.

OK, thanks for the support.

Same here, exactly the same error.

Are you running into a similar problem? Please open a new thread.