Failed to install ocp_agent

Error code
CMP10002
Error message
OCP-Agent operation failed, error message: Authentication failed for wrong digest
Cause
Failed to execute command /api/v1/time on OCP-Agent 127.0.0.1, error message: Authentication failed for wrong digest

Logs:
2025-06-17 16:03:45.748 INFO 1565579 — [manual-subtask-executor15,4dee6492f5d745e7,a55dc93e5b292eed] c.o.ocp.service.iam.user.UserService : user 100 login with organization 10000000

2025-06-17 16:03:45.754 INFO 1565579 — [manual-subtask-executor15,4dee6492f5d745e7,a55dc93e5b292eed] c.o.o.c.t.e.runner.JavaSubtaskRunner : Retry subtask, id=162, context=Context{parallelIdx=-1, stringMap={rpm_name=ocp-agent-ce-4.3.5-20250319105844.el7.aarch64.rpm, use_host_key=******, is_reinstall_host_agent=true, ocp_agent_install_version=4.3.5-20250319105844, task_instance_id=49, task_operation=retry, latest_execution_start_time=2025-06-17T16:03:45.707+08:00, sub_task_instance_name=Reinstall ocp agent, ocp_agent_origin_version=4.3.5-20250319105844, sub_task_instance_id=162, host_id=4, is_upgrade_host_agent=false}, listMap={}}, executor=127.0.0.1

2025-06-17 16:03:45.758 INFO 1565579 — [manual-subtask-executor15,4dee6492f5d745e7,a55dc93e5b292eed] c.o.o.s.t.b.host.ReinstallOcpAgentTask : [ReInstallOcpAgentTask] begin

2025-06-17 16:03:45.768 INFO 1565579 — [manual-subtask-executor15,4dee6492f5d745e7,a55dc93e5b292eed] c.o.o.c.agent.HostAgentServiceImpl : Finding OCP agent: hostId=4

2025-06-17 16:03:45.776 INFO 1565579 — [manual-subtask-executor15,4dee6492f5d745e7,a55dc93e5b292eed] c.o.o.c.a.p.HostAgentProcessServiceImpl : Getting all OCP agent processes on host 4

2025-06-17 16:03:45.784 INFO 1565579 — [manual-subtask-executor15,4dee6492f5d745e7,a55dc93e5b292eed] c.o.o.c.agent.HostAgentServiceImpl : Finding OCP agent: hostId=4

2025-06-17 16:03:45.791 INFO 1565579 — [manual-subtask-executor15,4dee6492f5d745e7,a55dc93e5b292eed] c.o.o.c.a.p.HostAgentProcessServiceImpl : Getting all OCP agent processes on host 4

2025-06-17 16:03:45.806 INFO 1565579 — [manual-subtask-executor15,4dee6492f5d745e7,a55dc93e5b292eed] c.o.o.e.internal.template.HttpTemplate : GET request to agent, url:http://127.0.0.1:62888/api/v1/time, params:null

2025-06-17 16:03:45.814 ERROR 1565579 — [manual-subtask-executor15,4dee6492f5d745e7,a55dc93e5b292eed] c.o.o.c.t.e.c.w.subtask.SubtaskExecutor : [AgentClient]:http request is failed, response:Authentication failed for wrong digest

com.oceanbase.ocp.executor.exception.HttpRequestFailedException: [AgentClient]:http request is failed, response:Authentication failed for wrong digest
at com.oceanbase.ocp.executor.internal.template.HttpTemplate.checkSuccess(HttpTemplate.java:479)
at com.oceanbase.ocp.executor.internal.template.HttpTemplate.doGet(HttpTemplate.java:221)
at com.oceanbase.ocp.executor.internal.template.HttpTemplate.get(HttpTemplate.java:85)
at com.oceanbase.ocp.executor.executor.AgentExecutor.getHostTime(AgentExecutor.java:171)
at com.oceanbase.ocp.service.task.business.host.ReinstallOcpAgentTask.run(ReinstallOcpAgentTask.java:51)
at com.oceanbase.ocp.core.task.runtime.Subtask.retry(Subtask.java:49)
at com.oceanbase.ocp.core.task.engine.runner.JavaSubtaskRunner.retry(JavaSubtaskRunner.java:76)
at com.oceanbase.ocp.core.task.engine.runner.JavaSubtaskRunner.doRun(JavaSubtaskRunner.java:35)
at com.oceanbase.ocp.core.task.engine.runner.JavaSubtaskRunner.run(JavaSubtaskRunner.java:26)
at com.oceanbase.ocp.core.task.engine.runner.RunnerFactory.doRun(RunnerFactory.java:76)
at com.oceanbase.ocp.core.task.engine.coordinator.worker.subtask.SubtaskExecutor.doRun(SubtaskExecutor.java:207)
at com.oceanbase.ocp.core.task.engine.coordinator.worker.subtask.SubtaskExecutor.redirectConsoleOutput(SubtaskExecutor.java:201)
at com.oceanbase.ocp.core.task.engine.coordinator.worker.subtask.SubtaskExecutor.lambda$submit$2(SubtaskExecutor.java:137)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Set state for subtask: 162, operation:RETRY, state: FAILED
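
In a long OCP server log, the failing request can be isolated by the second field inside the brackets, which appears to be a trace ID (`4dee6492f5d745e7` above). The Python sketch below is a hypothetical helper, not part of OCP. It assumes every log header line follows the layout shown in this post and returns all lines that share a trace ID with an ERROR line. Stack-trace lines are skipped.

```python
import re
from typing import Iterable, Iterator, List, NamedTuple

# ASSUMED layout, taken from the log lines in this post:
#   <date> <time> <LEVEL> <pid> --- [<thread>,<traceId>,<spanId>] <logger> : <message>
# The separator is accepted as "---" or as an em-dash (U+2014), because forum
# rendering may have replaced the original "---".
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+\d+\s+(?:---|\u2014)\s+"
    r"\[(?P<thread>[^,\]]*),(?P<trace>[^,\]]*),(?P<span>[^\]]*)\]\s+"
    r"(?P<logger>\S+)\s+:\s+(?P<msg>.*)$"
)


class LogRecord(NamedTuple):
    ts: str
    level: str
    trace: str
    logger: str
    msg: str


def parse(lines: Iterable[str]) -> Iterator[LogRecord]:
    """Yield one record per header line; stack-trace lines and other text are skipped."""
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m:
            yield LogRecord(m["ts"], m["level"], m["trace"], m["logger"], m["msg"])


def errors_with_trace(lines: Iterable[str]) -> List[LogRecord]:
    """Return every record whose trace ID also appears on at least one ERROR record."""
    records = list(parse(lines))
    failing = {r.trace for r in records if r.level == "ERROR"}
    return [r for r in records if r.trace in failing]
```

Calling `errors_with_trace(open(path))`, where `path` is your OCP server log, would return the `GET request to agent` line together with the `wrong digest` error, because both carry the same trace ID. Grouping by trace ID rather than by timestamp keeps concurrent subtasks from being mixed together.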

Is this a first-time installation of ocp_agent or a reinstallation? Is this OCP 4.3.5?

A reinstallation. The previous OCP crashed, and when I took over the cluster again, the agent installation failed.

Please post this node's mgragent.log from the time the installation error occurred.

Kill the old ocp_agent on this machine and try again.

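I can't do that now. I took over the cluster, it then installed the agent, and now every host shows the agent as "restarting". How do I deal with this?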

Log in to every machine in this OBServer cluster, kill these processes, roll back the failed installation task, and then reinstall.
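
A minimal sketch of the "kill these processes" step, run on one host at a time. It assumes a Linux host with `/proc`, and that the agent processes can be recognized by the fragments `ocp_agentd`, `ocp_mgragent` and `ocp_monagent` in their command lines. These names are an assumption (the thread itself only names `ocp_agent` and `mgragent.log`), so confirm them with `ps -ef` first. `stop_agents` defaults to a dry run and only signals processes when called with `dry_run=False`.

```python
import os
import signal
from pathlib import Path
from typing import List, Sequence

# ASSUMPTION: name fragments of the OCP agent processes. Confirm the real
# process names with `ps -ef` before killing anything on a production host.
AGENT_MARKERS = ("ocp_agentd", "ocp_mgragent", "ocp_monagent")


def find_agent_pids(proc_root: str = "/proc", markers: Sequence[str] = AGENT_MARKERS) -> List[int]:
    """Return sorted PIDs whose command line contains any marker (Linux /proc layout)."""
    pids = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            raw = (entry / "cmdline").read_bytes()
        except OSError:
            continue  # process exited meanwhile, or cmdline is unreadable
        # cmdline arguments are NUL-separated; join them with spaces for matching
        cmdline = raw.replace(b"\0", b" ").decode(errors="replace")
        if any(marker in cmdline for marker in markers):
            pids.append(int(entry.name))
    return sorted(pids)


def stop_agents(dry_run: bool = True) -> List[int]:
    """SIGTERM the matching processes; with dry_run=True only print what would be killed."""
    pids = [pid for pid in find_agent_pids() if pid != os.getpid()]
    for pid in pids:
        if dry_run:
            print(f"would send SIGTERM to {pid}")
        else:
            os.kill(pid, signal.SIGTERM)
    return pids
```

Because these hosts run a production cluster, running the default dry run first and checking that no `observer` process appears in the list is the safer order before passing `dry_run=False`.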

Error: failed to obtain the credential of host 4, error code: cmp07016
2025-06-17 17:04:34.544 INFO 1565579 — [manual-subtask-executor15,28590768fd80cf78,0a45ba76b9565d96] c.o.ocp.service.iam.user.UserService : user 100 login with organization 10000000

2025-06-17 17:04:34.555 INFO 1565579 — [manual-subtask-executor15,28590768fd80cf78,0a45ba76b9565d96] c.o.o.c.t.e.runner.JavaSubtaskRunner : Run subtask, id=199, context=Context{parallelIdx=-1, stringMap={rpm_name=ocp-agent-ce-4.3.5-20250319105844.el7.aarch64.rpm, use_host_key=******, is_reinstall_host_agent=true, ocp_agent_install_version=4.3.5-20250319105844, task_instance_id=81, task_operation=execute, latest_execution_start_time=2025-06-17T17:04:34.513+08:00, sub_task_instance_name=Uninstall ocp agent, ocp_agent_origin_version=4.3.5-20250319105844, sub_task_instance_id=199, host_id=4, is_upgrade_host_agent=false}, listMap={}}, executor=127.0.0.1

2025-06-17 17:04:34.560 INFO 1565579 — [manual-subtask-executor15,28590768fd80cf78,0a45ba76b9565d96] c.o.o.s.t.b.host.UninstallOcpAgentTask : [UninstallOcpAgentTask] started

2025-06-17 17:04:34.568 INFO 1565579 — [manual-subtask-executor15,28590768fd80cf78,0a45ba76b9565d96] c.o.o.c.agent.HostAgentServiceImpl : Finding OCP agent: hostId=4

2025-06-17 17:04:34.577 INFO 1565579 — [manual-subtask-executor15,28590768fd80cf78,0a45ba76b9565d96] c.o.o.c.a.p.HostAgentProcessServiceImpl : Getting all OCP agent processes on host 4

2025-06-17 17:04:34.606 INFO 1565579 — [manual-subtask-executor15,28590768fd80cf78,0a45ba76b9565d96] c.o.o.c.agent.HostAgentServiceImpl : Finding OCP agent: hostId=4

2025-06-17 17:04:34.614 INFO 1565579 — [manual-subtask-executor15,28590768fd80cf78,0a45ba76b9565d96] c.o.o.c.a.p.HostAgentProcessServiceImpl : Getting all OCP agent processes on host 4

2025-06-17 17:04:34.642 INFO 1565579 — [manual-subtask-executor15,28590768fd80cf78,0a45ba76b9565d96] c.o.o.c.agent.HostAgentServiceImpl : Finding OCP agent: hostId=4

2025-06-17 17:04:34.653 INFO 1565579 — [manual-subtask-executor15,28590768fd80cf78,0a45ba76b9565d96] c.o.o.c.a.p.HostAgentProcessServiceImpl : Getting all OCP agent processes on host 4

2025-06-17 17:04:34.680 INFO 1565579 — [manual-subtask-executor15,28590768fd80cf78,0a45ba76b9565d96] c.o.o.c.agent.HostAgentServiceImpl : Finding OCP agent: hostId=4

2025-06-17 17:04:34.689 INFO 1565579 — [manual-subtask-executor15,28590768fd80cf78,0a45ba76b9565d96] c.o.o.c.a.p.HostAgentProcessServiceImpl : Getting all OCP agent processes on host 4

2025-06-17 17:04:34.721 INFO 1565579 — [manual-subtask-executor15,28590768fd80cf78,0a45ba76b9565d96] c.o.o.c.agent.HostAgentServiceImpl : Finding OCP agent: hostId=4

2025-06-17 17:04:34.730 INFO 1565579 — [manual-subtask-executor15,28590768fd80cf78,0a45ba76b9565d96] c.o.o.c.a.p.HostAgentProcessServiceImpl : Getting all OCP agent processes on host 4

2025-06-17 17:04:34.758 INFO 1565579 — [manual-subtask-executor15,28590768fd80cf78,0a45ba76b9565d96] c.o.o.service.task.util.AgentTaskUtils : Create ssh executor from task context

2025-06-17 17:04:34.776 INFO 1565579 — [manual-subtask-executor15,28590768fd80cf78,0a45ba76b9565d96] c.o.o.service.task.util.AgentTaskUtils : Get credential from vault.

2025-06-17 17:04:34.791 ERROR 1565579 — [manual-subtask-executor15,28590768fd80cf78,0a45ba76b9565d96] c.o.o.c.t.e.c.w.subtask.SubtaskExecutor : Failed to obtain the credential information of Host 4. Check whether the credential information exists.

com.oceanbase.ocp.core.exception.NotFoundException: [OCP NotFoundException]: status=404 NOT_FOUND, errorCode=COMPUTE_HOST_GET_CREDENTIAL_FAILED, args=4
at com.oceanbase.ocp.core.i18n.ErrorCodes$Kind$8.exception(ErrorCodes.java:1788)
at com.oceanbase.ocp.core.i18n.ErrorCodes.exception(ErrorCodes.java:1669)
at com.oceanbase.ocp.core.util.ExceptionUtils.require(ExceptionUtils.java:154)
at com.oceanbase.ocp.service.task.util.AgentTaskUtils.createSshExecutor(AgentTaskUtils.java:69)

Did you add credentials for these hosts when you took over the cluster?

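Yes, I added them when connecting the cluster, but it still fails. On the other two hosts I killed the agent processes, but the web console still shows them as "restarting", and I can't roll back.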

The failed takeover task can't be rolled back? Please post a screenshot.


This host shows as idle. Will clicking "auto check and repair" affect the existing production cluster?

The rollback succeeded.

Is an observer running on this host? If so, I suggest running the check first and reviewing the results before deciding whether to repair.

Yes, this is a production cluster that is in service.

So is the host where the agent installed successfully. Can the agents on the other two hosts be repaired?

Yes, just follow the method described earlier.

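But the background processes and ocp_agent have all been killed, and the rollback succeeded in the web console, yet the status still shows "restarting" and I can't perform any operations.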

Try deleting these two hosts. Do you get any errors? Then take over the cluster again.