Community Edition OCP 4.0.3 BP1: adding a host fails

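[Environment] Production test environment
[OB or other component]
[Version] Community Edition OCP 4.0.3 BP1
[Problem description] The host can be added successfully on 4.0.3, but adding it fails on 4.0.3 BP1.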

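[Steps to reproduce]
1. First, the error "Execute clock diff failed" is reported. I have confirmed that the account has passwordless sudo, and using the root account directly produces the same error.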

2023-08-26 10:07:49.761 ERROR 7305 --- [pool-manual-subtask-executor16,57e744fe35414ea1,f51aa4131469] c.o.ocp.core.util.ExceptionUtils         : Checked Exception: com.oceanbase.ocp.core.exception.UnexpectedException occurred with code error.common.unexpected, and args [Execute clock diff failed., 192.168.51.222]

2023-08-26 10:07:49.764  WARN 7305 --- [pool-manual-subtask-executor16,57e744fe35414ea1,f51aa4131469] c.o.o.c.t.engine.runner.RunnerFactory    : Execute task failed, subtask=SubtaskInstanceOverview{id=11017, name=Pre check for create host, state=FAILED, operation=EXECUTE, className=com.oceanbase.ocp.service.task.business.host.PreCreateHostCheckTask, seriesId=2, startTime=2023-08-26T10:07:49.453+08:00, endTime=null}, failedMessage=An unknown error has occurred. Cause: Execute clock diff failed.. Error message: 192.168.51.222. Contact the administrator.

com.oceanbase.ocp.core.exception.UnexpectedException: [OCP UnexpectedException]: status=500 INTERNAL_SERVER_ERROR, errorCode=COMMON_UNEXPECTED, args=Execute clock diff failed.,192.168.51.222
	at sun.reflect.GeneratedConstructorAccessor524.newInstance(Unknown Source) ~[na:na]
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[na:1.8.0_382]
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[na:1.8.0_382]
	at com.oceanbase.ocp.core.util.ExceptionUtils.newException(ExceptionUtils.java:96) ~[ocp-core-4.0.3-20230721.jar!/:4.0.3-20230721]
	at com.oceanbase.ocp.core.util.ExceptionUtils.throwException(ExceptionUtils.java:90) ~[ocp-core-4.0.3-20230721.jar!/:4.0.3-20230721]
	at com.oceanbase.ocp.core.util.ExceptionUtils.unExpected(ExceptionUtils.java:77) ~[ocp-core-4.0.3-20230721.jar!/:4.0.3-20230721]
	at com.oceanbase.ocp.service.task.business.host.PreCreateHostCheckTask.checkIsRemoteClockDiffAcceptable(PreCreateHostCheckTask.java:108) ~[ocp-service-4.0.3-20230721.jar!/:4.0.3-20230721]
	at com.oceanbase.ocp.service.task.business.host.PreCreateHostCheckTask.run(PreCreateHostCheckTask.java:68) ~[ocp-service-4.0.3-20230721.jar!/:4.0.3-20230721]
	at com.oceanbase.ocp.core.task.engine.runner.JavaSubtaskRunner.execute(JavaSubtaskRunner.java:60) ~[ocp-core-4.0.3-20230721.jar!/:4.0.3-20230721]
	at com.oceanbase.ocp.core.task.engine.runner.JavaSubtaskRunner.doRun(JavaSubtaskRunner.java:32) ~[ocp-core-4.0.3-20230721.jar!/:4.0.3-20230721]
	at com.oceanbase.ocp.core.task.engine.runner.JavaSubtaskRunner.run(JavaSubtaskRunner.java:26) ~[ocp-core-4.0.3-20230721.jar!/:4.0.3-20230721]
	at com.oceanbase.ocp.core.task.engine.runner.RunnerFactory.doRun(RunnerFactory.java:111) ~[ocp-core-4.0.3-20230721.jar!/:4.0.3-20230721]
	at com.oceanbase.ocp.core.task.engine.runner.RunnerFactory.redirectOutputIfNotSysSchedule(RunnerFactory.java:183) ~[ocp-core-4.0.3-20230721.jar!/:4.0.3-20230721]
	at com.oceanbase.ocp.core.task.engine.runner.RunnerFactory.run(RunnerFactory.java:101) ~[ocp-core-4.0.3-20230721.jar!/:4.0.3-20230721]
	at com.oceanbase.ocp.core.task.engine.coordinator.worker.subtask.ReadySubtaskWorker.lambda$null$1(ReadySubtaskWorker.java:127) ~[ocp-core-4.0.3-20230721.jar!/:4.0.3-20230721]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_382]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_382]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_382]

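2. A forum thread, "OCP4.0点击租户管理报错" (OCP 4.0 reports an error when clicking tenant management), says this step can simply be skipped. After skipping it, the next step fails with another error: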
Checked Exception: com.oceanbase.ocp.core.exception.UnexpectedException occurred with code error.file.package.not.exists, and args [[name=t-oceanbase-ocp-agent, version=4.0.3, architecture=x86_64]]

I have confirmed that the OCP Agent RPM package downloaded from the official website has been uploaded; its version is 1.3.1.

2023-08-26 10:08:05.394 ERROR 7305 --- [pool-manual-subtask-executor16,57e744fe35414ea1,ac20a2b7211b] c.o.ocp.core.util.ExceptionUtils         : Checked Exception: com.oceanbase.ocp.core.exception.UnexpectedException occurred with code error.file.package.not.exists, and args [[name=t-oceanbase-ocp-agent, version=4.0.3, architecture=x86_64]]

2023-08-26 10:08:05.398  WARN 7305 --- [pool-manual-subtask-executor16,57e744fe35414ea1,ac20a2b7211b] c.o.o.c.t.engine.runner.RunnerFactory    : Execute task failed, subtask=SubtaskInstanceOverview{id=11020, name=Update host arch info, state=FAILED, operation=EXECUTE, className=com.oceanbase.ocp.service.task.business.host.UpdateHostSystemInfoTask, seriesId=0, startTime=2023-08-26T10:08:04.548+08:00, endTime=null}, failedMessage=Software package [name=t-oceanbase-ocp-agent, version=4.0.3, architecture=x86_64] does not exist.

com.oceanbase.ocp.core.exception.UnexpectedException: [OCP UnexpectedException]: status=500 INTERNAL_SERVER_ERROR, errorCode=FILE_PACKAGE_NOT_EXIST, args=[name=t-oceanbase-ocp-agent, version=4.0.3, architecture=x86_64]
	at sun.reflect.GeneratedConstructorAccessor524.newInstance(Unknown Source) ~[na:na]
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[na:1.8.0_382]
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[na:1.8.0_382]
	at com.oceanbase.ocp.core.util.ExceptionUtils.newException(ExceptionUtils.java:96) ~[ocp-core-4.0.3-20230721.jar!/:4.0.3-20230721]
	at com.oceanbase.ocp.core.util.ExceptionUtils.throwException(ExceptionUtils.java:90) ~[ocp-core-4.0.3-20230721.jar!/:4.0.3-20230721]
	at com.oceanbase.ocp.core.util.ExceptionUtils.unExpected(ExceptionUtils.java:77) ~[ocp-core-4.0.3-20230721.jar!/:4.0.3-20230721]
	at com.oceanbase.ocp.service.task.business.host.UpdateHostSystemInfoTask.checkMatchingOcpAgentExist(UpdateHostSystemInfoTask.java:87) ~[ocp-service-4.0.3-20230721.jar!/:4.0.3-20230721]
	at com.oceanbase.ocp.service.task.business.host.UpdateHostSystemInfoTask.run(UpdateHostSystemInfoTask.java:62) ~[ocp-service-4.0.3-20230721.jar!/:4.0.3-20230721]
	at com.oceanbase.ocp.core.task.engine.runner.JavaSubtaskRunner.execute(JavaSubtaskRunner.java:60) ~[ocp-core-4.0.3-20230721.jar!/:4.0.3-20230721]
	at com.oceanbase.ocp.core.task.engine.runner.JavaSubtaskRunner.doRun(JavaSubtaskRunner.java:32) ~[ocp-core-4.0.3-20230721.jar!/:4.0.3-20230721]
	at com.oceanbase.ocp.core.task.engine.runner.JavaSubtaskRunner.run(JavaSubtaskRunner.java:26) ~[ocp-core-4.0.3-20230721.jar!/:4.0.3-20230721]
	at com.oceanbase.ocp.core.task.engine.runner.RunnerFactory.doRun(RunnerFactory.java:111) ~[ocp-core-4.0.3-20230721.jar!/:4.0.3-20230721]
	at com.oceanbase.ocp.core.task.engine.runner.RunnerFactory.redirectOutputIfNotSysSchedule(RunnerFactory.java:183) ~[ocp-core-4.0.3-20230721.jar!/:4.0.3-20230721]
	at com.oceanbase.ocp.core.task.engine.runner.RunnerFactory.run(RunnerFactory.java:101) ~[ocp-core-4.0.3-20230721.jar!/:4.0.3-20230721]
	at com.oceanbase.ocp.core.task.engine.coordinator.worker.subtask.ReadySubtaskWorker.lambda$null$1(ReadySubtaskWorker.java:127) ~[ocp-core-4.0.3-20230721.jar!/:4.0.3-20230721]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_382]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_382]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_382]
	at java.lang.Thread.run(Thread.java:750) ~[na:1.8.0_382]


The clocks of the host being added and the OCP server must not differ too much.

Time synchronization is fine; this is not a clock problem.

As root on the OCP machine, run: setcap cap_net_raw+ep /usr/sbin/clockdiff
The main cause is that the admin user has no permission to run clockdiff when taking over the host.
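
To confirm whether the capability is already present before or after running the setcap command above, something like the following could be used. This is only a minimal sketch: CLOCKDIFF_BIN and has_net_raw are illustrative names that do not come from OCP, and the exact getcap output format may vary between libcap versions, so only the capability name is matched.

```shell
#!/bin/sh
# Sketch: report whether clockdiff carries the cap_net_raw file capability.
# CLOCKDIFF_BIN is an illustrative variable; the default path is the one from this thread.
CLOCKDIFF_BIN="${CLOCKDIFF_BIN:-/usr/sbin/clockdiff}"

# has_net_raw TEXT: succeed if TEXT (a line of getcap output) mentions cap_net_raw.
# Only the capability name is matched, because the surrounding format may differ.
has_net_raw() {
    case "$1" in
        *cap_net_raw*) return 0 ;;
        *) return 1 ;;
    esac
}

if command -v getcap >/dev/null 2>&1 && [ -e "$CLOCKDIFF_BIN" ]; then
    if has_net_raw "$(getcap "$CLOCKDIFF_BIN" 2>/dev/null)"; then
        echo "cap_net_raw is set on $CLOCKDIFF_BIN"
    else
        echo "cap_net_raw is missing; as root run: setcap cap_net_raw+ep $CLOCKDIFF_BIN"
    fi
else
    echo "getcap or $CLOCKDIFF_BIN not found; nothing checked"
fi
```

The script only reads the capability; it never changes it, so it can be run as the admin user.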


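After running setcap cap_net_raw+ep /usr/sbin/clockdiff, the error in step 1 is gone and the check passes.
One question: why is this needed on OCP 4.0.3 BP1 but not on 4.0.3? What exactly changed between the two versions, and is there any documentation on it?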

I am still stuck at step 2:
Checked Exception: com.oceanbase.ocp.core.exception.UnexpectedException occurred with code error.file.package.not.exists, and args [[name=t-oceanbase-ocp-agent, version=4.0.3, architecture=x86_64]]


After uploading this package, it worked.
The question is: all our machines run CentOS, so why is it blocked on a package built for alios7: t-oceanbase-ocp-agent-4.0.3-20230801094520.alios7.x86_64.rpm

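Starting with BP1, OCP is installed from RPM packages, whereas 4.0.3 was still installed with Docker, so this part behaves differently. The clockdiff issue will be fixed in a later release.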

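This is also a known issue: when OCP is installed remotely, the agent package is not transferred, so the installation package has to be uploaded manually.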
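
Before uploading, it can help to check that a package file matches the name, version and architecture listed in the error message. The sketch below relies only on the usual RPM file naming convention (name-version-release.arch.rpm); rpm_matches is a hypothetical helper, and the thread does not say whether OCP itself matches packages by file name or by RPM metadata.

```shell
#!/bin/sh
# Sketch: compare an RPM file name against a required name, version and architecture.
# Assumes the conventional layout name-version-release.arch.rpm.

# rpm_matches FILE NAME VERSION ARCH: succeed if FILE's name parts equal the given values.
rpm_matches() {
    base=${1##*/}        # drop any directory part
    base=${base%.rpm}    # drop the .rpm suffix
    arch=${base##*.}     # text after the last dot
    nvr=${base%.*}       # name-version-release
    nv=${nvr%-*}         # name-version (release removed)
    version=${nv##*-}    # text after the last dash
    name=${nv%-*}        # everything before the version
    [ "$name" = "$2" ] && [ "$version" = "$3" ] && [ "$arch" = "$4" ]
}

# Values taken from the error message and the file name quoted in this thread.
file="t-oceanbase-ocp-agent-4.0.3-20230801094520.alios7.x86_64.rpm"
if rpm_matches "$file" "t-oceanbase-ocp-agent" "4.0.3" "x86_64"; then
    echo "$file matches the required package"
else
    echo "$file does not match the required package"
fi
```

Under this convention, an agent package whose version field is 1.3.1 would not satisfy a requirement for version 4.0.3, which is consistent with the error reported in step 2.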

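Community OCP is on the latest version, 4.2, and adding a host still reports this error. I have already run setcap cap_net_raw+ep /usr/sbin/clockdiff on the OCP machine and the error persists; the clocks are synchronized.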

This approach worked.

This method solved the problem.