OCP takes over an OBD-deployed cluster but reports the cluster as unavailable

[Product name] OCP

[Product version]

[Problem description]

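1. Please provide the OCP version you are currently using.

2. Please fix clock synchronization between the OCP host and the machine you use to access the OCP console.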

OCP version: 3.1.1-ce

The OCP host is already synchronized via NTP.

Could you post the exact error message? There is too little information to go on right now.



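Clicking Tenant -> User Management shows this error. I have confirmed that the root@tpcc_tenant password (the password is empty) and the whitelist are both fine.

Can you connect to the cluster manually? This looks like a connection failure caused by a wrong username or password.

Yes, I can. Performance monitoring shows live charts, and I have been running stress tests the whole time.

Could I uninstall OCP and then redeploy it? Are there steps for uninstalling OCP?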

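OCP is essentially a Docker container, so you can uninstall it directly with docker rm ${container}. Before redeploying, it is recommended to drop the metadata databases that OCP depends on (meta and monitor), and then redeploy.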
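The uninstall advice above can be written down as an ordered checklist. The following is only a minimal sketch: it builds the commands as strings and executes nothing. The function name build_ocp_uninstall_plan, the container name "ocp", and the database names "ocp_meta" and "ocp_monitor" are hypothetical placeholders, not values confirmed in this thread; substitute the names from your own deployment. It also assumes the container is stopped before it is removed.

```python
from typing import List


def build_ocp_uninstall_plan(container: str, meta_db: str, monitor_db: str) -> List[str]:
    """Return the uninstall steps as strings, in order. Nothing is executed here."""
    return [
        # Stop the OCP container first so that a plain `docker rm` succeeds.
        f"docker stop {container}",
        f"docker rm {container}",
        # SQL to run against the metadata cluster (database names are assumptions).
        f"DROP DATABASE IF EXISTS {meta_db};",
        f"DROP DATABASE IF EXISTS {monitor_db};",
    ]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for step in build_ocp_uninstall_plan("ocp", "ocp_meta", "ocp_monitor"):
        print(step)
```

Reviewing the printed steps before running them by hand is safer than scripting the drop, since the metadata databases cannot be recovered afterwards.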


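OK. I suggest adding the uninstall steps to the documentation as well.

Thanks for the suggestion. Please keep reporting any problems you run into.

OK, thanks a lot.

You're welcome, and thank you for the feedback.

Could you avoid using an empty password? Was this tenant's password ever added to the password vault? During takeover, only the sys tenant's password should have been entered.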


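After redeploying, I tried to take over the cluster and have two questions:

  1. Should the credential user in the deployment task be root or admin? I tried both, and both failed. The failing step is "Check observer process user". Why does it check port 2883? Shouldn't it be 2881? Did I misconfigure something?
  2. After the task above failed, I rolled back or abandoned it and clicked take over cluster again, which then reported "RPC call IO exception, server:xxx port:6288".

Please help with the two questions above. Overall, the takeover task does not feel well designed; I suggest making the task editable so it can be changed at any time.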

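I don't quite follow. Does "redeploy" mean deploying with OBD or deploying directly with OCP? And was the OB cluster being taken over deployed by OBD or by OCP?

Redeploy means I uninstalled OCP and deployed OCP again, then took over the OB cluster that was deployed with OBD.

Below is the corresponding error log. It looks like a port detection problem: the check used 2883, whereas the observer's actual port should be 2881.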

############{EXECUTE}{2022-05-10T09:38:54.010+08:00}############2022-05-10 09:38:54.018 INFO 57 --- [pool-subtask-executor-thread-58,49160d6a7acd43bb,833014262b70] c.a.o.c.m.j.model.SubtaskInstanceEntity : Run subtask, id=80, context=Context(parallelIdx=0, stringMap={cluster_version=3.1.2, cluster_name=obtest, target_server_status=RUNNING, ssh_port=22, service_name=obtest:1, target_zone_status=RUNNING, task_instance_id=61, ob_connect_address=xx.209:2883, task_operation=execute, cluster_type=PRIMARY, service_version=3.1.2, ob_cluster_id=1, cluster_id=1, service_type=OB_CLUSTER, ob_data_dir=/ssddata1/ob/data, connection_mode=proxy, target_cluster_status=RUNNING, latest_execution_start_time=2022-05-10T09:38:54.005+08:00, sub_task_instance_id=80, credential_id=1}, listMap={add_region_ids=[1], server_ids=[1, 2, 3, 4, 5, 6], add_idc_ids=[1], all_host_ids=[1, 2, 3, 4, 5, 6], add_host_ids=[1, 2, 3, 4, 5, 6], host_ids=[1, 2, 3, 4, 5, 6], zone_names=[zone2, zone1, zone3]}), executor=192.168.149.209

2022-05-10 09:38:54.059 INFO 57 --- [pool-subtask-executor-thread-58,49160d6a7acd43bb,833014262b70] c.a.ocp.core.task.util.OcpAgentUtils   : [OcpAgentUtils.runCmd] svrIp=192.168.149.213, port=62888, user=root, cmd=netstat -tunlp | grep 2883 | grep observer | awk '{print $7}' | awk -F/ '{print $1}'

2022-05-10 09:38:54.103 INFO 57 --- [pool-subtask-executor-thread-58,49160d6a7acd43bb,833014262b70] c.a.ocp.core.task.util.OcpAgentUtils   : [OcpAgentUtils.runCmd] result=

2022-05-10 09:38:54.107 ERROR 57 --- [pool-subtask-executor-thread-58,49160d6a7acd43bb,833014262b70] com.alipay.ocp.core.util.ExceptionUtils : Checked Exception: com.alipay.ocp.core.exception.UnexpectedException occurred with code error.ob.cluster.takeover.pid.not.found, and args [2883]

2022-05-10 09:38:54.109 INFO 57 --- [pool-subtask-executor-thread-58,49160d6a7acd43bb,833014262b70] c.a.o.c.m.j.model.SubtaskInstanceEntity : Set state for subtask: 80, current state: RUNNING, new state: FAILED

2022-05-10 09:38:54.112 WARN 57 --- [pool-subtask-executor-thread-58,49160d6a7acd43bb,833014262b70] c.a.ocp.core.job.runner.RunnerFactory  : Execute task failed, subtask=SubtaskInstanceEntity{id=80, name=Check observer process user, state=FAILED, operation=EXECUTE, className=com.alipay.ocp.service.task.business.host.CheckObserverProcessUserTask, seriesId=37, startTime=2022-05-10T09:38:54.005+08:00, endTime=2022-05-10T09:38:54.111+08:00}, failedMessage=Can not find observer process with port 2883


com.alipay.ocp.core.exception.UnexpectedException: [OCP UnexpectedException]: status=500 INTERNAL_SERVER_ERROR, errorCode=OB_CLUSTER_TAKEOVER_OBSERVER_PID_NOT_FOUND, args=2883

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[na:1.8.0_312]

at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[na:1.8.0_312]

at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[na:1.8.0_312]

at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[na:1.8.0_312]

at com.alipay.ocp.core.util.ExceptionUtils.newException(ExceptionUtils.java:96) ~[ocp-core-3.1.1-20210916.jar!/:3.1.1-20210916]

at com.alipay.ocp.core.util.ExceptionUtils.throwException(ExceptionUtils.java:90) ~[ocp-core-3.1.1-20210916.jar!/:3.1.1-20210916]

at com.alipay.ocp.core.util.ExceptionUtils.unExpected(ExceptionUtils.java:77) ~[ocp-core-3.1.1-20210916.jar!/:3.1.1-20210916]

at com.alipay.ocp.service.task.business.host.CheckObserverProcessUserTask.run(CheckObserverProcessUserTask.java:40) ~[ocp-service-3.1.1-20210916.jar!/:3.1.1-20210916]

at com.alipay.ocp.core.metadb.job.model.SubtaskInstanceEntity.run(SubtaskInstanceEntity.java:216) ~[ocp-core-3.1.1-20210916.jar!/:3.1.1-20210916]

at com.alipay.ocp.core.job.runner.JavaTaskRunner.doExecute(JavaTaskRunner.java:26) ~[ocp-core-3.1.1-20210916.jar!/:3.1.1-20210916]

at com.alipay.ocp.core.job.runner.JavaTaskRunner.run(JavaTaskRunner.java:20) ~[ocp-core-3.1.1-20210916.jar!/:3.1.1-20210916]

at com.alipay.ocp.core.job.runner.RunnerFactory.doRun(RunnerFactory.java:103) ~[ocp-core-3.1.1-20210916.jar!/:3.1.1-20210916]

at com.alipay.ocp.core.job.runner.RunnerFactory.redirectOutputIfNotSysSchedule(RunnerFactory.java:147) ~[ocp-core-3.1.1-20210916.jar!/:3.1.1-20210916]

at com.alipay.ocp.core.job.runner.RunnerFactory.run(RunnerFactory.java:92) ~[ocp-core-3.1.1-20210916.jar!/:3.1.1-20210916]

at com.alipay.ocp.core.job.coordinator.worker.subtask.ReadySubtaskWorker.lambda$submitTask$2(ReadySubtaskWorker.java:123) ~[ocp-core-3.1.1-20210916.jar!/:3.1.1-20210916]

at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_312]

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_312]

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_312]

at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_312]


############{RETRY}{2022-05-10T09:39:42.992+08:00}############2022-05-10 09:39:43.000 INFO 57 --- [pool-subtask-executor-thread-17,49160d6a7acd43bb,5a2e00ca06c9] c.a.o.c.m.j.model.SubtaskInstanceEntity : Retry subtask, id=80, context=Context(parallelIdx=0, stringMap={cluster_version=3.1.2, cluster_name=obtest, target_server_status=RUNNING, ssh_port=22, service_name=obtest:1, target_zone_status=RUNNING, task_instance_id=61, ob_connect_address=xx.209:2883, task_operation=retry, cluster_type=PRIMARY, service_version=3.1.2, ob_cluster_id=1, cluster_id=1, service_type=OB_CLUSTER, ob_data_dir=/ssddata1/ob/data, connection_mode=proxy, target_cluster_status=RUNNING, latest_execution_start_time=2022-05-10T09:39:42.986+08:00, sub_task_instance_id=80, credential_id=1}, listMap={add_region_ids=[1], server_ids=[1, 2, 3, 4, 5, 6], add_idc_ids=[1], all_host_ids=[1, 2, 3, 4, 5, 6], add_host_ids=[1, 2, 3, 4, 5, 6], host_ids=[1, 2, 3, 4, 5, 6], zone_names=[zone2, zone1, zone3]}), executor=192.168.149.209

2022-05-10 09:39:43.039 INFO 57 --- [pool-subtask-executor-thread-17,49160d6a7acd43bb,5a2e00ca06c9] c.a.ocp.core.task.util.OcpAgentUtils   : [OcpAgentUtils.runCmd] svrIp=192.168.149.213, port=62888, user=root, cmd=netstat -tunlp | grep 2883 | grep observer | awk '{print $7}' | awk -F/ '{print $1}'

2022-05-10 09:39:43.082 INFO 57 --- [pool-subtask-executor-thread-17,49160d6a7acd43bb,5a2e00ca06c9] c.a.ocp.core.task.util.OcpAgentUtils   : [OcpAgentUtils.runCmd] result=

2022-05-10 09:39:43.086 ERROR 57 --- [pool-subtask-executor-thread-17,49160d6a7acd43bb,5a2e00ca06c9] com.alipay.ocp.core.util.ExceptionUtils : Checked Exception: com.alipay.ocp.core.exception.UnexpectedException occurred with code error.ob.cluster.takeover.pid.not.found, and args [2883]

2022-05-10 09:39:43.087 INFO 57 --- [pool-subtask-executor-thread-17,49160d6a7acd43bb,5a2e00ca06c9] c.a.o.c.m.j.model.SubtaskInstanceEntity : Set state for subtask: 80, current state: RUNNING, new state: FAILED

2022-05-10 09:39:43.090 WARN 57 --- [pool-subtask-executor-thread-17,49160d6a7acd43bb,5a2e00ca06c9] c.a.ocp.core.job.runner.RunnerFactory  : Execute task failed, subtask=SubtaskInstanceEntity{id=80, name=Check observer process user, state=FAILED, operation=RETRY, className=com.alipay.ocp.service.task.business.host.CheckObserverProcessUserTask, seriesId=37, startTime=2022-05-10T09:39:42.987+08:00, endTime=2022-05-10T09:39:43.089+08:00}, failedMessage=Can not find observer process with port 2883


com.alipay.ocp.core.exception.UnexpectedException: [OCP UnexpectedException]: status=500 INTERNAL_SERVER_ERROR, errorCode=OB_CLUSTER_TAKEOVER_OBSERVER_PID_NOT_FOUND, args=2883
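The check that fails in the log above is the shell pipeline netstat -tunlp | grep 2883 | grep observer | awk '{print $7}' | awk -F/ '{print $1}', which returned an empty result. The Python sketch below mirrors that filtering on sample text to show why an empty result is expected when the port is held by a process other than observer. The netstat lines, the PIDs, and the idea that 2883 belongs to obproxy on this host are assumptions for illustration (the log only shows connection_mode=proxy and an ob_connect_address ending in :2883); the function name find_observer_pids is hypothetical.

```python
from typing import List

# Hypothetical `netstat -tunlp` output, for illustration only.
SAMPLE_NETSTAT = """\
tcp   0   0 0.0.0.0:2881   0.0.0.0:*   LISTEN   12345/observer
tcp   0   0 0.0.0.0:2882   0.0.0.0:*   LISTEN   12345/observer
tcp   0   0 0.0.0.0:2883   0.0.0.0:*   LISTEN   23456/obproxy
"""


def find_observer_pids(netstat_output: str, port: int) -> List[str]:
    """Approximate: grep <port> | grep observer | awk '{print $7}' | awk -F/ '{print $1}'."""
    pids = []
    for line in netstat_output.splitlines():
        # Same loose substring matching as the two greps in the logged command.
        if str(port) in line and "observer" in line:
            fields = line.split()
            # Field 7 is "PID/program" on TCP lines; shorter lines are skipped here.
            if len(fields) >= 7:
                pids.append(fields[6].split("/")[0])
    return pids


if __name__ == "__main__":
    print(find_observer_pids(SAMPLE_NETSTAT, 2883))  # no observer line matches this port
    print(find_observer_pids(SAMPLE_NETSTAT, 2881))  # matches the observer line
```

Running the original pipeline on the observer host with 2881 in place of 2883 would be one way to check whether the takeover was simply given the proxy address instead of the observer's own port.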


Before the OCP takeover, could you provide a screenshot of the obd cluster check4ocp verification?

That command does not exist.