【 Environment 】Test environment
【 OB or other components 】
ocp-3.3.0-ce-bp1
oceanbase-ce-3.1.4
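【 Version 】
【 Problem description 】After the cluster was restarted, the OCP task that starts the OB cluster cannot complete, and the cluster status stays at "Restarting".
【 Steps to reproduce 】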
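Preceding operations: a storage-space problem caused the server to malfunction. After force-restarting the cluster, I found that the symlinks /home/admin/obproxy/bin/obproxy and /home/admin/oceanbase/bin/observer pointed to locations that no longer existed. After fixing them manually, OCP started successfully.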
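After a storage failure, re-checking the symlinks described above can be done with a small script like the one below. It is a sketch that assumes only the two paths named in this post. The helper name `check_binary_link` is hypothetical and is not part of OCP or OceanBase tooling.

```python
import os
from pathlib import Path


def check_binary_link(path):
    """Return (status, link_target) for a binary path.

    status is "ok", "dangling" (symlink whose target is gone) or "missing".
    """
    p = Path(path)
    if p.is_symlink():
        target = os.readlink(str(p))
        # Path.exists() follows the link, so it is False when the target is gone.
        return ("ok" if p.exists() else "dangling", target)
    return ("ok" if p.exists() else "missing", None)


if __name__ == "__main__":
    # Paths taken from this post; adjust for other deployments.
    for path in ("/home/admin/obproxy/bin/obproxy",
                 "/home/admin/oceanbase/bin/observer"):
        status, target = check_binary_link(path)
        print(f"{path}: {status}" + (f" -> {target}" if target else ""))
```

A "dangling" result reproduces the situation described above, where the link existed but its target did not.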
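- The OB cluster was in the "Unavailable" state. On the cluster overview page, I clicked "…" in the upper-right corner and chose "Restart cluster".
- The restart task failed (as I recall, it failed at the very first step, whose name was something like "Stop ..."), and OCP showed "Restarting".
- Clicking "Abandon task" also failed.
- After I marked the failed subtask as successful, the task was able to continue, but it failed again at "Stop zone".
- Clicking "Abandon task" failed again.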
2022-12-09 18:41:32.198 WARN 765 --- [pool-manual-subtask-executor3,d11178ab4cf34315,b0cee336db79] c.a.o.c.m.t.model.SubtaskInstanceEntity : Rollback subtask, id=3000080, context=Context{parallelIdx=-1, stringMap={cluster_name=obcluster, prohibit_rollback=false, former_cluster_status=UNAVAILABLE, target_zone_status=RUNNING, task_instance_id=3000040, task_operation=rollback, zone_name=zone2, ob_cluster_id=1, cluster_id=1, ocpagent_service_name=agent, target_cluster_status=RUNNING, latest_execution_start_time=2022-12-09T18:41:32.115+08:00, sub_task_instance_id=3000080}, listMap={server_ids=[1], 1.obcluster.zone3.server_ids=[2], 1.obcluster.zone3.host_ids=[2], 1.obcluster.zone1.server_ids=[3], 1.obcluster.zone2.server_ids=[1], 1.obcluster.zone1.host_ids=[3], host_ids=[1], 1.obcluster.zone2.host_ids=[1], zone_names=[zone2, zone3, zone1]}}, executor=10.10.53.24
2022-12-09 18:41:32.203 INFO 765 --- [pool-manual-subtask-executor3,d11178ab4cf34315,b0cee336db79] c.a.o.s.t.business.zone.StopObZoneTask : try to start zone, clusterId=1, zoneName=zone2
2022-12-09 18:41:32.213 INFO 765 --- [pool-manual-subtask-executor3,d11178ab4cf34315,b0cee336db79] c.a.o.s.t.b.zone.ObZoneTaskHandler : begin to start zone, clusterId=1, zone=zone2
2022-12-09 18:41:32.290 INFO 765 --- [pool-manual-subtask-executor3,d11178ab4cf34315,b0cee336db79] c.a.o.c.o.connector.ObConnectorHolder : [obsdk] no ob connector found in holder, key=ObConnectorKey(connectionMode=proxy, clusterName=obcluster, obClusterId=1, tenantName=sys, username=root, address=localhost, port=2888, database=oceanbase)
2022-12-09 18:41:32.342 INFO 765 --- [pool-manual-subtask-executor3,d11178ab4cf34315,b0cee336db79] c.a.o.core.obsdk.connector.ObConnectors : [obsdk] create new ob connector, connectProperties=ConnectProperties(connectionMode=proxy, address=localhost, port=2888, obsAddrList=null, username=root, tenantName=sys, clusterName=obcluster, obClusterId=1, proxy=null, vpcId=1, compatibilityMode=MYSQL, database=oceanbase)
2022-12-09 18:41:33.155 ERROR 765 --- [pool-manual-subtask-executor3,d11178ab4cf34315,b0cee336db79] c.a.o.core.obsdk.connector.ObConnectors : [obsdk] init ob connector failed, connectProperties=ConnectProperties(connectionMode=proxy, address=localhost, port=2888, obsAddrList=null, username=root, tenantName=sys, clusterName=obcluster, obClusterId=1, proxy=null, vpcId=1, compatibilityMode=MYSQL, database=oceanbase), cause:java.sql.SQLNonTransientConnectionException: Could not connect to address=(host=localhost)(port=2888)(type=master) : Could not connect to localhost:2888 : unexpected end of stream, read 0 bytes from 4 (socket was closed by server)
2022-12-09 18:41:33.167 ERROR 765 --- [pool-manual-subtask-executor3,d11178ab4cf34315,b0cee336db79] c.a.o.s.o.o.f.AbstractObOperatorFactory : [AbstractObOperatorFactory] create operator failed, error message:[obsdk] init ob connector failed, connectProperties=ConnectProperties(connectionMode=proxy, address=localhost, port=2888, obsAddrList=null, username=root, tenantName=sys, clusterName=obcluster, obClusterId=1, proxy=null, vpcId=1, compatibilityMode=MYSQL, database=oceanbase), cause:java.sql.SQLNonTransientConnectionException: Could not connect to address=(host=localhost)(port=2888)(type=master) : Could not connect to localhost:2888 : unexpected end of stream, read 0 bytes from 4 (socket was closed by server)
2022-12-09 18:41:33.223 INFO 765 --- [pool-manual-subtask-executor3,d11178ab4cf34315,b0cee336db79] c.a.o.c.m.t.model.SubtaskInstanceEntity : Set state for subtask: 3000080, current state: RUNNING, new state: FAILED
2022-12-09 18:41:33.248 WARN 765 --- [pool-manual-subtask-executor3,d11178ab4cf34315,b0cee336db79] c.a.o.c.t.engine.runner.RunnerFactory : Execute task failed, subtask=SubtaskInstanceEntity{id=3000080, name=Stop zone, state=FAILED, operation=ROLLBACK, className=com.alipay.ocp.service.task.business.zone.StopObZoneTask, seriesId=34, startTime=2022-12-09T18:41:32.115+08:00, endTime=2022-12-09T18:41:33.248+08:00}, failedMessage=Failed to connect to Cluster obcluster. Check whether the password of the root user under the sys tenant in the cluster in the password box is correct, whether the whitelist for the sys tenant is set correctly, and whether the network is connected.
com.alipay.ocp.core.exception.UnexpectedException: [OCP UnexpectedException]: status=500 INTERNAL_SERVER_ERROR, errorCode=OB_CLUSTER_CONNECT_FAILED, args=obcluster,root
at com.alipay.ocp.service.oceanbase.obsdk.factory.AbstractObOperatorFactory.getObOperator(AbstractObOperatorFactory.java:169) ~[ocp-service-3.3.0-20220427.jar!/:3.3.0-20220427]
at com.alipay.ocp.service.oceanbase.obsdk.factory.AbstractObOperatorFactory.createObOperator(AbstractObOperatorFactory.java:121) ~[ocp-service-3.3.0-20220427.jar!/:3.3.0-20220427]
at com.alipay.ocp.service.oceanbase.obsdk.factory.AbstractObOperatorFactory.createClusterOperator(AbstractObOperatorFactory.java:182) ~[ocp-service-3.3.0-20220427.jar!/:3.3.0-20220427]
at com.alipay.ocp.service.oceanbase.obsdk.factory.AbstractObOperatorFactory.createClusterOperator(AbstractObOperatorFactory.java:178) ~[ocp-service-3.3.0-20220427.jar!/:3.3.0-20220427]
at com.alipay.ocp.service.task.business.zone.ObZoneTaskHandler.startZone(ObZoneTaskHandler.java:88) ~[ocp-service-3.3.0-20220427.jar!/:3.3.0-20220427]
at com.alipay.ocp.service.task.business.zone.StopObZoneTask.rollback(StopObZoneTask.java:69) ~[ocp-service-3.3.0-20220427.jar!/:3.3.0-20220427]
at com.alipay.ocp.core.metadb.task.model.SubtaskInstanceEntity.rollback(SubtaskInstanceEntity.java:249) ~[ocp-core-3.3.0-20220427.jar!/:3.3.0-20220427]
at com.alipay.ocp.core.task.engine.runner.JavaTaskRunner.doExecute(JavaTaskRunner.java:34) ~[ocp-core-3.3.0-20220427.jar!/:3.3.0-20220427]
at com.alipay.ocp.core.task.engine.runner.JavaTaskRunner.run(JavaTaskRunner.java:20) ~[ocp-core-3.3.0-20220427.jar!/:3.3.0-20220427]
at com.alipay.ocp.core.task.engine.runner.RunnerFactory.doRun(RunnerFactory.java:113) ~[ocp-core-3.3.0-20220427.jar!/:3.3.0-20220427]
at com.alipay.ocp.core.task.engine.runner.RunnerFactory.redirectOutputIfNotSysSchedule(RunnerFactory.java:185) ~[ocp-core-3.3.0-20220427.jar!/:3.3.0-20220427]
at com.alipay.ocp.core.task.engine.runner.RunnerFactory.run(RunnerFactory.java:102) ~[ocp-core-3.3.0-20220427.jar!/:3.3.0-20220427]
at com.alipay.ocp.core.task.engine.coordinator.worker.subtask.ReadySubtaskWorker.lambda$submitTask$3(ReadySubtaskWorker.java:123) ~[ocp-core-3.3.0-20220427.jar!/:3.3.0-20220427]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_312]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_312]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_312]
- After I clicked "Set as successful", the task stayed stuck at this step for a long time.
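The connector error in the log ("unexpected end of stream, read 0 bytes from 4 (socket was closed by server)") suggests that the TCP connection to localhost:2888 was accepted but closed before the 4-byte header of the first MySQL-protocol packet arrived. The sketch below checks that outside OCP by reading the header of the server greeting. It assumes the proxy endpoint localhost:2888 from the log speaks the MySQL protocol and sends a standard initial handshake packet, whose header is a 3-byte little-endian payload length followed by a 1-byte sequence id. The function name `read_handshake_header` is hypothetical.

```python
import socket


def read_handshake_header(host="localhost", port=2888, timeout=5.0):
    """Connect and try to read the 4-byte MySQL packet header of the greeting."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            header = b""
            while len(header) < 4:
                chunk = sock.recv(4 - len(header))
                if not chunk:
                    # Peer closed the connection: matches "read 0 bytes from 4".
                    return f"closed after {len(header)} of 4 header bytes"
                header += chunk
            length = int.from_bytes(header[:3], "little")
            return f"greeting received, payload length={length}, seq={header[3]}"
    except OSError as exc:
        # Covers refused connections and timeouts.
        return f"connect failed: {exc}"


if __name__ == "__main__":
    print(read_handshake_header())
```

A "closed after 0 of 4 header bytes" result would match the OCP error above. It would point at the proxy process or its backends rather than at the sys tenant password or whitelist mentioned in the OCP message.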
【 Symptoms and impact 】
The cluster cannot be started.
【 Attachments 】