Adding an observer: a subtask keeps retrying, then fails on timeout

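[Environment] Test environment
[OB or other components] OCP, observer
[Version] 4.3
[Problem description] When adding an observer through OCP, one subtask keeps retrying and then fails on timeout.
[Steps to reproduce]
[Attachments and logs]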

Could someone please take a look? I am adding an observer here and the subtask keeps retrying. What should I do?
############{RETRY}{2024-04-27T09:50:12.247+08:00}############
2024-04-27 09:50:13.526 INFO 25569 — [pool-manual-subtask-executor16,6fba67c35fd24f04,15b85e8d4c39] c.o.o.c.t.e.runner.JavaSubtaskRunner : Retry subtask, id=18190, context=Context{parallelIdx=2, stringMap={ob_log_disk_path=/data/log1, host.7.zone=zone3, task_instance_id=16985, task_operation=retry, ob_run_path=/home/admin/oceanbase, service_version=4.3.0.1, ob_install_path=/home/admin/oceanbase, ob_sql_port=2881, cluster_id=3, ob_disk_path_style=DEFAULT, ob_svr_port=2882, sub_task_instance_name=Run io calibration, host.5.zone=zone1, sub_task_instance_id=18190, cluster_name=YMStudio, target_server_status=RUNNING, mix_rpm_names={“zone3”:“oceanbase-ce-4.3.0.1-100000242024032211.el7.x86_64.rpm”,“zone2”:“oceanbase-ce-4.3.0.1-100000242024032211.el7.x86_64.rpm”,“zone1”:“oceanbase-ce-4.3.0.1-100000242024032211.el7.x86_64.rpm”}, startup_option_string=, subtask_splitter=host_ids, service_name=YMStudio:1003, host.6.zone=zone2, target_zone_status=RUNNING, ob_cluster_id=1003, service_type=OB_CLUSTER, dep_rpm_names={“zone3”:[“oceanbase-ce-libs-4.3.0.1-100000242024032211.el7.x86_64.rpm”,“oceanbase-ce-utils-4.3.0.1-100000242024032211.el7.x86_64.rpm”],“zone2”:[“oceanbase-ce-libs-4.3.0.1-100000242024032211.el7.x86_64.rpm”,“oceanbase-ce-utils-4.3.0.1-100000242024032211.el7.x86_64.rpm”],“zone1”:[“oceanbase-ce-libs-4.3.0.1-100000242024032211.el7.x86_64.rpm”,“oceanbase-ce-utils-4.3.0.1-100000242024032211.el7.x86_64.rpm”]}, ob_run_user=admin, target_cluster_status=RUNNING, latest_execution_start_time=2024-04-27T09:50:12.192+08:00, ob_data_disk_path=/data/1, mix_obs_rpm={}}, listMap={root_server_ips=[192.168.10.101, 192.168.10.102, 192.168.10.103], server_ids=[8, 9, 10], exists_server_addrs=[192.168.10.101:2882, 192.168.10.102:2882, 192.168.10.103:2882], host_ids=[5, 6, 7], exists_running_server_addrs=[192.168.10.101:2882, 192.168.10.102:2882, 192.168.10.103:2882], zone_names=[zone1, zone2, zone3]}}, executor=192.168.10.100

2024-04-27 09:50:13.615 INFO 25569 — [pool-manual-subtask-executor16,6fba67c35fd24f04,15b85e8d4c39] c.o.o.s.t.b.c.RunIoCalibrationJobTask : Begin to run io calibration job, ip:192.168.10.106, port:2882

2024-04-27 09:50:13.819 INFO 25569 — [pool-manual-subtask-executor16,6fba67c35fd24f04,15b85e8d4c39] c.o.ocp.obsdk.connector.ObConnectors : [obsdk]:connected server ip:192.168.10.101, sql port:2881

2024-04-27 09:50:13.850 INFO 25569 — [pool-manual-subtask-executor16,6fba67c35fd24f04,15b85e8d4c39] c.o.ocp.obsdk.connector.ObConnectors : [obsdk]:connected server ip:192.168.10.101, sql port:2881

2024-04-27 09:50:13.873 INFO 25569 — [pool-manual-subtask-executor16,6fba67c35fd24f04,15b85e8d4c39] c.o.ocp.obsdk.connector.ConnectTemplate : [obsdk] sql: set ob_query_timeout = ?, args: [10000000]

2024-04-27 09:50:13.900 INFO 25569 — [pool-manual-subtask-executor16,6fba67c35fd24f04,15b85e8d4c39] c.o.ocp.obsdk.connector.ConnectTemplate : [obsdk] sql: SELECT svr_ip, svr_port, storage_name, status, start_time, finish_time FROM GV$OB_IO_CALIBRATION_STATUS WHERE svr_ip = ? AND svr_port = ?, args: [192.168.10.106, 2882]

2024-04-27 09:50:13.928 INFO 25569 — [pool-manual-subtask-executor16,6fba67c35fd24f04,15b85e8d4c39] c.o.o.s.t.b.c.RunIoCalibrationJobTask : Io calibration job already exists, just waiting for finish

2024-04-27 09:50:14.033 INFO 25569 — [pool-manual-subtask-executor16,6fba67c35fd24f04,15b85e8d4c39] c.o.o.s.o.o.f.ConnectPropertiesBuilder : get credential from obsdk context, clusterName=YMStudio, tenantName=sys, dbUser=root

2024-04-27 09:50:14.069 INFO 25569 — [pool-manual-subtask-executor16,6fba67c35fd24f04,15b85e8d4c39] c.o.ocp.obsdk.connector.ObConnectors : [obsdk]:connected server ip:192.168.10.101, sql port:2881

2024-04-27 09:50:14.101 INFO 25569 — [pool-manual-subtask-executor16,6fba67c35fd24f04,15b85e8d4c39] c.o.ocp.obsdk.connector.ObConnectors : [obsdk]:connected server ip:192.168.10.101, sql port:2881

2024-04-27 09:50:14.125 INFO 25569 — [pool-manual-subtask-executor16,6fba67c35fd24f04,15b85e8d4c39] c.o.ocp.obsdk.connector.ConnectTemplate : [obsdk] sql: set ob_query_timeout = ?, args: [10000000]

2024-04-27 09:50:14.157 INFO 25569 — [pool-manual-subtask-executor16,6fba67c35fd24f04,15b85e8d4c39] c.o.ocp.obsdk.connector.ConnectTemplate : [obsdk] sql: SELECT svr_ip, svr_port, storage_name, status, start_time, finish_time FROM GV$OB_IO_CALIBRATION_STATUS WHERE svr_ip = ? AND svr_port = ?, args: [192.168.10.106, 2882]

2024-04-27 09:50:14.205 INFO 25569 — [pool-manual-subtask-executor16,6fba67c35fd24f04,15b85e8d4c39] c.o.ocp.common.lang.pattern.Retry : wait for 5 seconds

2024-04-27 09:50:19.313 INFO 25569 — [pool-manual-subtask-executor16,6fba67c35fd24f04,15b85e8d4c39] c.o.o.s.o.o.f.ConnectPropertiesBuilder : get credential from obsdk context, clusterName=YMStudio, tenantName=sys, dbUser=root

2024-04-27 09:50:19.339 INFO 25569 — [pool-manual-subtask-executor16,6fba67c35fd24f04,15b85e8d4c39] c.o.ocp.obsdk.connector.ObConnectors : [obsdk]:connected server ip:192.168.10.101, sql port:2881

2024-04-27 09:50:19.386 INFO 25569 — [pool-manual-subtask-executor16,6fba67c35fd24f04,15b85e8d4c39] c.o.ocp.obsdk.connector.ObConnectors : [obsdk]:connected server ip:192.168.10.101, sql port:2881

2024-04-27 09:50:19.411 INFO 25569 — [pool-manual-subtask-executor16,6fba67c35fd24f04,15b85e8d4c39] c.o.ocp.obsdk.connector.ConnectTemplate : [obsdk] sql: set ob_query_timeout = ?, args: [10000000]

2024-04-27 09:50:19.444 INFO 25569 — [pool-manual-subtask-executor16,6fba67c35fd24f04,15b85e8d4c39] c.o.ocp.obsdk.connector.ConnectTemplate : [obsdk] sql: SELECT svr_ip, svr_port, storage_name, status, start_time, finish_time FROM GV$OB_IO_CALIBRATION_STATUS WHERE svr_ip = ? AND svr_port = ?, args: [192.168.10.106, 2882]

2024-04-27 09:50:19.483 INFO 25569 — [pool-manual-subtask-executor16,6fba67c35fd24f04,15b85e8d4c39] c.o.ocp.common.lang.pattern.Retry : wait for 5 seconds

2024-04-27 10:20:00.771  INFO 25569 --- [pool-manual-subtask-executor16,6fba67c35fd24f04,15b85e8d4c39] c.o.ocp.common.lang.pattern.Retry        : wait for 5 seconds

2024-04-27 10:20:05.861  INFO 25569 --- [pool-manual-subtask-executor16,6fba67c35fd24f04,15b85e8d4c39] c.o.o.s.o.o.f.ConnectPropertiesBuilder   : get credential from obsdk context,  clusterName=YMStudio, tenantName=sys, dbUser=root

2024-04-27 10:20:05.890  INFO 25569 --- [pool-manual-subtask-executor16,6fba67c35fd24f04,15b85e8d4c39] c.o.ocp.obsdk.connector.ObConnectors     : [obsdk]:connected server ip:192.168.10.101, sql port:2881

2024-04-27 10:20:05.919  INFO 25569 --- [pool-manual-subtask-executor16,6fba67c35fd24f04,15b85e8d4c39] c.o.ocp.obsdk.connector.ObConnectors     : [obsdk]:connected server ip:192.168.10.101, sql port:2881

2024-04-27 10:20:05.947  INFO 25569 --- [pool-manual-subtask-executor16,6fba67c35fd24f04,15b85e8d4c39] c.o.ocp.obsdk.connector.ConnectTemplate  : [obsdk] sql: set ob_query_timeout = ?, args: [600000000]

2024-04-27 10:20:05.971  INFO 25569 --- [pool-manual-subtask-executor16,6fba67c35fd24f04,15b85e8d4c39] c.o.ocp.obsdk.connector.ConnectTemplate  : [obsdk] sql: SELECT svr_ip, svr_port, storage_name, status, start_time, finish_time FROM GV$OB_IO_CALIBRATION_STATUS WHERE svr_ip = ? AND svr_port = ?, args: [192.168.10.106, 2882]

2024-04-27 10:20:06.008  INFO 25569 --- [pool-manual-subtask-executor16,6fba67c35fd24f04,15b85e8d4c39] c.o.ocp.common.lang.pattern.Retry        : wait for 5 seconds

2024-04-27 10:20:11.107  INFO 25569 --- [pool-manual-subtask-executor16,6fba67c35fd24f04,15b85e8d4c39] c.o.o.s.o.o.f.ConnectPropertiesBuilder   : get credential from obsdk context,  clusterName=YMStudio, tenantName=sys, dbUser=root

2024-04-27 10:20:11.152  INFO 25569 --- [pool-manual-subtask-executor16,6fba67c35fd24f04,15b85e8d4c39] c.o.ocp.obsdk.connector.ObConnectors     : [obsdk]:connected server ip:192.168.10.101, sql port:2881

2024-04-27 10:20:11.189  INFO 25569 --- [pool-manual-subtask-executor16,6fba67c35fd24f04,15b85e8d4c39] c.o.ocp.obsdk.connector.ObConnectors     : [obsdk]:connected server ip:192.168.10.101, sql port:2881

2024-04-27 10:20:11.234  INFO 25569 --- [pool-manual-subtask-executor16,6fba67c35fd24f04,15b85e8d4c39] c.o.ocp.obsdk.connector.ConnectTemplate  : [obsdk] sql: set ob_query_timeout = ?, args: [600000000]

2024-04-27 10:20:11.285  INFO 25569 --- [pool-manual-subtask-executor16,6fba67c35fd24f04,15b85e8d4c39] c.o.ocp.obsdk.connector.ConnectTemplate  : [obsdk] sql: SELECT svr_ip, svr_port, storage_name, status, start_time, finish_time FROM GV$OB_IO_CALIBRATION_STATUS WHERE svr_ip = ? AND svr_port = ?, args: [192.168.10.106, 2882]

2024-04-27 10:20:11.327  INFO 25569 --- [pool-manual-subtask-executor16,6fba67c35fd24f04,15b85e8d4c39] c.o.ocp.common.lang.pattern.Retry        : wait for 5 seconds

Set state for subtask: 18190, operation:RETRY, state: FAILED
2024-04-27 10:20:12.433  WARN 25569 --- [pool-subtask-executor30,f54755ddc74a4e39,84203be4a0c1] c.o.o.c.t.e.c.w.subtask.SubtaskExecutor  : Receive timeout callback, id=18190, name=Run io calibration, elapsed=1800, timeout=1800

Please provide the ocp-server.log file. It is in the log directory under the deployment path.

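The "Run io calibration" task mainly measures the IO capability of the OB host. When this task fails, it is usually because the IO performance of the deployment machine is poor.
It is fine if this task fails: you can skip it (set it to [Succeeded]) and continue with the subsequent tasks. This does not affect the deployment.
Do not skip it in a production environment.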
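
For reference, the behavior in the log above (check the calibration status, wait 5 seconds, repeat, give up at timeout=1800) can be sketched roughly as below. This is only an illustration of the polling pattern, not OCP's actual implementation. The names `wait_until` and `is_done` are hypothetical, and which `status` value counts as finished is not shown in the log, so that check is left to the caller. The SQL string is copied from the log.

```python
import time
from typing import Callable

# Query copied from the OCP log; the two placeholders are svr_ip and svr_port.
STATUS_SQL = (
    "SELECT svr_ip, svr_port, storage_name, status, start_time, finish_time "
    "FROM GV$OB_IO_CALIBRATION_STATUS WHERE svr_ip = ? AND svr_port = ?"
)


def wait_until(
    is_done: Callable[[], bool],
    interval_s: float = 5.0,     # matches "wait for 5 seconds" in the log
    timeout_s: float = 1800.0,   # matches "timeout=1800" in the log
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll is_done() every interval_s seconds.

    Returns True as soon as is_done() returns True, or False once
    timeout_s seconds have elapsed without success.
    """
    deadline = clock() + timeout_s
    while True:
        if is_done():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Here `is_done` would be a caller-supplied function (hypothetical) that runs `STATUS_SQL` for the new server, for example 192.168.10.106:2882, and decides from the returned row whether calibration has finished. If the job never finishes, as in this thread, the loop keeps polling until the timeout and the subtask is marked FAILED.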

ocp-server.rar (5.2 MB)