Upgrading an OB cluster from 4.3.5.1 to 4.3.5.3 with OCP gets stuck

【Environment】POC environment
【OB or other components】OB & OCP
【Version】
【Problem description】

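1. Upgrading the OB cluster from 4.3.5.1 to 4.3.5.3 with OCP gets stuck, as shown in the figure below:

2. It is stuck at "dag 1037815 not finish, will check later"; the log is as follows: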

2025-08-28 16:25:27.933 INFO 13042 --- [manual-subtask-executor15,5695bdf60600eedb,607ebd4d8e16bb03] c.o.o.s.t.business.WaitDagSuccessTask : dag 1037815 not finish, will check later
2025-08-28 16:25:27.947 INFO 13042 --- [manual-subtask-executor15,5695bdf60600eedb,607ebd4d8e16bb03] com.oceanbase.ocp.core.util.TaskUtils : [TaskUtils] wait 60 seconds
2025-08-28 16:26:27.985 INFO 13042 --- [manual-subtask-executor15,5695bdf60600eedb,607ebd4d8e16bb03] c.o.o.s.t.business.WaitDagSuccessTask : dag 1037815 not finish, will check later
2025-08-28 16:26:27.999 INFO 13042 --- [manual-subtask-executor15,5695bdf60600eedb,607ebd4d8e16bb03] com.oceanbase.ocp.core.util.TaskUtils : [TaskUtils] wait 60 seconds
2025-08-28 16:27:28.047 INFO 13042 --- [manual-subtask-executor15,5695bdf60600eedb,607ebd4d8e16bb03] c.o.o.s.t.business.WaitDagSuccessTask : dag 1037815 not finish, will check later
2025-08-28 16:27:28.060 INFO 13042 --- [manual-subtask-executor15,5695bdf60600eedb,607ebd4d8e16bb03] com.oceanbase.ocp.core.util.TaskUtils : [TaskUtils] wait 60 seconds
2025-08-28 16:28:28.116 INFO 13042 --- [manual-subtask-executor15,5695bdf60600eedb,607ebd4d8e16bb03] c.o.o.s.t.business.WaitDagSuccessTask : dag 1037815 not finish, will check later
2025-08-28 16:28:28.127 INFO 13042 --- [manual-subtask-executor15,5695bdf60600eedb,607ebd4d8e16bb03] com.oceanbase.ocp.core.util.TaskUtils : [TaskUtils] wait 60 seconds
2025-08-28 16:29:28.177 INFO 13042 --- [manual-subtask-executor15,5695bdf60600eedb,607ebd4d8e16bb03] c.o.o.s.t.business.WaitDagSuccessTask : dag 1037815 not finish, will check later
2025-08-28 16:29:28.193 INFO 13042 --- [manual-subtask-executor15,5695bdf60600eedb,607ebd4d8e16bb03] com.oceanbase.ocp.core.util.TaskUtils : [TaskUtils] wait 60 seconds
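The log above shows `WaitDagSuccessTask` re-checking DAG 1037815 roughly every 60 seconds and logging "not finish, will check later" each time. As an illustration of that poll-and-sleep pattern only (OCP's real implementation is not shown in this thread), here is a minimal Python sketch; `is_finished` is a hypothetical status callback, and the optional `timeout_s` is an assumption added so the example can terminate.

```python
import time
from typing import Callable, Optional


def wait_dag_success(
    dag_id: int,
    is_finished: Callable[[int], bool],
    interval_s: float = 60.0,
    timeout_s: Optional[float] = None,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Poll is_finished(dag_id) until it returns True; return the number of checks.

    Hypothetical sketch of a "wait for DAG" step, not OCP's actual code.
    With timeout_s=None it waits indefinitely, printing one line per check,
    much like the repeating log lines above.
    """
    start = clock()
    checks = 0
    while True:
        checks += 1
        if is_finished(dag_id):
            return checks
        if timeout_s is not None and clock() - start >= timeout_s:
            raise TimeoutError(f"dag {dag_id} not finished after {timeout_s}s")
        print(f"dag {dag_id} not finish, will check later")
        sleep(interval_s)
```

Injecting `sleep` and `clock` lets the loop be exercised without actually waiting. A loop like this only reports that the watched DAG is unfinished; the cause has to be looked for in that DAG's own subtask logs.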

3. Logs related to "1037815":
2025-08-28 15:19:22.885 INFO 13042 --- [manual-subtask-executor15,5695bdf60600eedb,607ebd4d8e16bb03] c.o.o.c.t.e.runner.JavaSubtaskRunner : Run subtask, id=1037741, context=Context{parallelIdx=-1, stringMap={cluster_version=4.3.5.3-103000092025080818, wait_task_instance_id=1037815, cluster_name=myobdemo, freeze_server=true, subtask_splitter=cluster_ids, service_name=myobdemo:1744871768, task_instance_id=1037814, task_operation=execute, upgrade_mode=ROLLING, service_version=4.3.5.1, cluster_id=2, service_type=OB_CLUSTER, binary_version_idx=1, operating_system=el7, target_cluster_status=RUNNING, latest_execution_start_time=2025-08-28T15:19:22.836+08:00, sub_task_instance_name=Wait dag success, sub_task_instance_id=1037741, primary_cluster_id=2}, listMap={binary_upgrade_versions=[4.3.5.3-103000092025080818], cluster_ids=[2], host_ids=[2, 3, 6, 7, 5], direct_upgrade_versions=[], upgrade_versions=[4.3.5.3-103000092025080818]}}, executor=10.xx.xx.10

4. The detailed OCP task log is attached:
log_task_1037814.zip (8.2 KB)

Thanks!

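【Reproduction steps】Operations performed before and after the problem occurred
【Attachments and logs】We recommend collecting diagnostic information with obdiag, OceanBase's agile diagnostic tool; see the following link for details:

【SOP Series 22】Fault Diagnosis Step 1: Self-Service Diagnosis and Diagnostic Information Collection

【Note】A forum assistant based on an LLM and RAG over the open-source documentation is now open for testing. Type [@论坛小助手] in your post to summon it. Feel free to try it!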


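Hello, the technical question you raised may involve functional details within the scope of OceanBase Enterprise Edition. For questions like this, we suggest seeking help in the following ways:

  1. If your company has signed an OceanBase Enterprise Edition sales contract, please contact your account manager;

  2. If your company has not yet signed an OceanBase Enterprise Edition sales contract, you can leave your contact information on the business inquiry page of the OceanBase official website, and an OceanBase Enterprise Edition business consultant will contact you within one business day.

In addition, you are welcome to use the Community Edition and to share your thoughts, experiences, and questions about it in the forum and community groups, exchanging ideas with other community members.

(The assistant's reply has ended. If it did not solve your problem, please keep asking and wait for replies from other members. Thank you!)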


It's the Community Edition.



subtask_1037787.log (91.3 KB)

I get the following error, but I have no tasks in progress. How can I check and confirm this?
MyError: 'upgrade checker failed with 1 reasons: [1 locality tasks is doing, please check]'

obclient(root@(none))[oceanbase]> SELECT * FROM oceanbase.DBA_OB_TENANT_JOBS WHERE JOB_STATUS<>'SUCCESS'\G
*************************** 1. row ***************************
JOB_ID: 56
JOB_TYPE: ALTER_TENANT_PRIMARY_ZONE
JOB_STATUS: INPROGRESS
RESULT_CODE: NULL
PROGRESS: 0
START_TIME: 2025-08-20 15:44:43.051974
MODIFY_TIME: 2025-08-20 15:44:43.051974
TENANT_ID: 1006
SQL_TEXT: ALTER TENANT bk_test PRIMARY_ZONE = 'zone1,zone2,zone3,zone4,zone5'
EXTRA_INFO: FROM: 'zone2;zone1;zone3;zone4;zone5', TO: 'zone1,zone2,zone3,zone4,zone5'
RS_SVR_IP: 10.16.xx.x
RS_SVR_PORT: 2882
1 row in set (0.010 sec)

obclient(root@(none))[oceanbase]>


Search rootservice.log* for ob_upgrade_executor.cpp and check whether there are any error entries.

Connect to the sys tenant and run the following SQL:

select * from oceanbase.__all_rootservice_event_history where event='admin_run_upgrade_job';

select * from __all_rootservice_job where job_type like '%upgrade%';


Learned something new.


Hi,
I. Feedback:
1. Searched rootservice.log* for ob_upgrade_executor.cpp to look for error entries.
Nothing found:

[admin@obdemo1 log]$ grep -i "ob_upgrade_executor.cpp" rootservice.log
[admin@obdemo1 log]$

2. select * from oceanbase.__all_rootservice_event_history where event='admin_run_upgrade_job';
Nothing found:
obclient(root@(none))[oceanbase]> select * from oceanbase.__all_rootservice_event_history where event='admin_run_upgrade_job';
Empty set (0.008 sec)

obclient(root@(none))[oceanbase]>

3. select * from __all_rootservice_job where job_type like '%upgrade%';
Nothing found:
obclient(root@(none))[oceanbase]> select * from __all_rootservice_job where job_type like '%upgrade%';
Empty set (0.008 sec)

obclient(root@(none))[oceanbase]>

II. MyError: 'upgrade checker failed with 1 reasons: [1 locality tasks is doing, please check]' (this issue has been resolved)
After I changed PRIMARY_ZONE to the following in the OCP UI, this error no longer appears:
ALTER TENANT bk_test PRIMARY_ZONE = 'zone1,zone2,zone3,zone4,zone5'

III. New problem: the following error shows up on the upgrade page:

  File "/tmp/rpms/extract/oceanbase-ce-4.3.5.3-103000092025080818.el7.x86_64.rpm/home/admin/oceanbase/etc/upgrade_checker.py", line 531, in modify_server_permanent_offline_time
    set_parameter(cur, 'server_permanent_offline_time', '72h')
  File "/tmp/rpms/extract/oceanbase-ce-4.3.5.3-

On the *.6 server the value stays at 3600s no matter how I change it; it simply will not update. Setting it to 3600s uniformly still ends up as in the figure above, which is very strange.
obclient(root@(none))[oceanbase]> alter system set server_permanent_offline_time='72h';
Query OK, 0 rows affected (0.029 sec)

obclient(root@(none))[oceanbase]> alter system set server_permanent_offline_time='3600s';
Query OK, 0 rows affected (0.026 sec)

The detailed log is as follows:
  File "/tmp/rpms/extract/oceanbase-ce-4.3.5.3-103000092025080818.el7.x86_64.rpm/home/admin/oceanbase/etc/upgrade_checker.py", line 71, in set_parameter
    wait_parameter_sync(cur, parameter, value)
  File "/tmp/rpms/extract/oceanbase-ce-4.3.5.3-103000092025080818.el7.x86_64.rpm/home/admin/oceanbase/etc/upgrade_checker.py", line 93, in wait_parameter_sync
    raise MyError("""check {0}:{1} sync timeout""".format(key, value))
MyError: 'check server_permanent_offline_time:72h sync timeout'
Traceback (most recent call last):
  File "/tmp/rpms/extract/oceanbase-ce-4.3.5.3-103000092025080818.el7.x86_64.rpm/home/admin/oceanbase/etc/upgrade_checker.py", line 1067, in <module>
    do_check(host, port, user, password, timeout, upgrade_params, cpu_arch)
  File "/tmp/rpms/extract/oceanbase-ce-4.3.5.3-103000092025080818.el7.x86_64.rpm/home/admin/oceanbase/etc/upgrade_checker.py", line 1032, in do_check
    modify_server_permanent_offline_time(cur)
  File "/tmp/rpms/extract/oceanbase-ce-4.3.5.3-103000092025080818.el7.x86_64.rpm/home/admin/oceanbase/etc/upgrade_checker.py", line 531, in modify_server_permanent_offline_time
    set_parameter(cur, 'server_permanent_offline_time', '72h')
  File "/tmp/rpms/extract/oceanbase-ce-4.3.5.3-103000092025080818.el7.x86_64.rpm/home/admin/oceanbase/etc/upgrade_checker.py", line 71, in set_parameter
    wait_parameter_sync(cur, parameter, value)
  File "/tmp/rpms/extract/oceanbase-ce-4.3.5.3-103000092025080818.el7.x86_64.rpm/home/admin/oceanbase/etc/upgrade_checker.py", line 93, in wait_parameter_sync
    raise MyError("""check {0}:{1} sync timeout""".format(key, value))
__main__.MyError: 'check server_permanent_offline_time:72h sync timeout'
, error: exit status 1
at com.oceanbase.ocp.core.i18n.ErrorCodes$Kind$10.exception(ErrorCodes.java:1814)
at com.oceanbase.ocp.core.i18n.ErrorCodes.exception(ErrorCodes.java:1669)
at com.oceanbase.ocp.core.util.ExceptionUtils.newException(ExceptionUtils.java:169)
at com.oceanbase.ocp.core.util.ExceptionUtils.throwException(ExceptionUtils.java:162)
at com.oceanbase.ocp.service.task.util.AgentAsyncTaskHelper.checkSuccess(AgentAsyncTaskHelper.java:279)
at com.oceanbase.ocp.service.task.util.AgentAsyncTaskHelper.waitForExecuteFinish(AgentAsyncTaskHelper.java:225)
at com.oceanbase.ocp.service.task.util.AgentAsyncTaskHelper.runUpgradeScript(AgentAsyncTaskHelper.java:140)
at com.oceanbase.ocp.service.task.business.cluster.helper.UpgradeTaskHelper.runScript(UpgradeTaskHelper.java:161)
at com.oceanbase.ocp.service.task.business.cluster.helper.UpgradeTaskHelper.runScript(UpgradeTaskHelper.java:106)
at com.oceanbase.ocp.service.task.business.cluster.helper.UpgradeTaskHelper.runScript(UpgradeTaskHelper.java:81)
at com.oceanbase.ocp.service.task.business.cluster.ExecUpgradeCheckerScriptTask.run(ExecUpgradeCheckerScriptTask.java:60)
at com.oceanbase.ocp.core.task.runtime.Subtask.retry(Subtask.java:49)
at com.oceanbase.ocp.core.task.engine.runner.JavaSubtaskRunner.retry(JavaSubtaskRunner.java:76)
at com.oceanbase.ocp.core.task.engine.runner.JavaSubtaskRunner.doRun(JavaSubtaskRunner.java:35)
at com.oceanbase.ocp.core.task.engine.runner.JavaSubtaskRunner.run(JavaSubtaskRunner.java:26)
at com.oceanbase.ocp.core.task.engine.runner.RunnerFactory.doRun(RunnerFactory.java:76)
at com.oceanbase.ocp.core.task.engine.coordinator.worker.subtask.SubtaskExecutor.doRun(SubtaskExecutor.java:207)
at com.oceanbase.ocp.core.task.engine.coordinator.worker.subtask.SubtaskExecutor.redirectConsoleOutput(SubtaskExecutor.java:201)
at com.oceanbase.ocp.core.task.engine.coordinator.worker.subtask.SubtaskExecutor.lambda$submit$2(SubtaskExecutor.java:137)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)

Set state for subtask: 1037787, operation:RETRY, state: FAILED
Set state for subtask: 1037787, operation:RETRY, state: FAILED
2025-08-28 17:46:49.654 WARN 13042 --- [subtask-executor27,89712dfa8ae7971e,f765765d012998fe] c.o.o.c.t.e.c.w.subtask.SubtaskExecutor : Subtask timeout, name=Execute upgrade checker script, executedSeconds=1811, timeout=1800
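The traceback shows `upgrade_checker.py` setting `server_permanent_offline_time` to `72h` and then, judging by its name, waiting in `wait_parameter_sync` until every server reports the new value; since 10.xx.xx.6 kept reporting `3600s`, the check timed out and the OCP subtask later hit its 1800-second limit. The script's source is not shown here, so the following is only an assumed re-creation of that kind of check: `fetch_values` is a hypothetical callback returning `(svr_ip, value)` pairs (for example read from `GV$OB_PARAMETERS`), and the retry count and interval are made up.

```python
import time
from typing import Callable, Iterable, List, Tuple


class MyError(Exception):
    """Stand-in for the MyError raised by upgrade_checker.py (definition not shown)."""


def lagging_servers(rows: Iterable[Tuple[str, str]], expected: str) -> List[str]:
    """Return the sorted server IPs whose reported value differs from `expected`."""
    return sorted({ip for ip, value in rows if value != expected})


def wait_parameter_sync(
    fetch_values: Callable[[str], Iterable[Tuple[str, str]]],
    key: str,
    value: str,
    retries: int = 10,
    interval_s: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Assumed re-creation: wait until every server reports `value` for `key`.

    fetch_values(key) is hypothetical; retries and interval_s are illustrative,
    not the script's real settings.
    """
    lagging: List[str] = []
    for _ in range(retries):
        lagging = lagging_servers(fetch_values(key), value)
        if not lagging:
            return
        sleep(interval_s)
    # Same message as in the traceback, plus the lagging servers for convenience.
    raise MyError("check {0}:{1} sync timeout (lagging: {2})".format(
        key, value, ", ".join(lagging)))
```

A check of this shape can only pass once every server accepts the new value, so the server that keeps reporting the old value, not the checker, is what needs investigating.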


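Please use the following method to capture the logs of the ALTER statement:
1) Enable trace display:
SET ob_enable_show_trace='ON';

2) Run the SQL.

3) Get the trace ID of the previous statement:
select last_trace_id();

4) Find the node(s) that executed that trace:
select query_sql,svr_ip from gv$ob_sql_audit where trace_id='<trace ID from step 3>';

5) On the corresponding svr_ip node, filter the logs:
grep "<trace ID from step 3>" observer.log*
grep "<trace ID from step 3>" rootservice.log*

6) Then just share the log output.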
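Steps 5 and 6 boil down to a `grep` for the trace ID across the current and rotated log files. For scripted collection, a minimal Python equivalent is sketched below; the log directory and file-name globs are assumptions mirroring the `observer.log*` and `rootservice.log*` patterns above (for the install in this thread the directory would presumably be `/home/admin/oceanbase/log`, but check your own deployment).

```python
import glob
import os
from typing import Iterator, Sequence, Tuple


def find_trace(
    trace_id: str,
    log_dir: str,
    patterns: Sequence[str] = ("observer.log*", "rootservice.log*"),
) -> Iterator[Tuple[str, int, str]]:
    """Yield (file name, line number, line) for every line containing trace_id.

    Roughly equivalent to: grep "<trace_id>" observer.log* rootservice.log*
    """
    for pattern in patterns:
        for path in sorted(glob.glob(os.path.join(log_dir, pattern))):
            # errors="replace" keeps scanning if a log contains non-UTF-8 bytes.
            with open(path, encoding="utf-8", errors="replace") as fh:
                for lineno, line in enumerate(fh, start=1):
                    if trace_id in line:
                        yield os.path.basename(path), lineno, line.rstrip("\n")
```

For example, `find_trace("YB420A100B06-00063A16B3D82B7C-0-0", "/home/admin/oceanbase/log")` would yield matches for the first trace ID collected below. As with `grep ... observer.log*`, the `observer.log*` glob also matches `observer.log.wf`.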


Learned something new.


1. Hi 辞霜, there is nothing in observer.log; rootservice.log does have entries, see the attachment:
20200829_rootservice.log (3.5 KB)

2. How the logs were collected:
obclient(root@(none))[oceanbase]> SET ob_enable_show_trace=on;
Query OK, 0 rows affected (0.001 sec)

obclient(root@(none))[oceanbase]> alter system set server_permanent_offline_time=‘72h’;
Query OK, 0 rows affected (0.028 sec)

obclient(root@(none))[oceanbase]> select last_trace_id();
+-----------------------------------+
| last_trace_id()                   |
+-----------------------------------+
| YB420A100B06-00063A16B3D82B7C-0-0 |
+-----------------------------------+
1 row in set (0.037 sec)

obclient(root@(none))[oceanbase]> select query_sql,svr_ip from gv$ob_sql_audit where trace_id='YB420A100B06-00063A16B3D82B7C-0-0';
+-----------+------------+
| query_sql | svr_ip     |
+-----------+------------+
| INSERT INTO __all_sys_parameter (zone, svr_type, svr_ip, svr_port, name, value, info, config_version, gmt_modified, section, scope, source, edit_level, data_type) VALUES ('', 'observer', 'ANY', 0, 'server_permanent_offline_time', '72h', '', 1756432800565909, usec_to_time(1756432800565909), 'ROOT_SERVICE', 'CLUSTER', 'DEFAULT', 'DYNAMIC_EFFECTIVE', 'TIME') ON DUPLICATE KEY UPDATE value = '72h', info = '', config_version = 1756432800565909, gmt_modified = usec_to_time(1756432800565909), section = 'ROOT_SERVICE', scope = 'CLUSTER', source = 'DEFAULT', edit_level = 'DYNAMIC_EFFECTIVE', data_type = 'TIME' | 10.xx.xx.2 |
| START TRANSACTION | 10.xx.xx.2 |
| UPDATE __all_zone SET value = 1756432800565909, info = '', gmt_modified = now(6) WHERE zone = '' AND name = 'config_version' | 10.xx.xx.2 |
| UPDATE __all_zone SET value = 1756432800571954, info = '', gmt_modified = now(6) WHERE zone = '' AND name = 'lease_info_version' | 10.xx.xx.2 |
| COMMIT | 10.xx.xx.2 |
+-----------+------------+
5 rows in set (2.387 sec)

obclient(root@(none))[oceanbase]>

3. Result of the change:


The observer log will definitely record it; try searching on another node, 10.xx.xx.2.


There is nothing on 10.xx.xx.2, but 10.xx.xx.6 has a little. I ran the command while logged in to 10.xx.xx.6. The log is as follows:
[admin@localhost log]$ grep -i "YB420A100B06-00063A16B3D82B7C-0-0" observer.log*
observer.log.20250829100629692:[2025-08-29 10:00:45.783845] INFO [SHARE] add_event (ob_event_history_table_operator.h:266) [9710][T1_L0_G0][T1][YB420A100B06-00063A16B3D82B7C-0-0] [lt=71] event table add task(ret=0, event_table_name="__all_server_event_history", sql=INSERT INTO __all_server_event_history (gmt_create, module, event, name1, value1, name2, value2, name3, value3, name4, value4, value5, value6, svr_ip, svr_port) VALUES (usec_to_time(1756432845783771), 'sql', 'execute_cmd', 'cmd_type', 177, 'sql_text', X'2A2A2A', 'return_code', 0, 'tenant_id', 1, '', '', '10.xx.xx.6', 2882))
[admin@localhost log]$

I repeated the operation on node 10.xx.xx.2, and the value still did not change, as shown in the figure below:

Detailed steps are below; the logs are attached:
20250829-2-observer.log (896 bytes)
20250829-2-rootservice.log (4.0 KB)

obclient(root@(none))[oceanbase]> SET ob_enable_show_trace='on';
Query OK, 0 rows affected (0.001 sec)

obclient(root@(none))[oceanbase]> alter system set server_permanent_offline_time='72h';
Query OK, 0 rows affected (0.040 sec)

obclient(root@(none))[oceanbase]> select last_trace_id();
+-----------------------------------+
| last_trace_id()                   |
+-----------------------------------+
| YB420A100B02-000633866FCFC7DE-0-0 |
+-----------------------------------+
1 row in set (0.016 sec)

obclient(root@(none))[oceanbase]> select query_sql,svr_ip from gv$ob_sql_audit where trace_id='YB420A100B02-000633866FCFC7DE-0-0';
+-----------+------------+
| query_sql | svr_ip     |
+-----------+------------+
| INSERT INTO __all_sys_parameter (zone, svr_type, svr_ip, svr_port, name, value, info, config_version, gmt_modified, section, scope, source, edit_level, data_type) VALUES ('', 'observer', 'ANY', 0, 'server_permanent_offline_time', '72h', '', 1756455114254569, usec_to_time(1756455114254569), 'ROOT_SERVICE', 'CLUSTER', 'DEFAULT', 'DYNAMIC_EFFECTIVE', 'TIME') ON DUPLICATE KEY UPDATE value = '72h', info = '', config_version = 1756455114254569, gmt_modified = usec_to_time(1756455114254569), section = 'ROOT_SERVICE', scope = 'CLUSTER', source = 'DEFAULT', edit_level = 'DYNAMIC_EFFECTIVE', data_type = 'TIME' | 10.xx.xx.2 |
| START TRANSACTION | 10.xx.xx.2 |
| UPDATE __all_zone SET value = 1756455114254569, info = '', gmt_modified = now(6) WHERE zone = '' AND name = 'config_version' | 10.xx.xx.2 |
| UPDATE __all_zone SET value = 1756455114268693, info = '', gmt_modified = now(6) WHERE zone = '' AND name = 'lease_info_version' | 10.xx.xx.2 |
| COMMIT | 10.xx.xx.2 |
+-----------+------------+
5 rows in set (2.605 sec)

obclient(root@(none))[oceanbase]>

I restarted the 10.xx.xx.6 server from OCP, but it did not come back up. Could you help me figure out why? The logs are below:
1. Restart failed
[admin@localhost log]$ more observer.log.wf
[2025-08-29 16:46:49.163192] INFO New syslog file info: [address: "10.xx.xx.6:2882", observer version: OceanBase_CE 4.3.5.1, revision: 101000042025031818-b6d5706eb3d2c5f501c7fa646ddbf32f3dc87069, sysname: Linux, os release: 3.10.0-1160.119.1.el7.x86_64, machine: x86_64, tz GMT offset: 08:00]
[2025-08-29 16:47:44.924744] ERROR [SERVER] start (ob_service.cpp:274) [15590][observer][T0][YB420A100B06-00063D7D134883DF-0-0] [lt=62][errcode=-4395] observice start process has failure(msg="observice start() has failure", ret=-4000, ret="OB_ERROR")
[2025-08-29 16:47:44.924881] ERROR [SERVER] start (ob_server.cpp:1165) [15590][observer][T0][YB420A100B06-00063D7D134883DF-0-0] [lt=78][errcode=-4000] [server_start 9/18] observer instance start fail. you may find solutions in previous error logs or seek help from official technicians.
[2025-08-29 16:47:44.958576] ERROR [SERVER] start (ob_server.cpp:1239) [15590][observer][T0][YB420A100B06-00063D7D134883DF-0-0] [lt=84][errcode=-4000] [server_start 10/18] observer start fail, the stop status is true. you may find solutions in previous error logs or seek help from official technicians.
[admin@localhost log]$ grep YB420A100B06-00063D7D134883DF-0-0 election.log
[admin@localhost log]$ grep YB420A100B06-00063D7D134883DF-0-0 rootservice.log
[admin@localhost log]$ grep YB420A100B06-00063D7D134883DF-0-0 observer.log.wf
[2025-08-29 16:47:44.924744] ERROR [SERVER] start (ob_service.cpp:274) [15590][observer][T0][YB420A100B06-00063D7D134883DF-0-0] [lt=62][errcode=-4395] observice start process has failure(msg="observice start() has failure", ret=-4000, ret="OB_ERROR")
[2025-08-29 16:47:44.924881] ERROR [SERVER] start (ob_server.cpp:1165) [15590][observer][T0][YB420A100B06-00063D7D134883DF-0-0] [lt=78][errcode=-4000] [server_start 9/18] observer instance start fail. you may find solutions in previous error logs or seek help from official technicians.
[2025-08-29 16:47:44.958576] ERROR [SERVER] start (ob_server.cpp:1239) [15590][observer][T0][YB420A100B06-00063D7D134883DF-0-0] [lt=84][errcode=-4000] [server_start 10/18] observer start fail, the stop status is true. you may find solutions in previous error logs or seek help from official technicians.
[admin@localhost log]$ grep YB420A100B06-00063D7D134883DF-0-0 observer.log|more
[2025-08-29 16:47:44.922312] INFO [SERVER] register_self_busy_wait (ob_lease_state_mgr.cpp:145) [15590][observer][T0][YB420A100B06-00063D7D134883DF-0-0] [lt=33] begin register_self_busy_wait
[2025-08-29 16:47:44.923427] INFO [SHARE.SCHEMA] update_baseline_schema_version (ob_schema_store.cpp:86) [15590][observer][T0][YB420A100B06-00063D7D134883DF-0-0] [lt=55] [SCHEMA_STORE] schema store update version(tenant_id=1, version=1744967712126008, baseline_schema_version=1744967712126008)
[2025-08-29 16:47:44.923948] INFO [SERVER] do_renew_lease (ob_lease_state_mgr.cpp:387) [15590][observer][T0][YB420A100B06-00063D7D134883DF-0-0] [lt=1343] update baseline schema version(ret=0, ret="OB_SUCCESS", old_version=0, new_version=1744967712126008)
[2025-08-29 16:47:44.923988] INFO [SERVER] register_self_busy_wait (ob_lease_state_mgr.cpp:171) [15590][observer][T0][YB420A100B06-00063D7D134883DF-0-0] [lt=74] register self successfully!
[2025-08-29 16:47:44.924016] INFO [SERVER] register_self_busy_wait (ob_lease_state_mgr.cpp:183) [15590][observer][T0][YB420A100B06-00063D7D134883DF-0-0] [lt=42] end register_self_busy_wait
[2025-08-29 16:47:44.924047] EDIAG [SERVER] register_self (ob_service.cpp:243) [15590][observer][T0][YB420A100B06-00063D7D134883DF-0-0] [lt=62][errcode=-4000] can't renew lease, the time difference between local and RS may be more than 2s(ret=-4000, ret="OB_ERROR", heartbeat_expire_time=0) BACKTRACE:0x9525fb6 0x90a6216 0x92ba863 0x92ba50c 0x92ba1ac 0x92b9f9a 0x119bc79d 0x119bc98f 0x11830188 0xdafae6c 0x248583b0 0xdaf60bd 0x7f33e8826555 0x9832434
[2025-08-29 16:47:44.924626] EDIAG [SERVER] start (ob_service.cpp:265) [15590][observer][T0][YB420A100B06-00063D7D134883DF-0-0] [lt=1497][errcode=-4000] register self failed(ret=-4000, ret="OB_ERROR") BACKTRACE:0x9525fb6 0x90a6216 0x90a5779 0x90a5246 0x90a4ee4 0x90a4cdb 0x119bd166 0x119bcbac 0x11830188 0xdafae6c 0x248583b0 0xdaf60bd 0x7f33e8826555 0x9832434
[2025-08-29 16:47:44.924712] INFO [SERVER] start (ob_service.cpp:272) [15590][observer][T0][YB420A100B06-00063D7D134883DF-0-0] [lt=225] [OBSERVICE_NOTICE] start ob_service end(ret=-4000, ret="OB_ERROR")
[2025-08-29 16:47:44.924744] ERROR [SERVER] start (ob_service.cpp:274) [15590][observer][T0][YB420A100B06-00063D7D134883DF-0-0] [lt=62][errcode=-4395] observice start process has failure(msg="observice start() has failure", ret=-4000, ret="OB_ERROR")
[2025-08-29 16:47:44.924773] EDIAG [SERVER] start (ob_server.cpp:1054) [15590][observer][T0][YB420A100B06-00063D7D134883DF-0-0] [lt=61][errcode=-4000] fail to start oceanbase service(ret=-4000, ret="OB_ERROR") BACKTRACE:0x9525fb6 0x90a6216 0x90a5779 0x90a5246 0x90a4ee4 0x90a4cdb 0x11837b00 0x1183306f 0xdafae6c 0x248583b0 0xdaf60bd 0x7f33e8826555 0x9832434
[2025-08-29 16:47:44.924881] ERROR [SERVER] start (ob_server.cpp:1165) [15590][observer][T0][YB420A100B06-00063D7D134883DF-0-0] [lt=78][errcode=-4000] [server_start 9/18] observer instance start fail. you may find solutions in previous error logs or seek help from official technicians.
[2025-08-29 16:47:44.958364] INFO destroy_tg (thread_mgr.cpp:89) [15590][observer][T0][YB420A100B06-00063D7D134883DF-0-0] [lt=51] destroy tg(tg_id=284, tg=0x7f33d8fd9cb0, tg->attr={name:StartupAccelHandler, type:4})
[2025-08-29 16:47:44.958441] EDIAG [SERVER] start (ob_server.cpp:1235) [15590][observer][T0][YB420A100B06-00063D7D134883DF-0-0] [lt=137][errcode=-4000] failure occurs, try to set stop and wait(ret=-4000, ret="OB_ERROR") BACKTRACE:0x9525fb6 0x90a6216 0x90a5779 0x90a5246 0x90a4ee4 0x90a4cdb 0x1183a106 0x11832df3 0xdafae6c 0x248583b0 0xdaf60bd 0x7f33e8826555 0x9832434
[2025-08-29 16:47:44.958576] ERROR [SERVER] start (ob_server.cpp:1239) [15590][observer][T0][YB420A100B06-00063D7D134883DF-0-0] [lt=84][errcode=-4000] [server_start 10/18] observer start fail, the stop status is true. you may find solutions in previous error logs or seek help from official technicians.
[admin@localhost log]$

2. Tried starting it manually:
[admin@localhost log]$ cd /home/admin/oceanbase && ./bin/observer

[admin@localhost log]$ more observer.log.wf
[2025-08-29 17:38:57.367749] INFO New syslog file info: [address: "10.16.11.6:2882", observer version: OceanBase_CE 4.3.5.1, revision: 101000042025031818-b6d5706eb3d2c5f501c7fa646ddbf32f3dc87069, sysname: Linux, os release: 3.10.0-1160.119.1.el7.x86_64, machine: x86_64, tz GMT offset: 08:00]
[2025-08-29 17:39:06.582316] ERROR [SERVER] start (ob_service.cpp:274) [23024][observer][T0][YB420A100B06-00063D7DCAF4AF83-0-0] [lt=51][errcode=-4395] observice start process has failure(msg="observice start() has failure", ret=-4000, ret="OB_ERROR")
[2025-08-29 17:39:06.582450] ERROR [SERVER] start (ob_server.cpp:1165) [23024][observer][T0][YB420A100B06-00063D7DCAF4AF83-0-0] [lt=78][errcode=-4000] [server_start 9/18] observer instance start fail. you may find solutions in previous error logs or seek help from official technicians.
[2025-08-29 17:39:06.624539] ERROR [SERVER] start (ob_server.cpp:1239) [23024][observer][T0][YB420A100B06-00063D7DCAF4AF83-0-0] [lt=90][errcode=-4000] [server_start 10/18] observer start fail, the stop status is true. you may find solutions in previous error logs or seek help from official technicians.
[admin@localhost log]$ grep YB420A100B06-00063D7DCAF4AF83-0-0 observer.log|more
[2025-08-29 17:39:06.580726] INFO [SERVER] register_self_busy_wait (ob_lease_state_mgr.cpp:145) [23024][observer][T0][YB420A100B06-00063D7DCAF4AF83-0-0] [lt=34] begin register_self_busy_wait
[2025-08-29 17:39:06.581803] INFO [SHARE.SCHEMA] update_baseline_schema_version (ob_schema_store.cpp:86) [23024][observer][T0][YB420A100B06-00063D7DCAF4AF83-0-0] [lt=57] [SCHEMA_STORE] schema store update version(tenant_id=1, version=1744967712126008, baseline_schema_version=1744967712126008)
[2025-08-29 17:39:06.581842] INFO [SERVER] do_renew_lease (ob_lease_state_mgr.cpp:387) [23024][observer][T0][YB420A100B06-00063D7DCAF4AF83-0-0] [lt=82] update baseline schema version(ret=0, ret="OB_SUCCESS", old_version=0, new_version=1744967712126008)
[2025-08-29 17:39:06.581884] INFO [SERVER] register_self_busy_wait (ob_lease_state_mgr.cpp:171) [23024][observer][T0][YB420A100B06-00063D7DCAF4AF83-0-0] [lt=77] register self successfully!
[2025-08-29 17:39:06.581915] INFO [SERVER] register_self_busy_wait (ob_lease_state_mgr.cpp:183) [23024][observer][T0][YB420A100B06-00063D7DCAF4AF83-0-0] [lt=53] end register_self_busy_wait
[2025-08-29 17:39:06.581941] EDIAG [SERVER] register_self (ob_service.cpp:243) [23024][observer][T0][YB420A100B06-00063D7DCAF4AF83-0-0] [lt=49][errcode=-4000] can't renew lease, the time difference between local and RS may be more than 2s(ret=-4000, ret="OB_ERROR", heartbeat_expire_time=0) BACKTRACE:0x9525fb6 0x90a6216 0x92ba863 0x92ba50c 0x92ba1ac 0x92b9f9a 0x119bc79d 0x119bc98f 0x11830188 0xdafae6c 0x248583b0 0xdaf60bd 0x7fc319d8f555 0x9832434
[2025-08-29 17:39:06.582185] EDIAG [SERVER] start (ob_service.cpp:265) [23024][observer][T0][YB420A100B06-00063D7DCAF4AF83-0-0] [lt=621][errcode=-4000] register self failed(ret=-4000, ret="OB_ERROR") BACKTRACE:0x9525fb6 0x90a6216 0x90a5779 0x90a5246 0x90a4ee4 0x90a4cdb 0x119bd166 0x119bcbac 0x11830188 0xdafae6c 0x248583b0 0xdaf60bd 0x7fc319d8f555 0x9832434
[2025-08-29 17:39:06.582288] INFO [SERVER] start (ob_service.cpp:272) [23024][observer][T0][YB420A100B06-00063D7DCAF4AF83-0-0] [lt=265] [OBSERVICE_NOTICE] start ob_service end(ret=-4000, ret="OB_ERROR")
[2025-08-29 17:39:06.582316] ERROR [SERVER] start (ob_service.cpp:274) [23024][observer][T0][YB420A100B06-00063D7DCAF4AF83-0-0] [lt=51][errcode=-4395] observice start process has failure(msg="observice start() has failure", ret=-4000, ret="OB_ERROR")
[2025-08-29 17:39:06.582341] EDIAG [SERVER] start (ob_server.cpp:1054) [23024][observer][T0][YB420A100B06-00063D7DCAF4AF83-0-0] [lt=50][errcode=-4000] fail to start oceanbase service(ret=-4000, ret="OB_ERROR") BACKTRACE:0x9525fb6 0x90a6216 0x90a5779 0x90a5246 0x90a4ee4 0x90a4cdb 0x11837b00 0x1183306f 0xdafae6c 0x248583b0 0xdaf60bd 0x7fc319d8f555 0x9832434
[2025-08-29 17:39:06.582450] ERROR [SERVER] start (ob_server.cpp:1165) [23024][observer][T0][YB420A100B06-00063D7DCAF4AF83-0-0] [lt=78][errcode=-4000] [server_start 9/18] observer instance start fail. you may find solutions in previous error logs or seek help from official technicians.
[2025-08-29 17:39:06.624319] INFO destroy_tg (thread_mgr.cpp:89) [23024][observer][T0][YB420A100B06-00063D7DCAF4AF83-0-0] [lt=50] destroy tg(tg_id=284, tg=0x7fc30a4ddcb0, tg->attr={name:StartupAccelHandler, type:4})
[2025-08-29 17:39:06.624390] EDIAG [SERVER] start (ob_server.cpp:1235) [23024][observer][T0][YB420A100B06-00063D7DCAF4AF83-0-0] [lt=136][errcode=-4000] failure occurs, try to set stop and wait(ret=-4000, ret="OB_ERROR") BACKTRACE:0x9525fb6 0x90a6216 0x90a5779 0x90a5246 0x90a4ee4 0x90a4cdb 0x1183a106 0x11832df3 0xdafae6c 0x248583b0 0xdaf60bd 0x7fc319d8f555 0x9832434
[2025-08-29 17:39:06.624539] ERROR [SERVER] start (ob_server.cpp:1239) [23024][observer][T0][YB420A100B06-00063D7DCAF4AF83-0-0] [lt=90][errcode=-4000] [server_start 10/18] observer start fail, the stop status is true. you may find solutions in previous error logs or seek help from official technicians.
[admin@localhost log]$

Check how large the time difference is between this machine and the others.
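The startup log explicitly says `can't renew lease, the time difference between local and RS may be more than 2s`, so comparing clocks is the natural next check. A rough sketch, assuming you collect one Unix timestamp per host at about the same moment (for example with `date +%s.%N` run over SSH in parallel) and pick a reference host such as the one running RootService: it reports each host's offset and flags those beyond the 2-second figure quoted in that log. Sampling latency is not compensated, so an NTP client's own offset report (`chronyc tracking` or `ntpq -p`) is more precise when available.

```python
from typing import Dict, List, Tuple


def clock_offsets(readings: Dict[str, float], reference: str) -> Dict[str, float]:
    """Offset (seconds) of each host's clock relative to the reference host.

    `readings` maps host -> Unix timestamp sampled at roughly the same moment.
    A positive offset means the host's clock is ahead of the reference.
    """
    ref = readings[reference]
    return {host: ts - ref for host, ts in readings.items()}


def skewed_hosts(
    readings: Dict[str, float], reference: str, max_skew_s: float = 2.0
) -> List[Tuple[str, float]]:
    """Hosts whose offset exceeds max_skew_s, largest skew first.

    The 2 s default mirrors the limit quoted in the observer's
    "can't renew lease" message; it is an assumption, not a documented setting.
    """
    offsets = clock_offsets(readings, reference)
    flagged = [(h, round(o, 3)) for h, o in offsets.items() if abs(o) > max_skew_s]
    return sorted(flagged, key=lambda item: abs(item[1]), reverse=True)
```

Any host this flags should have its time service fixed before retrying the restart or the upgrade.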


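1. 旭辉, you nailed it! The NTP service on 10.xx.xx.6 did have a problem: its clock was more than 40 seconds ahead of the other machines. After the time was corrected, the parameter on 10.xx.xx.6 returned to normal and the server restarted successfully. Many thanks!
obclient(root@(none))[oceanbase]> show parameters like '%offline%'\G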
*************************** 1. row ***************************
zone: zone5
svr_type: observer
svr_ip: 10.xx.xx.6
svr_port: 2882
name: server_permanent_offline_time
data_type: TIME
value: 72h
info: the time interval between any two heartbeats beyond which a server is considered to be 'permanently' offline. Range: [20s,+∞)
section: ROOT_SERVICE
scope: CLUSTER
source: DEFAULT
edit_level: DYNAMIC_EFFECTIVE
default_value: 3600s
isdefault: 0

2. The subsequent upgrade from 4.3.5.1 to 4.3.5.3 then ran into a small problem, as shown in the figure below:


Still, the cluster version was upgraded, as shown below:
    -> select @@version;
+------------------------------+
| @@version                    |
+------------------------------+
| 5.7.25-OceanBase_CE-v4.3.5.3 |
+------------------------------+
1 row in set (0.001 sec)

obclient(root@(none))[oceanbase]>

The error message is as follows:
log_task_1038701.zip (21.2 KB)

Batch update returned unexpected row count from update [0]; actual row count: 0; expected: 1; statement executed: delete from compute_host_service where id=?
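This OCP error comes from Hibernate: it ran `delete from compute_host_service where id=?`, expected exactly one affected row, and got zero, which typically means the row no longer existed when the delete ran (for example because an earlier or concurrent step had already removed it). The sketch below reproduces only that row-count check with Python's built-in `sqlite3`; the table layout and the `StaleStateError` class are illustrative stand-ins, not OCP's schema or Hibernate's code.

```python
import sqlite3


class StaleStateError(Exception):
    """Illustrative stand-in for org.hibernate.StaleStateException."""


def delete_expecting_one(conn: sqlite3.Connection, row_id: int) -> None:
    # Same idea as Hibernate's expectation check: a DELETE by primary key
    # must affect exactly one row, otherwise the caller's view is stale.
    cur = conn.execute("DELETE FROM compute_host_service WHERE id = ?", (row_id,))
    if cur.rowcount != 1:
        raise StaleStateError(
            f"unexpected row count: actual row count: {cur.rowcount}; expected: 1"
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE compute_host_service (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO compute_host_service (id, name) VALUES (1, 'observer')")
delete_expecting_one(conn, 1)      # first delete removes the row
try:
    delete_expecting_one(conn, 1)  # second delete finds nothing to remove
except StaleStateError as exc:
    print(exc)                     # unexpected row count: actual row count: 0; expected: 1
```

The second delete fails the same way as the OCP log's `actual row count: 0; expected: 1`.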

2025-08-29 19:27:47.696 ERROR 13042 — [manual-subtask-executor13,11276667644db7ad,7db7cdde7aa7d9bc] c.o.o.c.t.e.c.w.subtask.SubtaskExecutor : Batch update returned unexpected row count from update [0]; actual row count: 0; expected: 1; statement executed: delete from compute_host_service where id=?

org.hibernate.StaleStateException: Batch update returned unexpected row count from update [0]; actual row count: 0; expected: 1; statement executed: delete from compute_host_service where id=?
	at org.hibernate.jdbc.Expectations$BasicExpectation.checkBatched(Expectations.java:67)
	at org.hibernate.jdbc.Expectations$BasicExpectation.verifyOutcome(Expectations.java:54)
	at org.hibernate.engine.jdbc.batch.internal.NonBatchingBatch.addToBatch(NonBatchingBatch.java:47)
	at org.hibernate.persister.entity.AbstractEntityPersister.delete(AbstractEntityPersister.java:3698)
	at org.hibernate.persister.entity.AbstractEntityPersister.delete(AbstractEntityPersister.java:3987)
	at org.hibernate.action.internal.EntityDeleteAction.execute(EntityDeleteAction.java:123)
	at org.hibernate.engine.spi.ActionQueue.executeActions(ActionQueue.java:604)
	at org.hibernate.engine.spi.ActionQueue.lambda$executeActions$1(ActionQueue.java:478)
	at java.util.LinkedHashMap.forEach(LinkedHashMap.java:684)
	at org.hibernate.engine.spi.ActionQueue.executeActions(ActionQueue.java:475)
	at org.hibernate.event.internal.AbstractFlushingEventListener.performExecutions(AbstractFlushingEventListener.java:344)
	at org.hibernate.event.internal.DefaultFlushEventListener.onFlush(DefaultFlushEventListener.java:40)
	at org.hibernate.event.service.internal.EventListenerGroupImpl.fireEventOnEachListener(EventListenerGroupImpl.java:107)
	at org.hibernate.internal.SessionImpl.doFlush(SessionImpl.java:1407)
	at org.hibernate.internal.SessionImpl.flush(SessionImpl.java:1394)
	at sun.reflect.GeneratedMethodAccessor224.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.springframework.orm.jpa.SharedEntityManagerCreator$SharedEntityManagerInvocationHandler.invoke(SharedEntityManagerCreator.java:315)
	at com.sun.proxy.$Proxy296.flush(Unknown Source)
	at org.springframework.data.jpa.repository.support.SimpleJpaRepository.flush(SimpleJpaRepository.java:727)
	at org.springframework.data.jpa.repository.support.SimpleJpaRepository.saveAndFlush(SimpleJpaRepository.java:682)
	at sun.reflect.GeneratedMethodAccessor222.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.springframework.data.repository.core.support.RepositoryMethodInvoker$RepositoryFragmentMethodInvoker.lambda$new$0(RepositoryMethodInvoker.java:289)
	at org.springframework.data.repository.core.support.RepositoryMethodInvoker.doInvoke(RepositoryMethodInvoker.java:137)
	at org.springframework.data.repository.core.support.RepositoryMethodInvoker.invoke(RepositoryMethodInvoker.java:121)
	at org.springframework.data.repository.core.support.RepositoryComposition$RepositoryFragments.invoke(RepositoryComposition.java:530)
	at org.springframework.data.repository.core.support.RepositoryComposition.invoke(RepositoryComposition.java:286)
	at org.springframework.data.repository.core.support.RepositoryFactorySupport$ImplementationMethodExecutionInterceptor.invoke(RepositoryFactorySupport.java:640)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
	at org.springframework.data.repository.core.support.QueryExecutorMethodInterceptor.doInvoke(QueryExecutorMethodInterceptor.java:164)
	at org.springframework.data.repository.core.support.QueryExecutorMethodInterceptor.invoke(QueryExecutorMethodInterceptor.java:139)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
	at org.springframework.data.projection.DefaultMethodInvokingMethodInterceptor.invoke(DefaultMethodInvokingMethodInterceptor.java:81)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
	at org.springframework.transaction.interceptor.TransactionInterceptor$1.proceedWithInvocation(TransactionInterceptor.java:123)
	at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:388)
	at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:119)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
	at org.springframework.dao.support.PersistenceExceptionTranslationInterceptor.invoke(PersistenceExceptionTranslationInterceptor.java:137)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
	at org.springframework.data.jpa.repository.support.CrudMethodMetadataPostProcessor$CrudMethodMetadataPopulatingMethodInterceptor.invoke(CrudMethodMetadataPostProcessor.java:174)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
	at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:97)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
	at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:241)
	at com.sun.proxy.$Proxy346.saveAndFlush(Unknown Source)
	at com.oceanbase.ocp.compute.host.manager.HostServiceManagerImpl.removeServiceFromHost(HostServiceManagerImpl.java:310)
	at com.oceanbase.ocp.compute.host.manager.HostServiceManagerImpl.unReserveHost(HostServiceManagerImpl.java:134)
	at com.oceanbase.ocp.compute.host.manager.HostServiceManagerImpl$$FastClassBySpringCGLIB$$a449d072.invoke()
	at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)
	at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:792)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
	at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:762)
	at org.springframework.transaction.interceptor.TransactionInterceptor$1.proceedWithInvocation(TransactionInterceptor.java:123)
	at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:388)
	at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:119)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
	at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:762)
	at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:707)
	at com.oceanbase.ocp.compute.host.manager.HostServiceManagerImpl$$EnhancerBySpringCGLIB$$e969accd.unReserveHost()
	at com.oceanbase.ocp.service.task.business.host.ReserveHostTask.rollback(ReserveHostTask.java:98)
	at com.oceanbase.ocp.core.task.runtime.Subtask.retry(Subtask.java:49)
	at com.oceanbase.ocp.core.task.engine.runner.JavaSubtaskRunner.retry(JavaSubtaskRunner.java:76)
	at com.oceanbase.ocp.core.task.engine.runner.JavaSubtaskRunner.doRun(JavaSubtaskRunner.java:35)
	at com.oceanbase.ocp.core.task.engine.runner.JavaSubtaskRunner.run(JavaSubtaskRunner.java:26)
	at com.oceanbase.ocp.core.task.engine.runner.RunnerFactory.doRun(RunnerFactory.java:76)
	at com.oceanbase.ocp.core.task.engine.coordinator.worker.subtask.SubtaskExecutor.doRun(SubtaskExecutor.java:207)
	at com.oceanbase.ocp.core.task.engine.coordinator.worker.subtask.SubtaskExecutor.redirectConsoleOutput(SubtaskExecutor.java:201)
	at com.oceanbase.ocp.core.task.engine.coordinator.worker.subtask.SubtaskExecutor.lambda$submit$2(SubtaskExecutor.java:137)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)

Set state for subtask: 1038795, operation:RETRY, state: FAILED

I have never run into this error before.
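Reading the trace from the bottom up, the failing call is `ReserveHostTask.rollback`, reached from `Subtask.retry`. It calls `HostServiceManagerImpl.unReserveHost`, then `removeServiceFromHost`, whose `saveAndFlush` makes Hibernate flush `delete from compute_host_service where id=?`. Hibernate expects that DELETE to affect exactly one row and throws `StaleStateException` when the count is 0, meaning the row was no longer there at flush time. One possible, unconfirmed explanation is that an earlier attempt had already removed that host-service record, so the retried rollback found nothing to delete.

The sketch below reproduces only that row-count check, using an in-memory sqlite3 table. It is not OCP's code: the table schema beyond `id` and the `StaleRowError` class are hypothetical.

```python
import sqlite3

class StaleRowError(Exception):
    """Hypothetical stand-in for Hibernate's StaleStateException."""

def delete_expecting_one(conn, row_id):
    """Delete one row by primary key and insist that exactly one row was
    affected, mirroring the expectation Hibernate checks after a DELETE."""
    cur = conn.execute("DELETE FROM compute_host_service WHERE id = ?", (row_id,))
    if cur.rowcount != 1:
        conn.rollback()
        raise StaleRowError(f"actual row count: {cur.rowcount}; expected: 1; id={row_id}")
    conn.commit()

# In-memory table with a hypothetical schema (only `id` appears in the log).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE compute_host_service (id INTEGER PRIMARY KEY, host_id INTEGER)")
conn.execute("INSERT INTO compute_host_service (id, host_id) VALUES (1, 6)")
conn.commit()

delete_expecting_one(conn, 1)      # first attempt: 1 row deleted
try:
    delete_expecting_one(conn, 1)  # repeated attempt: the row is already gone
except StaleRowError as err:
    print(err)                     # actual row count: 0; expected: 1; id=1
```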