When backing up an OB cluster tenant through OCP, the Wait log backup checkpoint task always gets stuck at log backup checkpoint

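【Environment】Test environment
【OB or other components】OB and OCP
【Versions】OB: 4.3.5; OCP: 4.3.6
【Problem description】When the OCP server was first set up, I forgot to change its time zone, so it runs in UTC, while the OB cluster uses CST (UTC+8). When I back up a tenant through OCP, the Wait log backup checkpoint subtask always gets stuck at log backup checkpoint and then fails. The log at the time of failure is below. Is the failure caused by the time zone mismatch? If so, how do I correct it?
【Steps to reproduce】Operations performed before and after the problem appeared
【Attachments and logs】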
2025-08-05 02:11:27.372 INFO 13873 --- [manual-subtask-executor12,b659f5890bbe4356,aee269a2b70790d3] c.o.o.b.i.operation.BackupObOpsService : systemTenantZoneId=+08:00

2025-08-05 02:11:27.401 INFO 13873 --- [manual-subtask-executor12,b659f5890bbe4356,aee269a2b70790d3] .o.o.b.i.t.s.WaitLogBackupCheckpointTask : log backup checkpoint=2025-08-05T02:10:11.412Z is before data backup min restore time=2025-08-05T10:01:19.057656Z

2025-08-05 02:11:27.404 INFO 13873 --- [manual-subtask-executor12,b659f5890bbe4356,aee269a2b70790d3] c.o.ocp.common.lang.pattern.Retry : wait for 30 seconds

Set state for subtask: 5604, operation:EXECUTE, state: FAILED
2025-08-05 02:11:49.232 ERROR 13873 --- [manual-subtask-executor12,b659f5890bbe4356,aee269a2b70790d3] c.o.o.c.t.e.c.w.subtask.SubtaskExecutor : interrupted, msg:sleep interrupted

java.lang.RuntimeException: interrupted, msg:sleep interrupted
at com.oceanbase.ocp.common.lang.pattern.Retry.waitFor(Retry.java:194)
at com.oceanbase.ocp.common.lang.pattern.Retry.executeUntilWithTimeout(Retry.java:114)
at com.oceanbase.ocp.common.lang.pattern.Retry.executeUntilWithTimeout(Retry.java:98)
at com.oceanbase.ocp.backup.internal.task.schedule.WaitLogBackupCheckpointTask.run(WaitLogBackupCheckpointTask.java:68)
at com.oceanbase.ocp.core.task.engine.runner.JavaSubtaskRunner.execute(JavaSubtaskRunner.java:64)
at com.oceanbase.ocp.core.task.engine.runner.JavaSubtaskRunner.doRun(JavaSubtaskRunner.java:32)
at com.oceanbase.ocp.core.task.engine.runner.JavaSubtaskRunner.run(JavaSubtaskRunner.java:26)
at com.oceanbase.ocp.core.task.engine.runner.RunnerFactory.doRun(RunnerFactory.java:76)
at com.oceanbase.ocp.core.task.engine.coordinator.worker.subtask.SubtaskExecutor.doRun(SubtaskExecutor.java:212)
at com.oceanbase.ocp.core.task.engine.coordinator.worker.subtask.SubtaskExecutor.redirectConsoleOutput(SubtaskExecutor.java:206)
at com.oceanbase.ocp.core.task.engine.coordinator.worker.subtask.SubtaskExecutor.lambda$submit$2(SubtaskExecutor.java:137)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)

2025-08-05 02:11:49.232 WARN 13873 --- [subtask-executor28,33d248d2d14e8c57,1ac151add08e1b48] c.o.o.c.t.e.c.w.subtask.SubtaskExecutor : Receive timeout callback, id=5604, name=Wait log backup checkpoint, elapsed=600, timeout=600

subtask_5604.log (114.5 KB)
This is the subtask log downloaded from OCP.
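The log shows the subtask rechecking its condition every 30 seconds (`wait for 30 seconds`) and being cut off at `elapsed=600, timeout=600`; the `sleep interrupted` error and the timeout callback carry the same timestamp, so the exception appears to be a consequence of the timeout rather than its cause. Below is a hypothetical sketch of such a poll-until-timeout loop. It is not OCP's `Retry` class; `PollUntilTimeout` and `pollUntil` are made-up names, and elapsed time is simulated so the example runs instantly.

```java
import java.util.function.LongPredicate;

/**
 * Hypothetical poll-until-timeout loop, loosely modeled on the
 * "wait for 30 seconds" and "timeout=600" log lines. This is NOT OCP's
 * Retry class; elapsed time is simulated so the example runs instantly.
 */
public class PollUntilTimeout {

    /**
     * Checks the condition at t = 0, interval, 2*interval, ... while t < timeout.
     * Returns how many checks were made before the condition held,
     * or -1 if the timeout was reached first.
     */
    static int pollUntil(LongPredicate conditionAtSecond, long timeoutSec, long intervalSec) {
        int checks = 0;
        for (long t = 0; t < timeoutSec; t += intervalSec) {
            checks++;
            if (conditionAtSecond.test(t)) {
                return checks;
            }
            // A real implementation would sleep here; an external timeout
            // interrupting that sleep is what "sleep interrupted" suggests.
        }
        return -1;
    }

    public static void main(String[] args) {
        // Condition that becomes true after 90 s: met on the 4th check (t = 0, 30, 60, 90).
        System.out.println(pollUntil(t -> t >= 90, 600, 30));
        // Condition that needs about 8 hours: never met within 600 s.
        System.out.println(pollUntil(t -> t >= 8 * 3600, 600, 30));
    }
}
```

With a 600-second budget and a 30-second interval, any condition that only becomes true hours later can never be satisfied, no matter how many times the backup is retried.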

Please run this query in the sys tenant and check the result:

SELECT /*+ ocp_query */ * FROM (
  SELECT incarnation,
         round_id AS log_archive_round,
         tenant_id,
         path AS backup_dest,
         IF(start_scn_display != '', start_scn_display, NULL) AS min_first_time,
         IF(checkpoint_scn_display != '', checkpoint_scn_display, NULL) AS max_next_time,
         status,
         IF(checkpoint_scn != '', truncate((time_to_usec(now()) - checkpoint_scn / 1000) / 1000000, 4), NULL) AS delay,
         now(6) AS check_time
  FROM CDB_OB_ARCHIVELOG_SUMMARY
  RIGHT JOIN (
    SELECT tenant_id AS _tenant_id, max(round_id) AS _round_id
    FROM CDB_OB_ARCHIVELOG_SUMMARY
    GROUP BY _tenant_id
  ) AS t ON tenant_id = t._tenant_id AND round_id = t._round_id
)
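The `delay` column in this query converts `checkpoint_scn` to microseconds (the `/ 1000` implies the SCN is in nanoseconds), subtracts it from `time_to_usec(now())`, and expresses the archive lag in seconds truncated to 4 decimals. A minimal Java sketch of the same arithmetic, assuming both inputs count from the Unix epoch; `ArchiveDelay` and `delaySeconds` are hypothetical names, not part of OCP or OceanBase:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

/** Hypothetical helper mirroring the arithmetic of the query's delay column. */
public class ArchiveDelay {

    private static final BigDecimal NANOS_PER_MICRO = BigDecimal.valueOf(1_000);
    private static final BigDecimal MICROS_PER_SECOND = BigDecimal.valueOf(1_000_000);

    /**
     * truncate((time_to_usec(now()) - checkpoint_scn / 1000) / 1000000, 4)
     *
     * @param nowMicros          current time in microseconds since the Unix epoch (assumption)
     * @param checkpointScnNanos checkpoint SCN, assumed to be nanoseconds since the Unix epoch
     * @return lag in seconds, truncated toward zero to 4 decimal places like SQL TRUNCATE
     */
    static BigDecimal delaySeconds(long nowMicros, long checkpointScnNanos) {
        // Exact decimal division, like SQL's "/" on integers, instead of Java integer division.
        BigDecimal scnMicros = BigDecimal.valueOf(checkpointScnNanos).divide(NANOS_PER_MICRO);
        BigDecimal lagMicros = BigDecimal.valueOf(nowMicros).subtract(scnMicros);
        return lagMicros.divide(MICROS_PER_SECOND).setScale(4, RoundingMode.DOWN);
    }

    public static void main(String[] args) {
        // now = 1000 s after the epoch, checkpoint = 924.04 s after the epoch
        System.out.println(delaySeconds(1_000_000_000L, 924_040_000_000L)); // 75.9600
    }
}
```

A lag of several hours here, rather than seconds or minutes, would be worth comparing against the 8-hour UTC/CST offset; that is a heuristic, not something this thread confirms.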

I just changed the time zone of the OCP server and reinstalled OCP, and now it works...


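OK. We can't pin down the root cause for now. If it happens again, please keep the observer.log covering the time the problem occurred and we'll analyze it further.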

OK, thank you, 旭辉~~

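The times are about 8 hours apart, which matches the UTC/CST offset, so the time zone mismatch is most likely the cause.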
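The two timestamps in the `WaitLogBackupCheckpointTask` log line can be checked directly. Assuming (the thread does not confirm this) that the data backup min restore time `10:01:19.057656` was really a CST (UTC+8) wall-clock value that got labeled as UTC, the sketch below shows how that one misreading turns a point the checkpoint has already passed into one that appears to lie hours in the future, which a 600-second wait can never reach. `TimeZoneMismatch` and `checkpointReached` are hypothetical names; only the `java.time` classes are real API.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

public class TimeZoneMismatch {

    /** Same rule as the log message: the wait ends once checkpoint is not before min restore time. */
    static boolean checkpointReached(Instant checkpoint, Instant minRestoreTime) {
        return !checkpoint.isBefore(minRestoreTime);
    }

    public static void main(String[] args) {
        Instant checkpoint = Instant.parse("2025-08-05T02:10:11.412Z");
        // Assumption: this wall-clock value was produced in CST (UTC+8).
        LocalDateTime minRestoreWallClock = LocalDateTime.parse("2025-08-05T10:01:19.057656");

        Instant readAsUtc = minRestoreWallClock.toInstant(ZoneOffset.UTC);        // what the log shows
        Instant readAsCst = minRestoreWallClock.toInstant(ZoneOffset.ofHours(8)); // 2025-08-05T02:01:19.057656Z

        System.out.println(readAsUtc + " reached=" + checkpointReached(checkpoint, readAsUtc));
        System.out.println(readAsCst + " reached=" + checkpointReached(checkpoint, readAsCst));
        System.out.println("offset hours = " + Duration.between(readAsCst, readAsUtc).toHours());
    }
}
```

Under this assumption, putting the OCP host on the same time zone as the cluster (as was done above) makes both values refer to the same instant, and the checkpoint comparison succeeds.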