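【 Environment 】Production
【 OB or other component 】OB
【 Version 】4.3.5
【Problem description】During a backup, all of the earlier steps succeeded, but the final step, Calculate data set size, timed out. The backup directory is on NFS.
Could this be related to the NFS mount? The mount was not created with the officially recommended parameter settings. The mount command was: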
mount -t nfs -o nfsvers=3,soft,noresvport 192.168.201.6:/nfs/obbackup /obbackup
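The options passed to `mount` are not always the ones actually in effect, since the NFS client and server negotiate values such as `rsize` and `wsize`. A minimal sketch for reading the effective options from `/proc/mounts` on Linux follows. The helper name `show_mount_opts` and its optional second argument are illustrative assumptions, not part of any OceanBase tooling.

```shell
#!/usr/bin/env bash
# Print the filesystem type and effective mount options for a mount point.
# show_mount_opts is a hypothetical helper; the optional second argument only
# exists so the function can read a file other than /proc/mounts.
show_mount_opts() {
    local mountpoint="$1" table="${2:-/proc/mounts}"
    # /proc/mounts fields: device, mount point, fs type, options, dump, pass
    awk -v mp="$mountpoint" '$2 == mp { print $3, $4 }' "$table"
}

# Example: show_mount_opts /obbackup
```

The output can then be compared field by field against the parameters recommended in the OceanBase documentation.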
【Steps to reproduce】Operations before and after the problem occurred
【Attachments and logs】
2026-02-01 04:38:14.591 INFO 4813 --- [subtask-executor31,0a3bbcbbcfecbef2,b4951724d130b08b] c.o.ocp.common.lang.pattern.Retry : wait for 30 seconds
……
2026-02-01 06:02:41.575 WARN 4813 --- [subtask-executor28,41009db0ec3f00f3,2abbfca647dd4fd4] c.o.o.c.t.e.c.w.subtask.SubtaskExecutor : Receive timeout callback, id=13016088, name=Calculate data set size, elapsed=10800, timeout=10800
2026-02-01 06:02:41.586 ERROR 4813 --- [subtask-executor31,0a3bbcbbcfecbef2,b4951724d130b08b] c.o.o.c.t.e.c.w.subtask.SubtaskExecutor : interrupted, msg:sleep interrupted
java.lang.RuntimeException: interrupted, msg:sleep interrupted
at com.oceanbase.ocp.common.lang.pattern.Retry.waitFor(Retry.java:194)
at com.oceanbase.ocp.common.lang.pattern.Retry.executeUntilWithTimeout(Retry.java:114)
at com.oceanbase.ocp.common.lang.pattern.Retry.executeUntilWithTimeout(Retry.java:98)
at com.oceanbase.ocp.backup.internal.monitor.CapacityComputerManager.collectClusterDataSetSize(CapacityComputerManager.java:112)
at com.oceanbase.ocp.backup.internal.monitor.CapacityComputerManager.calculateClusterDataSetSize(CapacityComputerManager.java:126)
at com.oceanbase.ocp.backup.internal.task.schedule.CalculateDataSetSizeTask.lambda$run$2(CalculateDataSetSizeTask.java:47)
at java.util.HashMap.forEach(HashMap.java:1290)
at com.oceanbase.ocp.backup.internal.task.schedule.CalculateDataSetSizeTask.run(CalculateDataSetSizeTask.java:42)
at com.oceanbase.ocp.core.task.engine.runner.JavaSubtaskRunner.execute(JavaSubtaskRunner.java:64)
at com.oceanbase.ocp.core.task.engine.runner.JavaSubtaskRunner.doRun(JavaSubtaskRunner.java:32)
at com.oceanbase.ocp.core.task.engine.runner.JavaSubtaskRunner.run(JavaSubtaskRunner.java:26)
at com.oceanbase.ocp.core.task.engine.runner.RunnerFactory.doRun(RunnerFactory.java:76)
at com.oceanbase.ocp.core.task.engine.coordinator.worker.subtask.SubtaskExecutor.doRun(SubtaskExecutor.java:206)
at com.oceanbase.ocp.core.task.engine.coordinator.worker.subtask.SubtaskExecutor.redirectConsoleOutput(SubtaskExecutor.java:200)
at com.oceanbase.ocp.core.task.engine.coordinator.worker.subtask.SubtaskExecutor.lambda$submit$2(SubtaskExecutor.java:137)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
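ocp_agent (on the observer machine) runs du -sk <backup path> | awk '{print $1}' every 30 seconds to calculate the size of the backup files on NFS. It kept retrying for 10800 seconds (3 hours, matching timeout=10800 in the log) without getting a result, and then timed out. The first thing to suspect is the storage performance.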
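To see whether the `du` scan itself is the slow part, the same command can be timed by hand on the observer host. This is a minimal sketch, assuming GNU coreutils `timeout` is available. The helper name `measure_du` and the 600-second default limit are illustrative choices, not values used by OCP.

```shell
#!/usr/bin/env bash
# Time the same du scan that ocp_agent runs, with an upper bound so that a
# slow or hung NFS mount cannot block the shell indefinitely.
# measure_du is a hypothetical helper; the 600 s default is arbitrary.
measure_du() {
    local path="$1" limit="${2:-600}"
    local start end size
    start=$(date +%s)
    # du -s prints its total only when it finishes, so if timeout kills it
    # (or du cannot read the path at all), size stays empty.
    size=$(timeout "$limit" du -sk "$path" | awk '{print $1}')
    end=$(date +%s)
    if [ -z "$size" ]; then
        echo "du did not finish within ${limit}s (or failed) on $path" >&2
        return 1
    fi
    echo "size_kb=$size elapsed_s=$((end - start))"
}

# Example: measure_du /obbackup 600
```

If a single run takes far longer than the 30-second retry interval, the scan of the backup directory is the bottleneck, which points at NFS metadata performance or the mount options rather than at OCP itself.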