OB tenant incremental data backup fails

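【 Environment 】Production
【 OB or other component 】
【 Version 】4.2.1.3
【 Problem description 】For one tenant, the full data backup succeeds but the incremental backup fails. Other clusters on the same version complete both full and incremental backups successfully. The backup runs from 04:30 until 16:30 and is then reported as failed. This happens frequently for this tenant; previously the incremental backup only occasionally succeeded. This tenant's backup retention is 181 days, whereas the other clusters use 90 days.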

The errors in rootservice.log are as follows:
[2025-11-06 16:19:19.670183] WDIAG [RS] push_task (ob_backup_task_scheduler.cpp:149) [63140][T1001_BACKUP_SC][T1001][YB420A012B97-000641C19D44F327-0-0] [lt=17][errcode=-4017] fail to check unique task(ret=-4017, task={ObBackupScheduleTask:{task_key:{tenant_id:1002, job_id:10329, task_id:546, ls_id:1003, type:0}, trace_id:YB420A012B97-000641C19D44F317-0-0, dst:“10.1.43.152:2882”, status:{status:2}, optional_servers:[{server:“10.1.43.151:2882”, priority:0}, {server:“10.1.43.152:2882”, priority:0}, {server:“10.1.43.153:2882”, priority:1}], generate_time:1762417159669609, schedule_time:0, executor_time:0}, incarnation_id:1, backup_set_id:436, backup_type:{type:2}, backup_date:20251106, ls_id:{id:1003}, turn_id:1, retry_id:0, start_scn:{val:1762374620115427218, v:0}, backup_user_ls_scn:{val:1762374635037692319, v:0}, end_scn:{val:1762375094204325319, v:0}, backup_path:“cos://db-backup-1252412222/data183/oceanbase/eeoob012-flowin/eeoob012/16413/tenant_incarnation_1/1002/data”, backup_status:{status:11}})
[2025-11-06 16:22:11.570461] WDIAG execute_list_callback (ob_storage_cos_base.cpp:82) [63142][T1001_BackupCle][T1001][YB420A012B97-000641C19D659BDB-0-0] [lt=6][errcode=-5331] fail to execute list callback!(ret=-5331, para.last_container_name_.d_name=backup_set_257_full/backup_set_257_full_20250509T043017_20250509T044201.obbak, para.last_container_name_.d_type=8, DT_REG=8, dname_size=256)
[2025-11-06 16:22:11.570519] WDIAG list_objects (ob_storage_cos_base.cpp:584) [63142][T1001_BackupCle][T1001][YB420A012B97-000641C19D659BDB-0-0] [lt=5][errcode=-5331] fail to list objects(ret=-5331, uri=cos://db-backup-1252412222/data183/oceanbase/eeoob012-flowin/eeoob012/16413/tenant_incarnation_1/1002/data)
[2025-11-06 16:22:11.570528] WDIAG list_files (ob_storage_cos_base.cpp:397) [63142][T1001_BackupCle][T1001][YB420A012B97-000641C19D659BDB-0-0] [lt=8][errcode=-5331] fail to list object in cos_base(ret=-5331, uri=cos://db-backup-1252412222/data183/oceanbase/eeoob012-flowin/eeoob012/16413/tenant_incarnation_1/1002/data, tmp_dir=data183/oceanbase/eeoob012-flowin/eeoob012/16413/tenant_incarnation_1/1002/data/)
[2025-11-06 16:22:11.570552] WDIAG [STORAGE] list_files (ob_storage.cpp:406) [63142][T1001_BackupCle][T1001][YB420A012B97-000641C19D659BDB-0-0] [lt=5][errcode=-5331] failed to list_files(ret=-5331, uri=cos://db-backup-1252412222/data183/oceanbase/eeoob012-flowin/eeoob012/16413/tenant_incarnation_1/1002/data)
[2025-11-06 16:22:11.570561] WDIAG scan_dir (ob_object_device.cpp:411) [63142][T1001_BackupCle][T1001][YB420A012B97-000641C19D659BDB-0-0] [lt=7][errcode=-5331] fail to do list/dir scan!(ret=-5331, is_dir_scan=false, dir_name=cos://db-backup-1252412222/data183/oceanbase/eeoob012-flowin/eeoob012/16413/tenant_incarnation_1/1002/data)
[2025-11-06 16:22:12.116340] WDIAG list_objects (ob_storage_cos_base.cpp:584) [63142][T1001_BackupCle][T1001][YB420A012B97-000641C19D659BDB-0-0] [lt=5][errcode=-5331] fail to list objects(ret=-5331, uri=cos://db-backup-1252412222/data183/oceanbase/eeoob012-flowin/eeoob012/16413/tenant_incarnation_1/1002/clog)
[2025-11-06 16:22:12.116349] WDIAG list_files (ob_storage_cos_base.cpp:397) [63142][T1001_BackupCle][T1001][YB420A012B97-000641C19D659BDB-0-0] [lt=8][errcode=-5331] fail to list object in cos_base(ret=-5331, uri=cos://db-backup-1252412222/data183/oceanbase/eeoob012-flowin/eeoob012/16413/tenant_incarnation_1/1002/clog, tmp_dir=data183/oceanbase/eeoob012-flowin/eeoob012/16413/tenant_incarnation_1/1002/clog/)
[2025-11-06 16:22:12.116374] WDIAG [STORAGE] list_files (ob_storage.cpp:406) [63142][T1001_BackupCle][T1001][YB420A012B97-000641C19D659BDB-0-0] [lt=7][errcode=-5331] failed to list_files(ret=-5331, uri=cos://db-backup-1252412222/data183/oceanbase/eeoob012-flowin/eeoob012/16413/tenant_incarnation_1/1002/clog)
[2025-11-06 16:22:12.116379] WDIAG scan_dir (ob_object_device.cpp:411) [63142][T1001_BackupCle][T1001][YB420A012B97-000641C19D659BDB-0-0] [lt=4][errcode=-5331] fail to do list/dir scan!(ret=-5331, is_dir_scan=false, dir_name=cos://db-backup-1252412222/data183/oceanbase/eeoob012-flowin/eeoob012/16413/tenant_incarnation_1/1002/clog)
[2025-11-06 16:29:19.671185] WDIAG [RS] push_task (ob_backup_task_scheduler.cpp:149) [63140][T1001_BACKUP_SC][T1001][YB420A012B97-000641C19D44F327-0-0] [lt=20][errcode=-4017] fail to check unique task(ret=-4017, task={ObBackupScheduleTask:{task_key:{tenant_id:1002, job_id:10329, task_id:546, ls_id:1001, type:0}, trace_id:YB420A012B97-000641C19D44F315-0-0, dst:“10.1.43.153:2882”, status:{status:2}, optional_servers:[{server:“10.1.43.152:2882”, priority:0}, {server:“10.1.43.153:2882”, priority:0}, {server:“10.1.43.151:2882”, priority:1}], generate_time:1762417759669748, schedule_time:0, executor_time:0}, incarnation_id:1, backup_set_id:436, backup_type:{type:2}, backup_date:20251106, ls_id:{id:1001}, turn_id:1, retry_id:0, start_scn:{val:1762374620115427218, v:0}, backup_user_ls_scn:{val:1762374635037692319, v:0}, end_scn:{val:1762375094204325319, v:0}, backup_path:“cos://db-backup-1252412222/data183/oceanbase/eeoob012-flowin/eeoob012/16413/tenant_incarnation_1/1002/data?host=cos.ap-beijing.myqcloud.com”, backup_status:{status:11}})
[2025-11-06 16:29:19.671236] WDIAG [RS] push_task (ob_backup_task_scheduler.cpp:149) [63140][T1001_BACKUP_SC][T1001][YB420A012B97-000641C19D44F327-0-0] [lt=16][errcode=-4017] fail to check unique task(ret=-4017, task={ObBackupScheduleTask:{task_key:{tenant_id:1002, job_id:10329, task_id:546, ls_id:1003, type:0}, trace_id:YB420A012B97-000641C19D44F317-0-0, dst:“10.1.43.152:2882”, status:{status:2}, optional_servers:[{server:“10.1.43.151:2882”, priority:0}, {server:“10.1.43.152:2882”, priority:0}, {server:“10.1.43.153:2882”, priority:1}], generate_time:1762417759670691, schedule_time:0, executor_time:0}, incarnation_id:1, backup_set_id:436, backup_type:{type:2}, backup_date:20251106, ls_id:{id:1003}, turn_id:1, retry_id:0, start_scn:{val:1762374620115427218, v:0}, backup_user_ls_scn:{val:1762374635037692319, v:0}, end_scn:{val:1762375094204325319, v:0}, backup_path:“cos://db-backup-1252412222/data183/oceanbase/eeoob012-flowin/eeoob012/16413/tenant_incarnation_1/1002/data?host=cos.ap-beijing.myqcloud.com”, backup_status:{status:11}})
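
To see at a glance which threads raise which error codes, an excerpt like the one above can be tallied with a short script. The following is a minimal Python sketch, not an OceanBase tool: the regular expression is inferred only from the line layout visible in this excerpt, the names `LOG_LINE`, `tally_errcodes` and `SAMPLE` are made up for illustration, and the sample lines are shortened copies of two lines from the post.

```python
import re
from collections import Counter

# Layout inferred from the excerpt above (an assumption, not an official spec):
# [timestamp] LEVEL [MODULE]? function (file.cpp:line) [tid][thread]... [errcode=N]
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>\w+)\s+"
    r"(?:\[(?P<module>\w+)\]\s+)?"            # module tag such as [RS] is optional
    r"(?P<func>\w+)\s+\((?P<file>[^:()]+):(?P<line>\d+)\)\s+"
    r"\[\d+\]\[(?P<thread>[^\]]+)\]"          # numeric thread id, then thread name
    r".*?\[errcode=(?P<errcode>-?\d+)\]"
)


def tally_errcodes(lines):
    """Count log lines per (thread name, error code); unparsable lines are skipped."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            counts[(match.group("thread"), int(match.group("errcode")))] += 1
    return counts


# Two lines from the post, with the message part shortened for readability.
SAMPLE = [
    "[2025-11-06 16:19:19.670183] WDIAG [RS] push_task (ob_backup_task_scheduler.cpp:149) "
    "[63140][T1001_BACKUP_SC][T1001][YB420A012B97-000641C19D44F327-0-0] "
    "[lt=17][errcode=-4017] fail to check unique task",
    "[2025-11-06 16:22:11.570519] WDIAG list_objects (ob_storage_cos_base.cpp:584) "
    "[63142][T1001_BackupCle][T1001][YB420A012B97-000641C19D659BDB-0-0] "
    "[lt=5][errcode=-5331] fail to list objects",
]

for (thread, code), n in tally_errcodes(SAMPLE).most_common():
    print(f"{thread}\t{code}\t{n}")
```

Run against the full excerpt, this kind of tally separates the -4017 entries, which all come from the T1001_BACKUP_SC thread, from the -5331 entries, which all come from the T1001_BackupCle thread, so the two can be investigated separately.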


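Is the backup configured through OCP? Please post the OCP version, a screenshot of the error, and the observer.log and rootservice.log covering the 5 minutes before and after the error. You can collect them with obdiag: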
https://www.oceanbase.com/docs/common-obdiag-cn-1000000004222805

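  1. Check the space. The other backups all succeed, which suggests there is enough space.
  2. Check the policy, mainly the interval between one full backup and the next, which is usually one week to one month. Backup retention is normally three weeks to three months at most, so a retention of 180 days really is too long. My guess is that a full backup and its incremental backups then span a long period with strong dependencies between them, which makes errors more likely.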

Following this thread to learn.


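My current suspicion is that the data volume is too large. There is 9 TB of data: the full backup ran from 04:30 to 11:30, and the incremental backup takes even longer. FINISH_TABLET_COUNT shows that the backup has still not finished. The concurrency was at its default of 0, and I have now set it to 4. I also changed the backup retention to 90 days. I will watch the next backup run to see whether it succeeds. The backup is configured in OCP, and the OCP version is 4.3.2-20241012145836.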
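
As a rough sanity check on the numbers in this post, the average throughput of the full backup can be worked out from the data size and the backup window. This is only a back-of-the-envelope sketch in Python: it assumes "9T" means 9 TiB and that the whole volume was written uncompressed during the window, neither of which the post confirms, and the function name `backup_throughput_mib_s` is made up for illustration.

```python
from datetime import datetime


def backup_throughput_mib_s(size_tib, start, end, fmt="%H:%M"):
    """Average throughput in MiB/s for a backup that starts and ends on the same day."""
    window = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    seconds = window.total_seconds()
    if seconds <= 0:
        raise ValueError("end must be later than start on the same day")
    # 1 TiB = 1024 * 1024 MiB
    return size_tib * 1024 * 1024 / seconds


# Figures from the post: 9 T of data, full backup from 04:30 to 11:30 (7 hours).
full = backup_throughput_mib_s(9, "04:30", "11:30")
print(f"full backup: about {full:.0f} MiB/s averaged over the window")
```

Comparing this figure with what the COS bucket and the network can actually sustain gives a hint as to whether backup concurrency or bandwidth is the more likely bottleneck.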


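Data backup performance tuning

ha_low_thread_score
Controls the concurrency of data backup. For small tenants (≤ 4C), the default value 0 is recommended, which gives a default concurrency of 2. For large tenants, start with 10 and double it as needed if the backup is slow. For performance testing, set it directly to the maximum value, 100.

sys_bkgd_net_percentage
Controls the percentage of total network bandwidth that background system tasks (including backup and restore tasks) may use. The default is 60%. It can be increased moderately as long as foreground tasks are not affected.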
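
The "start at 10, double as needed, cap at 100" advice for ha_low_thread_score can be written down as a small helper that lists the values to try and prints a matching statement for each. This is a hedged Python sketch: the helper names and the tenant name `my_tenant` are hypothetical, and the `ALTER SYSTEM SET ... TENANT = ...` form is the usual way to set a tenant-level parameter from the sys tenant, which should be checked against the documentation for your OceanBase version before use.

```python
MAX_SCORE = 100  # maximum value quoted in the reply above


def next_thread_score(current):
    """Next value to try: 10 when starting from the default 0, otherwise double, capped at 100."""
    if current <= 0:
        return 10
    return min(current * 2, MAX_SCORE)


def tuning_steps(start=10):
    """All values to try in order, from `start` up to the maximum."""
    steps = [start]
    while steps[-1] < MAX_SCORE:
        steps.append(next_thread_score(steps[-1]))
    return steps


def alter_statement(score, tenant):
    """Build the statement text only; nothing is executed here."""
    if not 0 <= score <= MAX_SCORE:
        raise ValueError("ha_low_thread_score must be between 0 and 100")
    return f"ALTER SYSTEM SET ha_low_thread_score = {score} TENANT = {tenant};"


# 'my_tenant' is a placeholder, not a name taken from this thread.
for score in tuning_steps(10):
    print(alter_statement(score, "my_tenant"))
```

Each step is meant to be applied one at a time, with a backup run observed in between, rather than jumping straight to the maximum on a production tenant.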


OK, thanks a lot. I will adjust the settings and observe first, and will share the results as soon as I have them.

