alter system archivelog keeps getting interrupted; unable to back up

【Environment】Test environment
【OB or other component】OB
【Version】3.1.3-OceanBase CE
【Problem description】alter system archivelog keeps getting interrupted, so the database cannot be backed up; the NFS mount is working normally.
MySQL [oceanbase]> select incarnation, tenant_id, status, min_first_time, max_next_time, now() from CDB_OB_BACKUP_ARCHIVELOG_SUMMARY;
+-------------+-----------+-------------+----------------------------+---------------+---------------------+
| incarnation | tenant_id | status      | min_first_time             | max_next_time | now()               |
+-------------+-----------+-------------+----------------------------+---------------+---------------------+
|           1 |         1 | STOP        | 2023-02-09 15:48:14.563149 |               | 2023-02-10 14:21:02 |
|           1 |         1 | INTERRUPTED | 2023-02-10 14:20:32.824404 |               | 2023-02-10 14:21:02 |
|           1 |      1004 | INTERRUPTED | 2022-09-14 14:36:38.456558 |               | 2023-02-10 14:21:02 |
|           1 |      1005 | INTERRUPTED | 2022-09-19 14:33:24.361397 |               | 2023-02-10 14:21:02 |
+-------------+-----------+-------------+----------------------------+---------------+---------------------+
42 rows in set (0.263 sec)
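
For reference, the same view can be filtered to show just the broken rounds (columns and status value as in the output above):

    select tenant_id, incarnation, status, min_first_time
    from CDB_OB_BACKUP_ARCHIVELOG_SUMMARY
    where status = 'INTERRUPTED';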
【Reproduction path】Operations performed before and after the problem occurred
【Symptoms and impact】
Cannot enable archivelog; cannot run alter system backup database.
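
For reference, these are the statements that fail (reconstructed from the description above; the exact error output was not captured):

    ALTER SYSTEM ARCHIVELOG;        -- rejected while the current archive round is INTERRUPTED
    ALTER SYSTEM BACKUP DATABASE;   -- requires log archiving to be in DOING state

Relevant rootservice/observer log entries: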
[2023-02-10 14:19:20.165662] WARN [RS.LB] generate_tenant_array (ob_server_balancer.cpp:1052) [6849][196][YB420A000E18-0005F1D3181F4606] [lt=7] [dc=0] do not got remote ttg string(zone=zone4)
[2023-02-10 14:19:20.165707] WARN [RS.LB] generate_tenant_array (ob_server_balancer.cpp:1052) [6849][196][YB420A000E18-0005F1D3181F4606] [lt=19] [dc=0] do not got remote ttg string(zone=zone5)
[2023-02-10 14:19:20.385706] ERROR [RS] do_schedule_ (ob_log_archive_scheduler.cpp:1187) [6855][208][YB420A000E18-0005F1D063CF29E5] [lt=79] [dc=0] [LOG_ARCHIVE] log archive status is interrupted, need manual process(sys_info={status:{tenant_id:1, copy_id:0, start_ts:1675928894563149, checkpoint_ts:0, status:5, incarnation:1, round:15, status_str:"INTERRUPTED", is_mark_deleted:false, is_mount_file_created:true, compatible:1, backup_piece_id:0, start_piece_id:0}, backup_dest:"file:///data/nfs/backup"}) BACKTRACE:0x99be10e 0x9793491 0x2273374 0x2272e5b 0x2272bc1 0x3829e5c 0x7033165 0x7031e3b 0x66bca22 0x994e80d 0x994e23e 0x3385bef 0x2c70eb2 0x9746965 0x9745692 0x974214f
[2023-02-10 14:19:28.439787] WARN handle_start_log_archive_ (ob_log_archive_scheduler.cpp:290) [7736][1141][Y10B420A000E19-0005F3AF32C8C9DC] [lt=484] [dc=0] start log archive backup when not STOP is not supported
[2023-02-10 14:19:28.439817] WARN [RS] handle_start_log_archive_ (ob_log_archive_scheduler.cpp:291) [7736][1141][Y10B420A000E19-0005F3AF32C8C9DC] [lt=20] [dc=0] cannot start log archive backup in this status(ret=-4007, sys_info={status:{tenant_id:1, copy_id:0, start_ts:1675928894563149, checkpoint_ts:0, status:5, incarnation:1, round:15, status_str:"INTERRUPTED", is_mark_deleted:false, is_mount_file_created:true, compatible:1, backup_piece_id:0, start_piece_id:0}, backup_dest:"file:///data/nfs/backup"})
[2023-02-10 14:19:28.440366] WARN [RS] handle_enable_log_archive (ob_log_archive_scheduler.cpp:188) [7736][1141][Y10B420A000E19-0005F3AF32C8C9DC] [lt=6] [dc=0] failed to handle start log archive(ret=-4007)
[2023-02-10 14:19:28.440377] WARN [RS] handle_archive_log (ob_root_service.cpp:10672) [7736][1141][Y10B420A000E19-0005F3AF32C8C9DC] [lt=8] [dc=0] failed to handle_enable_log_archive(ret=-4007, arg={enable:true})
[2023-02-10 14:19:28.440381] WARN [RS] process_ (ob_rs_rpc_processor.h:182) [7736][1141][Y10B420A000E19-0005F3AF32C8C9DC] [lt=4] [dc=0] process failed(ret=-4007)
[2023-02-10 14:19:29.576710] ERROR [RS] do_schedule_ (ob_log_archive_scheduler.cpp:1187) [6855][208][YB420A000E18-0005F1D063CF29E5] [lt=58] [dc=0] [LOG_ARCHIVE] log archive status is interrupted, need manual process(sys_info={status:{tenant_id:1, copy_id:0, start_ts:1675928894563149, checkpoint_ts:0, status:5, incarnation:1, round:15, status_str:"INTERRUPTED", is_mark_deleted:false, is_mount_file_created:true, compatible:1, backup_piece_id:0, start_piece_id:0}, backup_dest:"file:///data/nfs/backup"}) BACKTRACE:0x99be10e 0x9793491 0x2273374 0x2272e5b 0x2272bc1 0x3829e5c 0x7033165 0x7031e3b 0x66bca22 0x994e80d 0x994e23e 0x3385bef 0x2c70eb2 0x9746965 0x9745692 0x974214f
[2023-02-10 14:19:39.825322] WARN [STORAGE] get_row (ob_micro_block_reader.cpp:58) [6855][208][YB420A000E18-0005F1D063CF29E5] [lt=343] [dc=0] failed to locate row, (ret=-4183, rowkey={"BIGINT":1004},{"BIGINT":1},{"BIGINT":15})
[2023-02-10 14:19:41.392260] WARN [RS] wait (ob_rs_async_rpc_proxy.h:359) [6869][236][YB420A000E18-0005F1D2A4FFF049] [lt=3139] [dc=0] execute rpc failed(rc=-4012, server="10.0.14.23:2882", timeout=500000, packet code=2320, arg={keys:[{tid:1099511628094, partition_id:0, part_cnt:0}]})
[2023-02-10 14:19:41.392296] WARN [RS] wait (ob_rs_async_rpc_proxy.h:359) [6869][236][YB420A000E18-0005F1D2A4FFF049] [lt=27] [dc=0] execute rpc failed(rc=-4012, server="10.0.14.24:2882", timeout=500000, packet code=2320, arg={keys:[{tid:1099511628094, partition_id:0, part_cnt:0}]})
[2023-02-10 14:19:41.691042] WARN [STORAGE] get_row (ob_micro_block_reader.cpp:58) [6855][208][YB420A000E18-0005F1D063CF29E5] [lt=27] [dc=0] failed to locate row, (ret=-4183, rowkey={"BIGINT":1},{"BIGINT":1},{"BIGINT":15})
[2023-02-10 14:20:32.856036] WARN [STORAGE] get_row (ob_micro_block_reader.cpp:58) [7736][1141][Y10B420A000E16-0005F1D05ED8AEA1] [lt=10] [dc=0] failed to locate row, (ret=-4183, rowkey={"BIGINT":1},{"BIGINT":1},{"BIGINT":16},{"BIGINT":0},{"BIGINT":0})
[2023-02-10 14:20:33.000199] WARN [STORAGE.TRANS] wait_trans_version (ob_memtable_context.cpp:446) [7736][1141][Y10B420A000E16-0005F1D05ED8AEA1] [lt=39] [dc=0] lock_for_read need retry(ret=-6004, *this=alloc_type=0 ctx_descriptor=111589 trans_start_time=1676010032996763 min_table_version=0 max_table_version=1600 is_safe_read=false has_read_relocated_row=false read_snapshot=1676010032954288 start_version=-1 trans_version=9223372036854775807 commit_version=0 stmt_start_time=1676010032996763 abs_expired_time=1676010132701707 stmt_timeout=99704944 abs_lock_wait_timeout=0 row_purge_version=0 lock_wait_start_ts=0 trx_lock_timeout=-1 end_code=0 is_readonly=true ref=0 pkey= trans_id= data_relocated=0 relocate_cnt=0 truncate_cnt=0 trans_mem_total_size=0 callback_alloc_count=0 callback_free_count=0 callback_mem_used=0 checksum_log_ts=0, *mem_ctx=alloc_type=0 ctx_descriptor=5861754 trans_start_time=1676010032806752 min_table_version=1600 max_table_version=1600 is_safe_read=false has_read_relocated_row=false read_snapshot=1676010032257239 start_version=-1 trans_version=1676010032858473 commit_version=0 stmt_start_time=1676010032826838 abs_expired_time=1676010132671710 stmt_timeout=99844872 abs_lock_wait_timeout=1676010132671710 row_purge_version=0 lock_wait_start_ts=0 trx_lock_timeout=-1 end_code=0 is_readonly=false ref=0 pkey={tid:1099511628094, partition_id:0, part_cnt:0} trans_id={hash:15630334315637915203, inc:657061815, addr:"10.0.14.24:2882", t:1676010032801282} data_relocated=0 relocate_cnt=0 truncate_cnt=0 trans_mem_total_size=171 callback_alloc_count=4 callback_free_count=0 callback_mem_used=890 checksum_log_ts=9223372036854775807, version=1676010032954288)
[2023-02-10 14:20:35.341548] WARN parse_log_archive_status_result_ (ob_log_archive_backup_info_mgr.cpp:849) [6855][208][YB420A000E18-0005F1D063CF29E5] [lt=12] [dc=0] tenant log archive status info not exist(ret=-4018)
[2023-02-10 14:20:35.341593] WARN [SHARE] get_log_archive_backup_info_v2_ (ob_log_archive_backup_info_mgr.cpp:632) [6855][208][YB420A000E18-0005F1D063CF29E5] [lt=27] [dc=0] failed to get_log_archive_status_(ret=-4018, sql=select backup_dest, tenant_id, incarnation, log_archive_round, time_to_usec(min_first_time) min_first_time, time_to_usec(max_next_time) max_next_time, status , is_mount_file_created, compatible, backup_piece_id, start_piece_id from __all_backup_log_archive_status_v2 where tenant_id=1004 , info={status:{tenant_id:18446744073709551615, copy_id:0, start_ts:0, checkpoint_ts:0, status:8, incarnation:0, round:0, status_str:"UNKNOWN", is_mark_deleted:false, is_mount_file_created:false, compatible:0, backup_piece_id:0, start_piece_id:0}, backup_dest:""})
[2023-02-10 14:20:35.341737] WARN [SHARE] get_log_archive_backup_info (ob_log_archive_backup_info_mgr.cpp:490) [6855][208][YB420A000E18-0005F1D063CF29E5] [lt=8] [dc=0] failed to get_log_archive_backup_info_v2(ret=-4018, tenant_id=1004)
[2023-02-10 14:20:35.359858] WARN parse_log_archive_status_result_ (ob_log_archive_backup_info_mgr.cpp:849) [6855][208][YB420A000E18-0005F1D063CF29E5] [lt=8] [dc=0] tenant log archive status info not exist(ret=-4018)
[2023-02-10 14:20:35.360968] WARN [SHARE] get_log_archive_backup_info_v2_ (ob_log_archive_backup_info_mgr.cpp:632) [6855][208][YB420A000E18-0005F1D063CF29E5] [lt=1096] [dc=0] failed to get_log_archive_status_(ret=-4018, sql=select backup_dest, tenant_id, incarnation, log_archive_round, time_to_usec(min_first_time) min_first_time, time_to_usec(max_next_time) max_next_time, status , is_mount_file_created, compatible, backup_piece_id, start_piece_id from __all_backup_log_archive_status_v2 where tenant_id=1004 for update, info={status:{tenant_id:18446744073709551615, copy_id:0, start_ts:0, checkpoint_ts:0, status:8, incarnation:0, round:0, status_str:"UNKNOWN", is_mark_deleted:false, is_mount_file_created:false, compatible:0, backup_piece_id:0, start_piece_id:0}, backup_dest:""})
[2023-02-10 14:20:35.361019] WARN [SHARE] get_log_archive_backup_info (ob_log_archive_backup_info_mgr.cpp:490) [6855][208][YB420A000E18-0005F1D063CF29E5] [lt=8] [dc=0] failed to get_log_archive_backup_info_v2(ret=-4018, tenant_id=1004)
[2023-02-10 14:20:35.999379] WARN [STORAGE] get_row (ob_micro_block_reader.cpp:58) [6855][208][YB420A000E18-0005F1D063CF29E5] [lt=8] [dc=0] failed to locate row, (ret=-4183, rowkey={"BIGINT":1},{"BIGINT":1004},{"BIGINT":16},{"BIGINT":0},{"BIGINT":0})
[2023-02-10 14:20:36.144068] WARN parse_log_archive_status_result_ (ob_log_archive_backup_info_mgr.cpp:849) [6855][208][YB420A000E18-0005F1D063CF29E5] [lt=17] [dc=0] tenant log archive status info not exist(ret=-4018)
[2023-02-10 14:20:36.144423] WARN [SHARE] get_log_archive_backup_info_v2_ (ob_log_archive_backup_info_mgr.cpp:632) [6855][208][YB420A000E18-0005F1D063CF29E5] [lt=331] [dc=0] failed to get_log_archive_status_(ret=-4018, sql=select backup_dest, tenant_id, incarnation, log_archive_round, time_to_usec(min_first_time) min_first_time, time_to_usec(max_next_time) max_next_time, status , is_mount_file_created, compatible, backup_piece_id, start_piece_id from __all_backup_log_archive_status_v2 where tenant_id=1005 , info={status:{tenant_id:18446744073709551615, copy_id:0, start_ts:0, checkpoint_ts:0, status:8, incarnation:0, round:0, status_str:"UNKNOWN", is_mark_deleted:false, is_mount_file_created:false, compatible:0, backup_piece_id:0, start_piece_id:0}, backup_dest:""})
[2023-02-10 14:20:36.144559] WARN [SHARE] get_log_archive_backup_info (ob_log_archive_backup_info_mgr.cpp:490) [6855][208][YB420A000E18-0005F1D063CF29E5] [lt=15] [dc=0] failed to get_log_archive_backup_info_v2(ret=-4018, tenant_id=1005)
[2023-02-10 14:20:36.149496] WARN parse_log_archive_status_result_ (ob_log_archive_backup_info_mgr.cpp:849) [6855][208][YB420A000E18-0005F1D063CF29E5] [lt=11] [dc=0] tenant log archive status info not exist(ret=-4018)
[2023-02-10 14:20:36.149521] WARN [SHARE] get_log_archive_backup_info_v2_ (ob_log_archive_backup_info_mgr.cpp:632) [6855][208][YB420A000E18-0005F1D063CF29E5] [lt=12] [dc=0] failed to get_log_archive_status_(ret=-4018, sql=select backup_dest, tenant_id, incarnation, log_archive_round, time_to_usec(min_first_time) min_first_time, time_to_usec(max_next_time) max_next_time, status , is_mount_file_created, compatible, backup_piece_id, start_piece_id from __all_backup_log_archive_status_v2 where tenant_id=1005 for update, info={status:{tenant_id:18446744073709551615, copy_id:0, start_ts:0, checkpoint_ts:0, status:8, incarnation:0, round:0, status_str:"UNKNOWN", is_mark_deleted:false, is_mount_file_created:false, compatible:0, backup_piece_id:0, start_piece_id:0}, backup_dest:""})
[2023-02-10 14:20:36.149551] WARN [SHARE] get_log_archive_backup_info (ob_log_archive_backup_info_mgr.cpp:490) [6855][208][YB420A000E18-0005F1D063CF29E5] [lt=4] [dc=0] failed to get_log_archive_backup_info_v2(ret=-4018, tenant_id=1005)
[2023-02-10 14:20:37.317126] WARN [RS] start_log_archive_backup_ (ob_log_archive_scheduler.cpp:1282) [6855][208][YB420A000E18-0005F1D063CF29E5] [lt=9] [dc=0] [LOG_ARCHIVE] log archive status is not doing, waiting for start(ret=0, tenant_id=1004, *status={tenant_id:1004, incarnation:1, status:6, round:16, min_backup_piece_id:0, start_ts:0, checkpoint_ts:0, max_log_ts:0, status_str:"MIXED"})
[2023-02-10 14:20:37.317151] WARN [RS] start_log_archive_backup_ (ob_log_archive_scheduler.cpp:1282) [6855][208][YB420A000E18-0005F1D063CF29E5] [lt=20] [dc=0] [LOG_ARCHIVE] log archive status is not doing, waiting for start(ret=0, tenant_id=1005, *status={tenant_id:1005, incarnation:1, status:6, round:16, min_backup_piece_id:0, start_ts:0, checkpoint_ts:0, max_log_ts:0, status_str:"MIXED"})
[2023-02-10 14:20:37.317163] WARN [RS] start_log_archive_backup_ (ob_log_archive_scheduler.cpp:1309) [6855][208][YB420A000E18-0005F1D063CF29E5] [lt=9] [dc=0] [LOG_ARCHIVE] doing tenant count not match user tenant count, waiting for start(ret=-9028, doing_tenant_count=0, user_tenant_count=2, max_backup_start_ts=1676010032824404)
[2023-02-10 14:20:37.317191] WARN [RS] do_schedule_ (ob_log_archive_scheduler.cpp:1165) [6855][208][YB420A000E18-0005F1D063CF29E5] [lt=9] [dc=0] failed to start log archive backup(ret=-9028, sys_info={status:{tenant_id:1, copy_id:0, start_ts:1676010032824404, checkpoint_ts:0, status:2, incarnation:1, round:16, status_str:"BEGINNING", is_mark_deleted:false, is_mount_file_created:true, compatible:1, backup_piece_id:0, start_piece_id:0}, backup_dest:"file:///data/nfs/backup"})
[2023-02-10 14:20:37.317199] WARN [RS] run3 (ob_log_archive_scheduler.cpp:739) [6855][208][YB420A000E18-0005F1D063CF29E5] [lt=5] [dc=0] failed to do schedule(tmp_ret=-9028)
[2023-02-10 14:20:48.291003] ERROR [RS] fetch_log_archive_backup_status_map_ (ob_log_archive_scheduler.cpp:3246) [6855][208][YB420A000E18-0005F1D063CF29E5] [lt=6] [dc=0] server log archive status is interrupted(ret=-9027, addr="10.0.14.26:3882", result={result_code:-9027, status_array:[]}) BACKTRACE:0x99be10e 0x9793491 0x22b00f6 0x22afd3b 0x22afab3 0x70e2744 0x703ee42 0x703b8f9 0x7033019 0x7031e3b 0x66bca22 0x994e80d 0x994e23e 0x3385bef 0x2c70eb2 0x9746965 0x9745692 0x974214f
[2023-02-10 14:20:48.291083] WARN [RS] start_log_archive_backup_ (ob_log_archive_scheduler.cpp:1269) [6855][208][YB420A000E18-0005F1D063CF29E5] [lt=75] [dc=0] failed to fetch_log_archive_backup_status_map(ret=-9027)
[2023-02-10 14:20:48.913625] WARN [RS] run3 (ob_log_archive_scheduler.cpp:739) [6855][208][YB420A000E18-0005F1D063CF29E5] [lt=6] [dc=0] failed to do schedule(tmp_ret=-9027)
[2023-02-10 14:20:56.700789] WARN [STORAGE.TRANS] submit_log_impl_ (ob_trans_part_ctx.cpp:6552) [6843][184][YB420A000E18-0005F1D3221FB2A0] [lt=16] [dc=0] submitting log costs too much time(ret=0, ret="OB_SUCCESS", timeguard=time guard 'submit_log_impl' cost too much time, used=34008, time_dist: 4,53,33939,4, trans_id={hash:12159330015125087789, inc:657062600, addr:"10.0.14.24:2882", t:1676010056651187}, partition={tid:1099511627777, partition_id:0, part_cnt:1}, state={prepare_version:1676010056663412, state:0}, real_log_type=64, log_size=1830)
[2023-02-10 14:20:56.701090] WARN [LIB] ~ObTimeGuard (utility.h:759) [6843][184][YB420A000E18-0005F1D3221FB2A0] [lt=294] [dc=0] destruct(*this=time guard 'submit_log_impl' cost too much time, used=34325, time_dist: 4,53,33939,4)
[2023-02-10 14:21:02.047659] ERROR [RS] do_schedule_ (ob_log_archive_scheduler.cpp:1187) [6855][208][YB420A000E18-0005F1D063CF29E5] [lt=22] [dc=0] [LOG_ARCHIVE] log archive status is interrupted, need manual process(sys_info={status:{tenant_id:1, copy_id:0, start_ts:1676010032824404, checkpoint_ts:0, status:5, incarnation:1, round:16, status_str:"INTERRUPTED", is_mark_deleted:false, is_mount_file_created:true, compatible:1, backup_piece_id:0, start_piece_id:0}, backup_dest:"file:///data/nfs/backup"}) BACKTRACE:0x99be10e 0x9793491 0x2273374 0x2272e5b 0x2272bc1 0x3829e5c 0x7033165 0x7031e3b 0x66bca22 0x994e80d 0x994e23e 0x3385bef 0x2c70eb2 0x9746965 0x9745692 0x974214f
[2023-02-10 14:21:10.323630] WARN [CLOG] finish_timer (ob_log_timer_utility.h:42) [6843][184][YB420A000E18-0005F1D3221FB2AC] [lt=8] [dc=0] cost too much time(file="/data/jenkins-slave1/workspace/rpm-oceanbase-ce-3.1.3-1.1.1@2/ob_source_code_dir/src/clog/ob_batch_buffer.cpp", line=341, time=20559)
[2023-02-10 14:21:12.706040] WARN [CLOG] finish_timer (ob_log_timer_utility.h:42) [6843][184][YB420A000E18-0005F1D3221FB2AE] [lt=998] [dc=0] cost too much time(file="/data/jenkins-slave1/workspace/rpm-oceanbase-ce-3.1.3-1.1.1@2/ob_source_code_dir/src/clog/ob_batch_buffer.cpp", line=341, time=11497)

【Attachments】


Please help troubleshoot the archive-stream interruption based on these logs.

Hello! To troubleshoot an interrupted archive stream you need to look at the observer logs. You can grep the observer log for mark_fatal_error to find the specific cause, or paste the logs here; only then can the exact cause be determined.
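
For example, run something like grep mark_fatal_error observer.log* in each observer's log directory (the file names and path depend on your deployment). Once the root cause found there has been fixed, the usual way to clear an INTERRUPTED round in 3.x is to stop archiving and start a new round; a minimal sketch (verify the statements against the docs for your exact version):

    ALTER SYSTEM NOARCHIVELOG;   -- stop the interrupted archive round
    -- wait until CDB_OB_BACKUP_ARCHIVELOG_SUMMARY shows STOP for every tenant, then:
    ALTER SYSTEM ARCHIVELOG;     -- start a fresh archive round

Once archiving reaches DOING for all tenants, alter system backup database should be accepted again.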