【 使用环境 】生产环境
【 OB or 其他组件 】OCP
【 使用版本 】4.3.0-20240617185317
【问题描述】OCP接管集群后告警
observer.log:
[2024-08-15 16:49:19.517497] INFO [STORAGE.TRANS] replay (ob_id_service.cpp:210) [636658][T1004_ReplaySrv][T1004][Y0-0000000000000000-0-0] [lt=11] ObIDService replay(service_type_=0, ret=0, ls_log={last_id_:1723711768702618829, limit_id_:1723711768702618829}, base_header={version:1, log_type:7, flag:0, replay_hint:0})
[2024-08-15 16:49:19.517501] WDIAG [LIB] ~ObTimeGuard (utility.h:855) [636322][T1002_ReplaySrv][T1002][Y0-0000000000000000-0-0] [lt=33][errcode=-4389] destruct(*this=time guard ‘ls_meta’ cost too much time, used=4081409, time_dist: after lock=4081363, (null)=3)
[2024-08-15 16:49:19.517507] ERROR issue_dba_error (ob_log.cpp:1868) [636658][T1004_ReplaySrv][T1004][Y0-0000000000000000-0-0] [lt=7][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4389, file=“ob_ls_adapter.cpp”, line_no=106, info=“single replay task cost too much time. replay may be delayed”)
[2024-08-15 16:49:19.517511] INFO [STORAGE.TRANS] handle_replay_result (ob_id_service.cpp:237) [636322][T1002_ReplaySrv][T1002][Y0-0000000000000000-0-0] [lt=8] handle reply result(ret=0, service_type_=0, last_id=1723711767138154163, limited_id=1723711767138154163, log_ts={val:1723711750472305210, v:0})
[2024-08-15 16:49:19.517515] WDIAG [LIB] ~ObTimeGuard (utility.h:855) [636309][T1002_TxLoopWor][T1002][Y0-0000000000000000-0-0] [lt=28][errcode=-4389] destruct(*this=time guard ‘ls_meta’ cost too much time, used=1046556, time_dist: after lock=1046548)
[2024-08-15 16:49:19.517526] WDIAG [STORAGE.TRANS] for_each_remove_ (ob_tx_retain_ctx_mgr.cpp:386) [636309][T1002_TxLoopWor][T1002][Y0-0000000000000000-0-0] [lt=10][errcode=-4389] [RetainCtxMgr] remove retain ctx use too much time(use_ts=1046562, remove_count=0)
[2024-08-15 16:49:19.517523] INFO [STORAGE.TRANS] replay (ob_id_service.cpp:210) [636322][T1002_ReplaySrv][T1002][Y0-0000000000000000-0-0] [lt=11] ObIDService replay(service_type_=0, ret=0, ls_log={last_id_:1723711767138154163, limit_id_:1723711767138154163}, base_header={version:1, log_type:7, flag:0, replay_hint:0})
[2024-08-15 16:49:19.517508] INFO [STORAGE.TRANS] handle_tx_commit_result_ (ob_trans_service_v4.cpp:550) [636657][T1004_ApplySrv0][T1004][YB42C0A837A2-00061FB4891BE0DC-0-0] [lt=42] handle tx commit result(ret=0, ref_cnt_0=3, tx={this:0x7f35ecd2f960, tx_id:{txid:1041191}, state:10, addr:“192.168.55.162:2882”, tenant_id:1004, session_id:3221828347, assoc_session_id:3221828347, xid:NULL, xa_mode:"", xa_start_addr:“0.0.0.0:0”, access_mode:0, tx_consistency_type:0, isolation:1, snapshot_version:{val:18446744073709551615, v:3}, snapshot_scn:0, active_scn:1723711750854298, op_sn:4, alloc_ts:1723711720865542, active_ts:1723711750851488, commit_ts:1723711750851488, finish_ts:1723711759516551, timeout_us:86400000000, lock_timeout_us:-1, expire_ts:1723798150851488, coord_id:{id:1002}, parts:[{id:{id:1002}, addr:“192.168.55.162:2882”, epoch:1084893382459217, first_scn:1723711750854296, last_scn:1723711750854299, last_touch_ts:1723711750854299}], exec_info_reap_ts:1723711750854298, commit_version:{val:1723711750851302499, v:0}, commit_times:0, commit_cb:0x7f3613af5788, cluster_id:1723708822, cluster_version:17180000514, flags_.SHADOW:false, flags_.INTERRUPTED:false, flags_.BLOCK:false, flags_.REPLICA:false, can_elr:true, cflict_txs:[], abort_cause:0, commit_expire_ts:1723711760780639, commit_task_.is_registered():false, ref:3}, commit_fin=true, result=0)
[2024-08-15 16:49:19.517530] INFO [COMMON] advance_clock (ob_fifo_arena.cpp:374) [636321][T1002_ReplaySrv][T1002][Y0-0000000000000000-0-0] [lt=4] current clock is (clock_=0, max_seq_=0, mem_limit=888772996, cur_mem_hold=77594624, attr_.tenant_id_=1002)
[2024-08-15 16:49:19.517537] WDIAG [LIB] ~ObTimeGuard (utility.h:855) [636309][T1002_TxLoopWor][T1002][Y0-0000000000000000-0-0] [lt=10][errcode=-4389] destruct(*this=time guard ‘try_gc_retain_ctx’ cost too much time, used=1046576, time_dist: (null)=0)
[2024-08-15 16:49:19.517538] ERROR issue_dba_error (ob_log.cpp:1868) [636322][T1002_ReplaySrv][T1002][Y0-0000000000000000-0-0] [lt=12][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4389, file=“ob_ls_adapter.cpp”, line_no=106, info=“single replay task cost too much time. replay may be delayed”)
[2024-08-15 16:49:19.517552] INFO [STORAGE] ~ObStorageTableGuard (ob_storage_table_guard.cpp:153) [636321][T1002_ReplaySrv][T1002][Y0-0000000000000000-0-0] [lt=12] throttle statics(need_speed_limit=false, last_throttle_status=false, last_print_log_time=1723711744087363, stat={total_throttle_time_us:0, total_skip_throttle_time_us:0, last_log_timestamp:1723711759516551, last_throttle_status:false, 0=0, 1=0, 2=0, 3=0})
[2024-08-15 16:49:19.517545] INFO [STORAGE.TRANS] print_retain_ctx_info (ob_tx_retain_ctx_mgr.cpp:273) [636309][T1002_TxLoopWor][T1002][Y0-0000000000000000-0-0] [lt=6] [RetainCtxMgr] print retain ctx(ls_id={id:1}, this={retain_ctx_list_.get_size():274, max_wait_ckpt_ts_:{val:1723711692771255671, v:0}, last_push_gc_task_ts_:1723711098094878, skip_remove_cnt_:1}, retain_ctx_list_.get_first()={cause_:0, tx_id_:{txid:1006578}, ls_id_:{id:1}, tx_ctx_:0x7f36a629bdd0}, retain_ctx_list_.get_last()={cause_:0, tx_id_:{txid:1006874}, ls_id_:{id:1}, tx_ctx_:0x7f3637d2c050})
[2024-08-15 16:49:19.517542] TRACE [TRACE] reset (ob_trans_define_v4.cpp:461) [636657][T1004_ApplySrv0][T1004][YB42C0A837A2-00061FB4891BE0DC-0-0] [lt=28] [tx slow commit][tx desc trace](TRACE=begin_ts=1723711720865615 2024-08-15 08:48:40.865615|[reuse] u=0 ret:0, addr:0x7f35ecd2f960, txid:{txid:0}, tag1:0, tag2:1, ref:1, thread_id:636736|[create_global_implicit_savepoint] u=29985271 ret:0, txid:{txid:1041191}, savepoint:1723711750854288, release:true, opid:2, ref:2, thread_id:636736|[get_read_snapshot] u=601 ret:0, expire_ts:1723711760780639, txid:{txid:1041191}, isolation_level:1, snapshot_source:1, snapshot_version:{scn_val:1723711750850453995}, snapshot_txid:1041191, snapshot_scn:1723711750854293, trace_id:“YB42C0A837A2-00061FB4891BE0DF-0-0”, ref:2, thread_id:636736|[create_global_implicit_savepoint] u=46 ret:0, txid:{txid:1041191}, savepoint:1723711750854295, release:false, opid:3, ref:2, thread_id:636736|[submit_commit_tx] u=421 ret:0, expire_ts:1723711760780639, tag1:false, tag2:"", ref:2, thread_id:636736|[handle_tx_commit_result] u=8665582 ret:0, arg:0, is_finish:true, result:0, state:10, tag1:3, ref:3, commit_version:{scn_val:1723711750851302499}, thread_id:636657|[exec_commit_cb] u=2 arg:0x7f3613af5788, ref:3, thread_id:636657|total_timeu=38651923)
[2024-08-15 16:49:19.517606] INFO [LIB] stat (utility.h:1157) [636321][T1002_ReplaySrv][T1002][Y0-0000000000000000-0-0] [lt=13] [REPLAY STAT REPLAY TASK IN QUEUE TIME](cur_stat_count=1, stat_interval=1000000, avg (count/cost)=101, this=0x7f36a5a048f8, extra_info="")
[2024-08-15 16:49:19.517513] EDIAG [CLOG] replay (ob_ls_adapter.cpp:106) [636658][T1004_ReplaySrv][T1004][Y0-0000000000000000-0-0] [lt=6][errcode=-4389] single replay task cost too much time. replay may be delayed(replay_task={ls_id_:{id:1}, log_type:7, log_type:“TIMESTAMP”, lsn_:{lsn:33958539}, scn_:{val:1723711752147585641, v:0}, is_pre_barrier_:false, is_post_barrier_:false, log_size_:40, replay_hint_:0, is_raw_write_:false, first_handle_ts_:1723711756155303, replay_cost_:3362201, retry_cost_:0, log_buf_:0x7f3610a65f00}) BACKTRACE:0x113b8115 0x6ab8be9 0x4d6acd4 0x6ab8954 0x6ab0bf1 0x8757ba0 0x875702c 0x884dd18 0x4b13ab8 0x11a55a31 0x11a51fbf 0x7f37669281ca 0x7f37665838d3
[2024-08-15 16:49:19.517615] INFO [LIB] stat (utility.h:1157) [636321][T1002_ReplaySrv][T1002][Y0-0000000000000000-0-0] [lt=8] [REPLAY STAT REPLAY TASK EXECUTE COST TIME](cur_stat_count=1, stat_interval=1000000, avg (count/cost)=153, this=0x7f36a5a04b28, extra_info="")
[2024-08-15 16:49:19.517618] INFO [LIB] stat (utility.h:1157) [636658][T1004_ReplaySrv][T1004][Y0-0000000000000000-0-0] [lt=104] [REPLAY STAT REPLAY TASK IN QUEUE TIME](cur_stat_count=2, stat_interval=1000000, avg (count/cost)=8, this=0x7f36702048f8, extra_info="")
[2024-08-15 16:49:19.517625] INFO [LIB] stat (utility.h:1157) [636658][T1004_ReplaySrv][T1004][Y0-0000000000000000-0-0] [lt=6] [REPLAY STAT REPLAY TASK EXECUTE COST TIME](cur_stat_count=2, stat_interval=1000000, avg (count/cost)=1681292, this=0x7f3670204b28, extra_info="")
[2024-08-15 16:49:19.517549] EDIAG [CLOG] replay (ob_ls_adapter.cpp:106) [636322][T1002_ReplaySrv][T1002][Y0-0000000000000000-0-0] [lt=11][errcode=-4389] single replay task cost too much time. replay may be delayed(replay_task={ls_id_:{id:1}, log_type:7, log_type:“TIMESTAMP”, lsn_:{lsn:25254194}, scn_:{val:1723711750472305210, v:0}, is_pre_barrier_:false, is_post_barrier_:false, log_size_:40, replay_hint_:0, is_raw_write_:false, first_handle_ts_:1723711755436086, replay_cost_:4081449, retry_cost_:0, log_buf_:0x7f360a2a2450}) BACKTRACE:0x113b8115 0x6ab8be9 0x4d6acd4 0x6ab8954 0x6ab0bf1 0x8757ba0 0x875702c 0x884dd18 0x4b13ab8 0x11a55a31 0x11a51fbf 0x7f37669281ca 0x7f37665838d3
[2024-08-15 16:49:19.517610] TRACE [TRACE] print_trace_log_if_necessary_ (ob_trans_ctx.cpp:137) [636657][T1004_ApplySrv0][T1004][YB42C0A837A2-00061FB4891BE0DF-0-0] [lt=26] [long trans] (TRACE=begin_ts=1723711750851690 2024-08-15 08:49:10.851690|[init] u=0 addr:0x7f366ffc0dd0, id:1002, trans_id:{txid:1041191}, ref:0, opid:1055|[start_trans] u=2 ret:0, left_time:86400000000, ref:2, opid:1056|[start_access] u=1 ret:0, trace_id:“YB42C0A837A2-00061FB4891BE0DF-0-0”, opid:3, data_seq:1723711750854296, pending:1, ref:2, tid:1619|[end_access] u=157 opid:3, pending:0, ref:2, tid:1619|[start_access] u=4 ret:0, trace_id:“YB42C0A837A2-00061FB4891BE0DF-0-0”, opid:3, data_seq:1723711750854299, pending:1, ref:2, tid:1619|[end_access] u=35 opid:3, pending:0, ref:2, tid:1619|[set_stc] u=42 stc:1723711750851928, opid:1061|[commit] u=22 ret:0, tid:636736, ref:3, opid:1061|[get_gts_callback] u=6 ret:-4023, srr:1723711750851584, gts:{scn_val:1723711750851018706}, ref:2, opid:1062|[after_submit_log] u=231 ret:0, log_no:0, base_ts:{scn_val:1723711750851302499}, t:{scn_val:1723711750851302499}, lsn:{lsn:6786307}|[get_gts_callback] u=7 ret:0, srr:1723711750851936, gts:{scn_val:1723711750851302499}, ref:3, opid:1063|[log_sync_succ_cb] u=8665237 ret:0, log_type:0x1, t:{scn_val:1723711750851302499}, offset:{lsn:6786307}, ref:2|[log_sync_succ_cb] u=1 ret:0, log_type:0x40, t:{scn_val:1723711750851302499}, offset:{lsn:6786307}, ref:2|[response_scheduler] u=24 ret:0, tag1:false, tag2:false, status:0, commit_version:{scn_val:1723711750851302499}|[exiting] u=45 ref:2, arg1:3221828347, opid:1065|[log_sync_succ_cb] u=0 ret:0, log_type:0x100, t:{scn_val:1723711750851302499}, offset:{lsn:6786307}, ref:1|[end_trans_cb] u=103 ret:0, arg1:0, arg2:{scn_val:1723711750851302499}, async:false, opid:1066|[destroy] u=2 opid:1066|total_timeu=8665919)[2024-08-15 16:49:19.517622] INFO [STORAGE] ~ObStorageTableGuard (ob_storage_table_guard.cpp:153) [636719][T1004_ReplaySrv][T1004][Y0-0000000000000000-0-0] [lt=5] throttle statics(need_speed_limit=false, last_throttle_status=false, last_print_log_time=1723711713536291, stat={total_throttle_time_us:0, total_skip_throttle_time_us:0, last_log_timestamp:1723711759517608, last_throttle_status:false, 0=0, 1=0, 2=0, 3=0})
[2024-08-15 16:49:19.517642] INFO [LIB] stat (utility.h:1157) [636657][T1004_ApplySrv0][T1004][YB42C0A837A2-00061FB4891BE0DF-0-0] [lt=29] [APPLY STAT CB APPEND COST TIME](cur_stat_count=2, stat_interval=5000000, avg (count/cost)=13, this=0x7f364f2ca280, extra_info="")[2024-08-15 16:49:19.517647] INFO [LIB] stat (utility.h:1157) [636657][T1004_ApplySrv0][T1004][YB42C0A837A2-00061FB4891BE0DF-0-0] [lt=5] [APPLY STAT CB IN QUEUE TIME](cur_stat_count=2, stat_interval=5000000, avg (count/cost)=6667694, this=0x7f364f2ca4b0, extra_info="")
[2024-08-15 16:49:19.517651] INFO [LIB] stat (utility.h:1157) [636657][T1004_ApplySrv0][T1004][YB42C0A837A2-00061FB4891BE0DF-0-0] [lt=3] [APPLY STAT CB WAIT COMMIT TIME](cur_stat_count=2, stat_interval=5000000, avg (count/cost)=0, this=0x7f364f2ca6e0, extra_info="")[2024-08-15 16:49:19.517654] INFO [LIB] stat (utility.h:1157) [636657][T1004_ApplySrv0][T1004][YB42C0A837A2-00061FB4891BE0DF-0-0] [lt=3] [APPLY STAT CB EXECUTE TIME](cur_stat_count=2, stat_interval=5000000, avg (count/cost)=-71, this=0x7f364f2ca910, extra_info="")[2024-08-15 16:49:19.517657] INFO [LIB] stat (utility.h:1157) [636657][T1004_ApplySrv0][T1004][YB42C0A837A2-00061FB4891BE0DF-0-0] [lt=3] [APPLY STAT CB TOTAL TIME](cur_stat_count=2, stat_interval=5000000, avg (count/cost)=6667636, this=0x7f364f2cab40, extra_info="")[2024-08-15 16:49:19.517661] WDIAG [CLOG] statistics_cb_cost_ (ob_log_apply_service.cpp:933) [636657][T1004_ApplySrv0][T1004][YB42C0A837A2-00061FB4891BE0DF-0-0] [lt=3][errcode=-4389] cb cost too much time(lsn={lsn:6786307}, scn={val:1723711750851302499, v:0}, idx=3, total_cost_time=8665437, append_cost_time=9, cb_wait_thread_time=8665210, cb_wait_commit_time=0, cb_cost_time=218, append_start_time=1723711750852171, append_finish_time=1723711750852180, cb_first_handle_time=1723711759517390, cb_finish_time=1723711759517608)