observer resource usage is abnormal; it is writing logs at a very high rate, mostly WDIAG entries

【Environment】Test environment
【Version】OceanBase 4.2.1.10
【Problem description】
[root@b-09-DTxdxt-DB2 log]$ top
top - 14:26:28 up 23 days, 5:19, 2 users, load average: 8.16, 7.68, 7.50
Tasks: 285 total, 1 running, 282 sleeping, 2 stopped, 0 zombie
%Cpu(s): 2.9 us, 1.9 sy, 0.0 ni, 71.7 id, 23.5 wa, 0.0 hi, 0.0 si, 0.1 st
MiB Mem : 18305.9 total, 2987.4 free, 11492.2 used, 3826.3 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 1276.5 avail Mem

PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND

2629535 admin 20 0 11.2g 9.8g 21616 S 64.8 54.9 16757:46 observer
1983 root 20 0 7691824 223036 10424 S 6.0 1.2 1637:17 ocp_monagent
1981 root 20 0 1351440 30296 7964 S 2.3 0.2 110:17.74 ocp_mgragent

[2024-12-11 14:24:36.626298] WDIAG [PALF] sliding_cb (log_sliding_window.cpp:2289) [2630355][T1004_LogIOCb0][T1004][Y0-0000000000000000-0-0] [lt=9][errcode=-4389] log_task life cost too much time(palf_id=1001, self="25.8.39.196:2882", log_id=68689391, log_task={header:{begin_lsn:{lsn:14901393805}, end_lsn:{lsn:14901393927}, log_id:68689391, min_scn:{val:1733898275156007791, v:0}, max_scn:{val:1733898275156007791, v:0}, data_len:66, proposal_id:45, prev_lsn:{lsn:18446744073709551615}, prev_proposal_id:9223372036854775807, committed_end_lsn:{lsn:14901393805}, data_checksum:2222604892, accum_checksum:3630643479, is_padding_log:false, is_raw_write:false}, state_map:{val:62}, ref_cnt:0, gen_ts:1733898275244616, freeze_ts:1733898275244618, submit_ts:1733898275244630, flushed_ts:1733898276626138, gen_to_freeze cost time:2, gen_to_submit cost time:14, submit_to_flush cost time:1381508}, fs_cb_begin_ts=1733898276626255, log_life_time=1381639)
[2024-12-11 14:24:36.626461] WDIAG [CLOG] statistics_cb_cost_ (ob_log_apply_service.cpp:933) [2630590][T1004_ApplySrv3][T1004][YB42190827C4-000627B3417EAA5C-0-0] [lt=15][errcode=-4389] cb cost too much time(lsn={lsn:14901393861}, scn={val:1733898275156007791, v:0}, idx=15, total_cost_time=1380701, append_cost_time=51, cb_wait_thread_time=1381791, cb_wait_commit_time=0, cb_cost_time=-1141, append_start_time=1733898275244613, append_finish_time=1733898275244664, cb_first_handle_time=1733898276626455, cb_finish_time=1733898276625314)
[2024-12-11 14:24:36.627212] WDIAG [LIB] ~ObTimeGuard (utility.h:857) [2629749][IO_HEALTH0][T0][Y0-0000000000000000-0-0] [lt=23][errcode=-4389] destruct(*this=time guard 'LocalDevice' cost too much time, used=502305, time_dist: LocalDevice_submit=502282)
[2024-12-11 14:24:36.627258] WDIAG [LIB] ~ObTimeGuard (utility.h:857) [2629749][IO_HEALTH0][T0][Y0-0000000000000000-0-0] [lt=9][errcode=-4389] destruct(*this=time guard 'detect_aio_request' cost too much time, used=502341, time_dist: prepare_detect_req=7, device_submit_detect=502320)
[2024-12-11 14:24:36.649423] WDIAG [ARCHIVE] do_thread_task_ (ob_archive_sender.cpp:261) [2630300][T1003_ArcSender][T1003][YB42190827C4-000627B31F1D3103-0-0] [lt=28][errcode=-4018] try free send task failed(ret=-4018)
[2024-12-11 14:24:36.651937] WDIAG [STORAGE.TRANS] run1 (ob_standby_timestamp_service.cpp:147) [2630549][T1004_STSWorker][T1004][Y0-0000000000000000-0-0] [lt=23][errcode=-4076] query and update last id fail(ret=-4076, ret="OB_NEED_WAIT")
[2024-12-11 14:24:36.653289] WDIAG [SHARE] inner_to_show_str (ob_system_variable.cpp:517) [2630683][T1004_L0_G0][T1004][YB42190827C4-000627B3417EADC0-0-0] [lt=19][errcode=0] sysvar casted obj is not ObVarcharType(ret=0, value={"NULL":"NULL"}, *res_obj={"NULL":"NULL"}, get_name()=secure_file_priv)
[2024-12-11 14:24:36.658749] WDIAG [SERVER] disconnect (obmp_packet_sender.cpp:788) [2630683][T1004_L0_G0][T1004][YB42190827C4-000627B3417EADC2-0-0] [lt=24][errcode=0] server close connection(sessid=3221638150, proxy_sessid=0, stack="0x1388db15 0xaca9cc2 0xac7305d 0xaf98652 0x553ddf7 0x552223c 0xa95df64 0x13b4953f 0x7f086fae2f1b 0x7f086fa1a1c0")
[2024-12-11 14:24:36.658811] WDIAG [RPC.OBMYSQL] push_close_req (ob_sql_nio.cpp:748) [2629731][sql_nio3][T0][Y0-0000000000000000-0-0] [lt=6][errcode=-4015] close sql sock by user req(*s={this:0x7f07970a6030, session_id:3221638150, trace_id:Y0-0000000000000000-0-0, sql_handling_stage:256, sql_initiative_shutdown:true, fd:296, err:5, last_decode_time:1733898276658679, last_write_time:1733898276658809, pending_write_task:{buf:null, sz:0}, need_epoll_trigger_write:false, consume_size:651, pending_flag:1, may_handling_flag:true, handler_close_flag:false})
[2024-12-11 14:24:36.664860] WDIAG [STORAGE.TRANS] submit_log_ (ob_id_service.cpp:116) [2629705][TsMgr][T1004][Y0-0000000000000000-0-0] [lt=11][errcode=-4023] submit log callback use too mush time(submit_log_ts=1733898276061319, cb={log_ts:{val:1733898276061300507, v:0}, log_buf:"", pos:40, id_srv_type:0, limited_id:1733898292644753731}, service_type=0,latest_log_ts={val:1733898266003302775, v:0}, self="25.8.39.196:2882")
[2024-12-11 14:24:36.721268] WDIAG [PALF] inner_append_log (palf_handle_impl.cpp:2216) [2630162][T1003_IOWorker][T1003][Y0-0000000000000000-0-0] [lt=22][errcode=-4389] write log cost too much time(ret=-4389, this={palf_id:1, self:"25.8.39.196:2882", has_set_deleted:false}, lsn_array=[{lsn:56202622964}], scn_array=[{val:1733898274411836920, v:0}], curr_size=1090, accum_size=2219, time_cost=1440354)
[2024-12-11 14:24:36.721348] WDIAG [PALF] try_update_match_lsn_map_ (log_sliding_window.cpp:4105) [2630161][T1003_LogIOCb0][T1003][Y0-0000000000000000-0-0] [lt=51][errcode=0] [MATCH LSN ADVANCE DELAY]match_lsn advance delay too much time(ret=0, palf_id=1, self="25.8.39.196:2882", server="25.8.39.196:2882", update_func={old_end_lsn:{lsn:56202622964}, new_end_lsn:{lsn:56202624054}, old_advance_time_us:1733898275280830, new_ack_time_us:1733898276721342, advance delay(us):1440512})
[2024-12-11 14:24:36.721502] WDIAG [PALF] sliding_cb (log_sliding_window.cpp:2289) [2630161][T1003_LogIOCb0][T1003][Y0-0000000000000000-0-0] [lt=11][errcode=-4389] log_task life cost too much time(palf_id=1, self="25.8.39.196:2882", log_id=105836512, log_task={header:{begin_lsn:{lsn:56202622964}, end_lsn:{lsn:56202624054}, log_id:105836512, min_scn:{val:1733898274411836920, v:0}, max_scn:{val:1733898274412742548, v:0}, data_len:1034, proposal_id:83, prev_lsn:{lsn:18446744073709551615}, prev_proposal_id:9223372036854775807, committed_end_lsn:{lsn:56202622964}, data_checksum:657331385, accum_checksum:211014486, is_padding_log:false, is_raw_write:false}, state_map:{val:62}, ref_cnt:0, gen_ts:1733898274411892, freeze_ts:1733898275280837, submit_ts:1733898275280851, flushed_ts:1733898276721336, gen_to_freeze cost time:868945, gen_to_submit cost time:868959, submit_to_flush cost time:1440485}, fs_cb_begin_ts=1733898276721459, log_life_time=2309567)
[2024-12-11 14:24:36.721502] WDIAG [STORAGE.TRANS] test_lock (ob_trans_ctx.cpp:245) [2630258][T1003_ApplySrv0][T1003][Y0-0000000000000000-0-0] [lt=41][errcode=-4389] transaction log sync use too much time(log_cb={ObTxBaseLogCb:{base_ts:{val:1733898274412742547, v:0}, log_ts:{val:1733898274412742547, v:0}, lsn:{lsn:56202623509}, submit_ts:1733898274412777}, this:0x7f08126e6708, is_inited_:true, trans_id:{txid:99819930}, ls_id:{id:1}, ctx:0x7f08126e3750, tx_data_guard:{tx_data:NULL}, is_callbacked_:false, is_dynamic_:false, mds_range_:{range_array_.count():0, range_array_:[]}, cb_arg_array_:[{log_type:0x1, arg:null}, {log_type:0x40, arg:null}, {log_type:0x100, arg:null}], first_part_scn_:{val:18446744073709551615, v:3}}, log_sync_used_time=2308719, ctx_lock_wait_time=2)
[2024-12-11 14:24:36.721773] WDIAG [CLOG] statistics_cb_cost_ (ob_log_apply_service.cpp:933) [2630258][T1003_ApplySrv0][T1003][Y0-0000000000000000-0-0] [lt=35][errcode=-4389] cb cost too much time(lsn={lsn:56202623509}, scn={val:1733898274412742547, v:0}, idx=19, total_cost_time=2307931, append_cost_time=7, cb_wait_thread_time=2308721, cb_wait_commit_time=0, cb_cost_time=-797, append_start_time=1733898274412768, append_finish_time=1733898274412775, cb_first_handle_time=1733898276721496, cb_finish_time=1733898276720699)
[2024-12-11 14:24:36.721800] WDIAG [STORAGE.TRANS] test_lock (ob_trans_ctx.cpp:245) [2630258][T1003_ApplySrv0][T1003][Y0-0000000000000000-0-0] [lt=14][errcode=-4389] transaction log sync use too much time(log_cb={ObTxBaseLogCb:{base_ts:{val:1733898274411836920, v:0}, log_ts:{val:1733898274411836920, v:0}, lsn:{lsn:56202623020}, submit_ts:1733898274411900}, this:0x7f0812973908, is_inited_:true, trans_id:{txid:99819929}, ls_id:{id:1}, ctx:0x7f0812970950, tx_data_guard:{tx_data:NULL}, is_callbacked_:false, is_dynamic_:false, mds_range_:{range_array_.count():0, range_array_:[]}, cb_arg_array_:[{log_type:0x1, arg:null}, {log_type:0x40, arg:null}, {log_type:0x100, arg:null}], first_part_scn_:{val:18446744073709551615, v:3}}, log_sync_used_time=2309900, ctx_lock_wait_time=0)
[2024-12-11 14:24:36.722556] WDIAG [LIB] ~ObTimeGuard (utility.h:857) [2630291][T1003_TenantWea][T1003][Y0-0000000000000000-0-0] [lt=17][errcode=-4389] destruct(*this=time guard '[WRS] [TENANT_WEAK_READ_SERVICE] tenant_id=1003, thread_timer_task' cost too much time, used=2311849, time_dist: do_cluster_heartbeat=15, generate_cluster_version=2311267)
[2024-12-11 14:24:36.723190] WDIAG [LIB] ~ObTimeGuard (utility.h:857) [2630551][T1004_TenantWea][T1004][Y0-0000000000000000-0-0] [lt=16][errcode=-4389] destruct(*this=time guard '[WRS] [TENANT_WEAK_READ_SERVICE] tenant_id=1004, thread_timer_task' cost too much time, used=2311587, time_dist: do_cluster_heartbeat=13, generate_cluster_version=2310296)
[2024-12-11 14:24:36.752010] WDIAG [STORAGE.TRANS] run1 (ob_standby_timestamp_service.cpp:147) [2630549][T1004_STSWorker][T1004][Y0-0000000000000000-0-0] [lt=17][errcode=-4076] query and update last id fail(ret=-4076, ret="OB_NEED_WAIT")
[2024-12-11 14:24:36.765507] WDIAG [STORAGE.TRANS] submit_log_ (ob_id_service.cpp:116) [2629705][TsMgr][T1004][Y0-0000000000000000-0-0] [lt=11][errcode=-4023] submit log callback use too mush time(submit_log_ts=1733898276061319, cb={log_ts:{val:1733898276061300507, v:0}, log_buf:"", pos:40, id_srv_type:0, limited_id:1733898292644753731}, service_type=0,latest_log_ts={val:1733898266003302775, v:0}, self="25.8.39.196:2882")
[2024-12-11 14:24:36.847692] WDIAG alloc_block (ob_block_alloc_mgr.h:40) [2630173][T1003_COMMONLSS][T1003][YB42190827C4-000627B3525EC577-0-0] [lt=19][errcode=-4016] block alloc over limit, limit=38654705 alloc_size=2079744
[2024-12-11 14:24:36.847716] WDIAG [SERVER] record_request (ob_mysql_request_manager.cpp:177) [2630173][T1003_COMMONLSS][T1003][YB42190827C4-000627B3525EC577-0-0] [lt=23][errcode=0] record concurrent fifoallocator alloc mem failed(total_size=1566, tenant_id_=1003, mem_limit_=128849018, request_id_=0, ret=0)
[2024-12-11 14:24:36.852109] WDIAG [STORAGE.TRANS] run1 (ob_standby_timestamp_service.cpp:147) [2630549][T1004_STSWorker][T1004][Y0-0000000000000000-0-0] [lt=26][errcode=-4076] query and update last id fail(ret=-4076, ret="OB_NEED_WAIT")
[2024-12-11 14:24:36.866045] WDIAG [STORAGE.TRANS] submit_log_ (ob_id_service.cpp:116) [2629705][TsMgr][T1004][Y0-0000000000000000-0-0] [lt=11][errcode=-4023] submit log callback use too mush time(submit_log_ts=1733898276061319, cb={log_ts:{val:1733898276061300507, v:0}, log_buf:"", pos:40, id_srv_type:0, limited_id:1733898292644753731}, service_type=0,latest_log_ts={val:1733898266003302775, v:0}, self="25.8.39.196:2882")
[2024-12-11 14:24:36.952162] WDIAG [STORAGE.TRANS] run1 (ob_standby_timestamp_service.cpp:147) [2630549][T1004_STSWorker][T1004][Y0-0000000000000000-0-0] [lt=8][errcode=-4076] query and update last id fail(ret=-4076, ret="OB_NEED_WAIT")
[2024-12-11 14:24:36.966674] WDIAG [STORAGE.TRANS] submit_log_ (ob_id_service.cpp:116) [2629705][TsMgr][T1004][Y0-0000000000000000-0-0] [lt=10][errcode=-4023] submit log callback use too mush time(submit_log_ts=1733898276061319, cb={log_ts:{val:1733898276061300507, v:0}, log_buf:"", pos:40, id_srv_type:0, limited_id:1733898292644753731}, service_type=0,latest_log_ts={val:1733898266003302775, v:0}, self="25.8.39.196:2882")
[2024-12-11 14:24:37.008598] WDIAG [SERVER] record_request (ob_mysql_request_manager.cpp:177) [2630242][T1003_FreInfoRe][T1003][YB42190827C4-000627B3532E3D74-0-0] [lt=15][errcode=0] record concurrent fifoallocator alloc mem failed(total_size=1543, tenant_id_=1003, mem_limit_=128849018, request_id_=0, ret=0)
[2024-12-11 14:24:37.011003] WDIAG [STORAGE] try_update_info (ob_tenant_freeze_info_mgr.cpp:985) [2630242][T1003_FreInfoRe][T1003][Y0-0000000000000000-0-0] [lt=21][errcode=0] snapshot_gc_ts not refresh too long(snapshot_gc_ts=1733898263990133928, freeze_info=[], snapshots=[], last_change_ts_=1733898264997858, last_not_change_interval_us=12013144)
[2024-12-11 14:24:37.052238] WDIAG [STORAGE.TRANS] run1 (ob_standby_timestamp_service.cpp:147) [2630549][T1004_STSWorker][T1004][Y0-0000000000000000-0-0] [lt=18][errcode=-4076] query and update last id fail(ret=-4076, ret="OB_NEED_WAIT")
[2024-12-11 14:24:37.067373] WDIAG [STORAGE.TRANS] submit_log_ (ob_id_service.cpp:116) [2629705][TsMgr][T1004][Y0-0000000000000000-0-0] [lt=11][errcode=-4023] submit log callback use too mush time(submit_log_ts=1733898276061319, cb={log_ts:{val:1733898276061300507, v:0}, log_buf:"", pos:40, id_srv_type:0, limited_id:1733898292644753731}, service_type=0,latest_log_ts={val:1733898266003302775, v:0}, self="25.8.39.196:2882")
[2024-12-11 14:24:37.146042] WDIAG [STORAGE.TRANS] inner_lock_for_read (ob_tx_data_functor.cpp:290) [2630762][TimezoneMgr][T1003][YB42190827C4-000627B34C1FD329-0-0] [lt=61][errcode=-6004] lock_for_read need retry(ret=-6004, tx_data={tx_id:{txid:99819931}, ref_cnt:1, state:"RUNNING", commit_version:{val:1733898276144251421, v:0}, start_scn:{val:1733898276144251421, v:0}, end_scn:{val:18446744073709551615, v:3}, undo_status_list:{head:null, undo_node_cnt:0{}}}, lock_for_read_arg_={mvcc_acc_ctx_:{type:1, abs_lock_timeout_ts:1733898306045730, tx_lock_timeout_us:-1, snapshot:{version:{val:1733898276144251421, v:0}, tx_id:{txid:0}, scn:0}, tx_table_guards:{tx_table_guard:{tx_table:0x7f08051ae590, epoch:0, mini_cache_:}, src_tx_table_guard:{tx_table:null, epoch:-1, mini_cache_:}, transfer_start_scn:{val:18446744073709551615, v:3}}, tx_id:{txid:0}, tx_desc:NULL, tx_ctx:null, mem_ctx:null, tx_scn:0, write_flag:{is_table_api:0, is_table_lock:0, is_mds:0, is_dml_batch_opt:0, is_insert_up:0, is_write_only_index:0, is_check_row_locked:0}, handle_start_time:-1, lock_wait_start_ts:0}, data_trans_id_:{txid:99819931}, data_sql_sequence_:1733898275984271, read_latest_:false, scn_:{val:4611686018427387903, v:0}}, tx_cc_ctx={state:10, prepare_version:{val:1733898276144251421, v:0}})
[2024-12-11 14:24:37.152320] WDIAG [STORAGE.TRANS] run1 (ob_standby_timestamp_service.cpp:147) [2630549][T1004_STSWorker][T1004][Y0-0000000000000000-0-0] [lt=23][errcode=-4076] query and update last id fail(ret=-4076, ret="OB_NEED_WAIT")
[2024-12-11 14:24:37.167946] WDIAG [STORAGE.TRANS] submit_log_ (ob_id_service.cpp:116) [2629705][TsMgr][T1004][Y0-0000000000000000-0-0] [lt=12][errcode=-4023] submit log callback use too mush time(submit_log_ts=1733898276061319, cb={log_ts:{val:1733898276061300507, v:0}, log_buf:"", pos:40, id_srv_type:0, limited_id:1733898292644753731}, service_type=0,latest_log_ts={val:1733898266003302775, v:0}, self="25.8.39.196:2882")
[2024-12-11 14:24:37.209374] WDIAG [PALF] inner_append_log (palf_handle_impl.cpp:2216) [2629869][T1_IOWorker][T1][Y0-0000000000000000-0-0] [lt=10][errcode=-4389] write log cost too much time(ret=-4389, this={palf_id:1, self:"25.8.39.196:2882", has_set_deleted:false}, lsn_array=[{lsn:30140698469}], scn_array=[{val:1733898274604689743, v:0}], curr_size=541, accum_size=1529, time_cost=1733837)
[2024-12-11 14:24:37.209453] WDIAG [PALF] try_update_match_lsn_map_ (log_sliding_window.cpp:4105) [2629868][T1_LogIOCb0][T1][Y0-0000000000000000-0-0] [lt=66][errcode=0] [MATCH LSN ADVANCE DELAY]match_lsn advance delay too much time(ret=0, palf_id=1, self="25.8.39.196:2882", server="25.8.39.196:2882", update_func={old_end_lsn:{lsn:30140698469}, new_end_lsn:{lsn:30140699010}, old_advance_time_us:1733898275475133, new_ack_time_us:1733898277209449, advance delay(us):1734316})
[2024-12-11 14:24:37.209628] WDIAG [PALF] sliding_cb (log_sliding_window.cpp:2289) [2629868][T1_LogIOCb0][T1][Y0-0000000000000000-0-0] [lt=6][errcode=-4389] log_task life cost too much time(palf_id=1, self="25.8.39.196:2882", log_id=76011659, log_task={header:{begin_lsn:{lsn:30140698469}, end_lsn:{lsn:30140699010}, log_id:76011659, min_scn:{val:1733898274604689743, v:0}, max_scn:{val:1733898274604689743, v:0}, data_len:485, proposal_id:33, prev_lsn:{lsn:18446744073709551615}, prev_proposal_id:9223372036854775807, committed_end_lsn:{lsn:30140698469}, data_checksum:3163184766, accum_checksum:4000294442, is_padding_log:false, is_raw_write:false}, state_map:{val:62}, ref_cnt:0, gen_ts:1733898274604717, freeze_ts:1733898275475306, submit_ts:1733898275475317, flushed_ts:1733898277209445, gen_to_freeze cost time:870589, gen_to_submit cost time:870600, submit_to_flush cost time:1734128}, fs_cb_begin_ts=1733898277209597, log_life_time=2604880)
[2024-12-11 14:24:37.209662] WDIAG [STORAGE.TRANS] test_lock (ob_trans_ctx.cpp:245) [2629965][T1_ApplySrv0][T1][Y0-0000000000000000-0-0] [lt=35][errcode=-4389] transaction log sync use too much time(log_cb={ObTxBaseLogCb:{base_ts:{val:1733898274604689743, v:0}, log_ts:{val:1733898274604689743, v:0}, lsn:{lsn:30140698525}, submit_ts:1733898274604751}, this:0x7f0833094208, is_inited_:true, trans_id:{txid:55263565}, ls_id:{id:1}, ctx:0x7f0833091250, tx_data_guard:{tx_data:NULL}, is_callbacked_:false, is_dynamic_:false, mds_range_:{range_array_.count():0, range_array_:[]}, cb_arg_array_:[{log_type:0x1, arg:null}, {log_type:0x40, arg:null}, {log_type:0x100, arg:null}], first_part_scn_:{val:18446744073709551615, v:3}}, log_sync_used_time=2604901, ctx_lock_wait_time=4)
[2024-12-11 14:24:37.210040] WDIAG [CLOG] statistics_cb_cost_ (ob_log_apply_service.cpp:933) [2629965][T1_ApplySrv0][T1][Y0-0000000000000000-0-0] [lt=8][errcode=-4389] cb cost too much time(lsn={lsn:30140698525}, scn={val:1733898274604689743, v:0}, idx=15, total_cost_time=2605164, append_cost_time=30, cb_wait_thread_time=2604909, cb_wait_commit_time=0, cb_cost_time=225, append_start_time=1733898274604712, append_finish_time=1733898274604742, cb_first_handle_time=1733898277209651, cb_finish_time=1733898277209876)
[2024-12-11 14:24:37.210164] WDIAG [LIB] ~ObTimeGuard (utility.h:857) [2630046][T1_TenantWeakRe][T1][Y0-0000000000000000-0-0] [lt=29][errcode=-4389] destruct(*this=time guard '[WRS] [TENANT_WEAK_READ_SERVICE] tenant_id=1, thread_timer_task' cost too much time, used=2606377, time_dist: do_cluster_heartbeat=12, generate_cluster_version=2606268)
[2024-12-11 14:24:37.246763] WDIAG [ARCHIVE] do_thread_task_ (ob_archive_sender.cpp:261) [2630055][T1_ArcSender][T1][YB42190827C4-000627B31DFD3103-0-0] [lt=43][errcode=-4018] try free send task failed(ret=-4018)
[2024-12-11 14:24:37.252388] WDIAG [STORAGE.TRANS] run1 (ob_standby_timestamp_service.cpp:147) [2630549][T1004_STSWorker][T1004][Y0-0000000000000000-0-0] [lt=20][errcode=-4076] query and update last id fail(ret=-4076, ret="OB_NEED_WAIT")
[2024-12-11 14:24:37.268632] WDIAG [STORAGE.TRANS] submit_log_ (ob_id_service.cpp:116) [2629705][TsMgr][T1004][Y0-0000000000000000-0-0] [lt=22][errcode=-4023] submit log callback use too mush time(submit_log_ts=1733898276061319, cb={log_ts:{val:1733898276061300507, v:0}, log_buf:"", pos:40, id_srv_type:0, limited_id:1733898292644753731}, service_type=0,latest_log_ts={val:1733898266003302775, v:0}, self="25.8.39.196:2882")
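With dozens of near-identical WDIAG lines like the ones above, a quick tally of errcode frequencies shows which warning dominates before diving deeper. A minimal sketch (a hypothetical helper, not an OceanBase or obdiag tool; it only assumes the `[errcode=...]` format visible in the excerpt):

```python
import re
from collections import Counter

# Matches the bracketed errcode field present in every WDIAG line,
# e.g. "[errcode=-4389]".
ERRCODE_RE = re.compile(r"\[errcode=(-?\d+)\]")

def tally_errcodes(lines):
    """Count occurrences of each errcode across WDIAG log lines."""
    counter = Counter()
    for line in lines:
        if "WDIAG" not in line:
            continue
        m = ERRCODE_RE.search(line)
        if m:
            counter[int(m.group(1))] += 1
    return counter

# Shortened sample lines in the same shape as the excerpt above.
sample = [
    '[14:24:36.626298] WDIAG [PALF] sliding_cb [lt=9][errcode=-4389] log_task life cost too much time',
    '[14:24:36.651937] WDIAG [STORAGE.TRANS] run1 [lt=23][errcode=-4076] query and update last id fail',
    '[14:24:36.664860] WDIAG [STORAGE.TRANS] submit_log_ [lt=11][errcode=-4023] submit log callback use too mush time',
    '[14:24:36.721268] WDIAG [PALF] inner_append_log [lt=22][errcode=-4389] write log cost too much time',
]
print(tally_errcodes(sample).most_common())
```

Running it over a real observer.log would immediately show whether -4389 ("cost too much time", i.e. slow log writes) dominates, as it appears to in this excerpt.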

Please compress your observer.log files and upload them — the more complete the logs, the easier the analysis.
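For reference, the logs can be bundled into one archive for upload roughly like this (a sketch: the log directory path is an assumption — point it at your actual observer log directory, e.g. /home/admin/oceanbase/log):

```python
import tarfile
from pathlib import Path

# LOG_DIR is an assumed path; adjust it to your deployment.
LOG_DIR = Path("./log")
LOG_DIR.mkdir(exist_ok=True)  # keeps this sketch runnable even without an install

# Include rotated files (observer.log.20241211... etc.) as well.
with tarfile.open("observer_logs.tar.gz", "w:gz") as tar:
    for f in sorted(LOG_DIR.glob("observer.log*")):
        tar.add(f, arcname=f.name)
print("wrote observer_logs.tar.gz")
```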

[errcode=-4023] submit log callback use too mush time

Troubleshoot by following the document below. And as I said above, please upload the complete observer.log first so we can dig deeper later.

OB_EAGAIN (-4023) Error Troubleshooting Guide

https://www.oceanbase.com/knowledge-base/oceanbase-database-1000000001013870?back=kb

observer.zip (9.9 MB)

When there is no obvious troubleshooting lead, the recommended approach is:

  1. Run a health inspection with the obdiag diagnostic tool: https://www.oceanbase.com/docs/common-obdiag-cn-1000000001768218
  2. Analyze the logs with obdiag analyze log: https://www.oceanbase.com/docs/common-obdiag-cn-1000000001768214

OK, we will start by analyzing the logs; you can also follow the document and troubleshoot on your side.

Please check the following parameters:
show parameters where name in ('memory_limit','memory_limit_percentage','system_memory','log_disk_size','log_disk_percentage','datafile_size','datafile_disk_percentage');

All returned rows share the same values for the remaining columns: ZONE=zone1, SVR_TYPE=observer, SVR_IP=25.8.39.196, SVR_PORT=2882, DATA_TYPE=null, SCOPE=CLUSTER, SOURCE=DEFAULT, EDIT_LEVEL=DYNAMIC_EFFECTIVE — i.e. every one of these parameters is at its default.

| NAME | VALUE | SECTION | INFO |
| --- | --- | --- | --- |
| log_disk_percentage | 0 | LOGSERVICE | the percentage of disk space used by the log files. Range: [0,99] in integer; only effective when parameter log_disk_size is 0; when log_disk_percentage is 0: a) if the data and the log are on the same disk, log_disk_percentage = 30; b) if the data and the log are on different disks, log_disk_percentage = 90 |
| log_disk_size | 0M | LOGSERVICE | the size of disk space used by the log files. Range: [0, +∞) |
| memory_limit_percentage | 80 | OBSERVER | the size of the memory reserved for internal use (for testing purpose). Range: [10, 95] |
| system_memory | 0M | OBSERVER | the memory reserved for internal use which cannot be allocated to any outer-tenant, and should be determined to guarantee every server functions normally. Range: [0M,) |
| memory_limit | 0M | OBSERVER | the size of the memory reserved for internal use (for testing purpose), 0 means follow memory_limit_percentage. Range: 0, [1G,) |
| datafile_disk_percentage | 0 | SSTABLE | the percentage of disk space used by the data files. Range: [0,99] in integer |
| datafile_size | 0M | SSTABLE | size of the data file. Range: [0, +∞) |
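Since every value above is DEFAULT, the effective limits follow from the percentage rules described in the INFO column. A sketch of that derivation (based purely on the parameter descriptions above, not on OceanBase internals; the 18305.9 MiB figure comes from the `top` output at the start of this thread, and "data and log share one disk" is an assumption):

```python
def effective_memory_limit_mb(physical_mb, memory_limit_mb=0, memory_limit_percentage=80):
    """memory_limit=0M means: use memory_limit_percentage of physical memory."""
    if memory_limit_mb > 0:
        return memory_limit_mb
    return physical_mb * memory_limit_percentage / 100

def effective_log_disk_percentage(log_disk_size_mb=0, log_disk_percentage=0, shared_with_data=True):
    """Only consulted when log_disk_size is 0; log_disk_percentage=0 then
    means 30% when data and log share a disk, else 90%."""
    if log_disk_size_mb > 0:
        return None  # absolute size takes precedence over the percentage
    if log_disk_percentage > 0:
        return log_disk_percentage
    return 30 if shared_with_data else 90

# This host reports 18305.9 MiB physical memory in the `top` output.
print(effective_memory_limit_mb(18305.9))   # about 14.6 GiB available to observer
print(effective_log_disk_percentage())      # assuming data and log on one disk
```

With roughly 14.6 GiB usable by the observer on an 18 GiB host that also runs ocp_monagent and ocp_mgragent, the defaults leave fairly little headroom, which is worth weighing against the memory pressure hints (alloc over limit, fifo allocator failures) in the log excerpt.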