Major compaction severely timing out

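[Environment] Production
[OB or other components]
[Version]
[Problem description] Automatic major compaction times out severely (more than 24 hours).
[Steps to reproduce]
[Attachments and logs]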

Running SELECT * FROM GV$OB_COMPACTION_DIAGNOSE_INFO returns the following:

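What is the cluster version? Please use the check feature of the obdiag tool to confirm whether there really is an uncompacted / compaction-timeout problem. There is a known issue with diagnostic entries such as RS_UNCOMPACTED/NOT_SCHEDULE: the internal logic that judged the elapsed time used to be wrong, so such entries alone do not prove that compaction is actually stuck.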

How do I install obdiag on Debian?

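It does not have to be installed on a cluster node. It only needs to be able to connect to the nodes where observer is deployed (installing it inside a Docker container also works, as long as the container can reach the corresponding observer nodes).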

+--------------------------------------------------------------------------------------------------+
| fail-tasks-report |
+-----------------------------------+--------------------------------------------------------------+
| task | task_report |
+-----------------------------------+--------------------------------------------------------------+
| cluster.task_opt_stat_gather_fail | [fail] 1146 (42S02): Table ‘oceanbase.DBA_OB_TASK_OPT_STAT_GATHER_HISTORY’ doesn’t exist |
| system.dependent_software | [fail] Execute Shell command on server 10.10.10.145 failed, command=[getenforce], exception:bash: line 1: getenforce: command not found |
| | |
| system.parameter | [fail] get_parameter execute: Execute Shell command on server 10.10.10.145 failed, command=[sysctl -n net.ipv4.tcp_tw_recycle], exception:sysctl: cannot stat /proc/sys/net/ipv4/tcp_tw_recycle: No such file or directory |
| | |
+-----------------------------------+--------------------------------------------------------------+
+----------------------------------------------------------------------------------------------------------+
| critical-tasks-report |
+----------------------------+-----------------------------------------------------------------------------+
| task | task_report |
+----------------------------+-----------------------------------------------------------------------------+
| cluster.data_path_settings | [critical] ip:10.10.10.145 ,data_dir and log_dir_disk are on the same disk. |
+----------------------------+-----------------------------------------------------------------------------+
+----------------------------------------------------------------------------------------------------------------------+
| warning-tasks-report |
+---------------------------+------------------------------------------------------------------------------------------+
| task | task_report |
+---------------------------+------------------------------------------------------------------------------------------+
| system.ulimit_parameter | [warning] On ip : 10.10.10.147, ulimit -c is 0 . It is a non recommended value, and the recommended value is unlimited. Please refer to the official website document for the configuration method |
| | [warning] On ip : 10.10.10.147, ulimit -u is 96004 . It is a non recommended value, and the recommended value is unlimited. Please refer to the official website document for the configuration method |
| | [warning] On ip : 10.10.10.146, ulimit -u is 96004 . It is a non recommended value, and the recommended value is unlimited. Please refer to the official website document for the configuration method |
| | [warning] On ip : 10.10.10.145, ulimit -s is 8192 . It is a non recommended value, and the recommended value is unlimited. Please refer to the official website document for the configuration method |
| | [warning] On ip : 10.10.10.147, ulimit -s is 8192 . It is a non recommended value, and the recommended value is unlimited. Please refer to the official website document for the configuration method |
| | [warning] On ip : 10.10.10.146, ulimit -s is 8192 . It is a non recommended value, and the recommended value is unlimited. Please refer to the official website document for the configuration method |
| | [warning] On ip : 10.10.10.145, ulimit -u is 96004 . It is a non recommended value, and the recommended value is unlimited. Please refer to the official website document for the configuration method |
| | [warning] On ip : 10.10.10.145, ulimit -c is 0 . It is a non recommended value, and the recommended value is unlimited. Please refer to the official website document for the configuration method |
| | [warning] On ip : 10.10.10.146, ulimit -c is 0 . It is a non recommended value, and the recommended value is unlimited. Please refer to the official website document for the configuration method |
| system.aio | [warning] fs.aio-max-nr : 1048576 is a non recommended value, recommended value need >1048576 |
| | [warning] fs.aio-nr : 4096 is a non recommended value, recommended value need aio-max-nr - aio-nr>20000 * observer_num |
| system.dependent_software | [warning] swapon need close. Now , it is NAME TYPE SIZE USED PRIO |
| | /dev/sda5 partition 975M 0B -2. |
+---------------------------+------------------------------------------------------------------------------------------+
+--------------------------------------------------------------------------------------------------+
| all-tasks-report |
±----------------------------------±---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| task | task_report |
±----------------------------------±---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| cluster.deadlocks | all pass |
| err_code.find_err_4015 | all pass |
| err_code.find_err_4016 | all pass |
| err_code.find_err_4012 | all pass |
| err_code.find_err_4013 | all pass |
| cluster.core_file_find | all pass |
| cluster.data_path_settings | [critical] ip:10.10.10.145 ,data_dir and log_dir_disk are on the same disk. |
| disk.disk_full | all pass |
| cluster.mod_too_large | all pass |
| cluster.task_opt_stat_gather_fail | [fail] 1146 (42S02): Table ‘oceanbase.DBA_OB_TASK_OPT_STAT_GATHER_HISTORY’ doesn’t exist |
| system.ulimit_parameter | [warning] On ip : 10.10.10.147, ulimit -c is 0 . It is a non recommended value, and the recommended value is unlimited. Please refer to the official website document for the configuration method |
| | [warning] On ip : 10.10.10.147, ulimit -u is 96004 . It is a non recommended value, and the recommended value is unlimited. Please refer to the official website document for the configuration method |
| | [warning] On ip : 10.10.10.146, ulimit -u is 96004 . It is a non recommended value, and the recommended value is unlimited. Please refer to the official website document for the configuration method |
| | [warning] On ip : 10.10.10.145, ulimit -s is 8192 . It is a non recommended value, and the recommended value is unlimited. Please refer to the official website document for the configuration method |
| | [warning] On ip : 10.10.10.147, ulimit -s is 8192 . It is a non recommended value, and the recommended value is unlimited. Please refer to the official website document for the configuration method |
| | [warning] On ip : 10.10.10.146, ulimit -s is 8192 . It is a non recommended value, and the recommended value is unlimited. Please refer to the official website document for the configuration method |
| | [warning] On ip : 10.10.10.145, ulimit -u is 96004 . It is a non recommended value, and the recommended value is unlimited. Please refer to the official website document for the configuration method |
| | [warning] On ip : 10.10.10.145, ulimit -c is 0 . It is a non recommended value, and the recommended value is unlimited. Please refer to the official website document for the configuration method |
| | [warning] On ip : 10.10.10.146, ulimit -c is 0 . It is a non recommended value, and the recommended value is unlimited. Please refer to the official website document for the configuration method |
| err_code.find_err_4103 | all pass |
| cluster.part_trans_action_max | all pass |
| system.aio | [warning] fs.aio-max-nr : 1048576 is a non recommended value, recommended value need >1048576 |
| | [warning] fs.aio-nr : 4096 is a non recommended value, recommended value need aio-max-nr - aio-nr>20000 * observer_num |
| err_code.find_err_4105 | all pass |
| err_code.find_err_4001 | all pass |
| err_code.find_err_4000 | all pass |
| system.dependent_software | [fail] Execute Shell command on server 10.10.10.145 failed, command=[getenforce], exception:bash: line 1: getenforce: command not found |
| | |
| | [warning] swapon need close. Now , it is NAME TYPE SIZE USED PRIO |
| | /dev/sda5 partition 975M 0B -2. |
| cluster.tenant_log_size | all pass |
| cluster.global_indexes_too_much | all pass |
| err_code.find_err_4377 | all pass |
| disk.disk_hole | all pass |
| cpu.oversold | all pass |
| cluster.major | all pass |
| system.parameter | [fail] get_parameter execute: Execute Shell command on server 10.10.10.145 failed, command=[sysctl -n net.ipv4.tcp_tw_recycle], exception:sysctl: cannot stat /proc/sys/net/ipv4/tcp_tw_recycle: No such file or directory |
| | |
+-----------------------------------+--------------------------------------------------------------+

I ran the check.

Going back over the logs from when I restarted the zone nodes, I found: log not sync, stop zone not allowed

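Hi, I went through the report and found several issues. The disk does not look full (it passed the disk check), but many of the ulimit settings are risky. Next we need to analyze the logs; please wait a moment.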
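The report flags ulimit -c (0), ulimit -u (96004) and ulimit -s (8192) on all three hosts and recommends unlimited for each. A minimal sketch using Python's standard resource module (getrlimit, RLIM_INFINITY) that lists which of these soft limits are not unlimited. Note the assumptions: it inspects the process running the script, not the running observer (for that process, /proc/<pid>/limits shows the effective values), and getrlimit reports the stack limit in bytes, so ulimit -s 8192 (KB) shows up as 8388608.

```python
import resource

# The three limits flagged in the obdiag report; it recommends "unlimited" for all.
FLAGGED_LIMITS = {
    "ulimit -c (core file size)": resource.RLIMIT_CORE,
    "ulimit -u (max user processes)": resource.RLIMIT_NPROC,
    "ulimit -s (stack size)": resource.RLIMIT_STACK,
}


def describe(value: int) -> str:
    """Render a limit the way ulimit would word an infinite value."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def limits_not_unlimited() -> dict:
    """Return the soft limits of the current process that are not unlimited."""
    result = {}
    for name, res in FLAGGED_LIMITS.items():
        soft, _hard = resource.getrlimit(res)
        if soft != resource.RLIM_INFINITY:
            result[name] = describe(soft)  # bytes for -c/-s, count for -u
    return result


if __name__ == "__main__":
    for name, value in limits_not_unlimited().items():
        print(f"{name} = {value}; the report recommends unlimited")
```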

The cluster was built with OB 4.1.0.1_102000042023061314-43bca414d5065272a730c92a645c3e25768c1d05 (Jun 13 2023 14:26:23); the OCP version is 4.0.3.

For the tenant whose compaction timed out, query cdb_ob_major_compaction and GV$OB_COMPACTION_DIAGNOSE_INFO and post complete screenshots of the results.


The "weak read ts is not ready" records in diagnose_info should include an ls_id, right? Query __all_virtual_ls_info filtered by that tenant and ls_id and take a look.

Log stream 1 on the .147 machine is indeed far behind. On that machine, grep the recent observer logs for generate_weak_read_timestamp_.
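Once the generate_weak_read_timestamp_ lines are grepped out, the lag per tenant and log stream can be read off mechanically. A minimal Python sketch; parse_wrs and lagging are hypothetical helpers whose regex only matches the "get wrs ts(...)" format quoted in this thread. It assumes delta is in microseconds and timestamp is in nanoseconds since the Unix epoch. That is consistent with the sample below if the log clock is UTC+8: 15:27:26 minus a weak-read time of 12:36:07 (UTC+8) is about 50.9 hours, which matches delta=183079552799.

```python
import re
from datetime import datetime, timezone

# Hypothetical pattern for the "get wrs ts(...)" lines shown in this thread.
WRS_RE = re.compile(
    r"\[T(?P<tenant>\d+)\]\[Y[^\]]*\] \[lt=\d+\] get wrs ts\("
    r"ls_id=\{id:(?P<ls_id>\d+)\}, delta=(?P<delta>\d+), "
    r"timestamp=\{val:(?P<scn>\d+)\}"
)


def parse_wrs(line):
    """Return tenant, log stream, lag and weak-read time for one log line, or None."""
    m = WRS_RE.search(line)
    if m is None:
        return None
    delta_us = int(m.group("delta"))  # assumed: microseconds
    scn_ns = int(m.group("scn"))      # assumed: nanoseconds since the Unix epoch
    return {
        "tenant_id": int(m.group("tenant")),
        "ls_id": int(m.group("ls_id")),
        "lag_hours": delta_us / 3_600_000_000,
        "weak_read_utc": datetime.fromtimestamp(scn_ns // 1_000_000_000, tz=timezone.utc),
    }


def lagging(lines, max_lag_hours=1.0):
    """Yield parsed records whose weak-read lag exceeds max_lag_hours (arbitrary cutoff)."""
    for line in lines:
        rec = parse_wrs(line)
        if rec is not None and rec["lag_hours"] > max_lag_hours:
            yield rec


SAMPLE = (
    "[2023-12-04 15:27:26.929282] INFO [STORAGE.TRANS] generate_weak_read_timestamp_ "
    "(ob_ls_wrs_handler.cpp:175) [1042092][T1_TenantWeakRe][T1][Y0-0000000000000000-0-0] "
    "[lt=9] get wrs ts(ls_id={id:1}, delta=183079552799, "
    "timestamp={val:1701491767375548422}, min_tx_service_ts={val:4611686018427387903})"
)

rec = parse_wrs(SAMPLE)
print(rec["tenant_id"], rec["ls_id"], round(rec["lag_hours"], 2), rec["weak_read_utc"])
# 1 1 50.86 2023-12-02 04:36:07+00:00
```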

[2023-12-04 15:27:26.922986] ERROR try_recycle_blocks (palf_env_impl.cpp:692) [1042160][T1001_PalfGC][T1001][Y0-0000000000000000-0-0] [lt=16][errcode=-4264] Log out of disk space(msg=“log disk space is almost full”, ret=-4264, total_size(MB)=4710, used_size(MB)=4616, used_percent(%)=98, warn_size(MB)=3768, warn_percent(%)=80, limit_size(MB)=4616, limit_percent(%)=98, maximum_used_size(MB)=4616, maximum_log_stream=1, oldest_log_stream=1, oldest_scn={val:1701488296617919005})
[2023-12-04 15:27:26.929282] INFO [STORAGE.TRANS] generate_weak_read_timestamp_ (ob_ls_wrs_handler.cpp:175) [1042092][T1_TenantWeakRe][T1][Y0-0000000000000000-0-0] [lt=9] get wrs ts(ls_id={id:1}, delta=183079552799, timestamp={val:1701491767375548422}, min_tx_service_ts={val:4611686018427387903})
[2023-12-04 15:27:26.929297] INFO [STORAGE.TRANS] print_stat_info (ob_keep_alive_ls_handler.cpp:211) [1042092][T1_TenantWeakRe][T1][Y0-0000000000000000-0-0] [lt=14] [Keep Alive Stat] LS Keep Alive Info(tenant_id=1, LS_ID={id:1}, Not_Master_Cnt=1, Near_To_GTS_Cnt=0, Other_Error_Cnt=0, Submit_Succ_Cnt=0, last_scn="{val:1701491765514725961}", last_lsn={lsn:215277325950}, last_gts={val:0}, min_start_scn="{val:0}", min_start_status=1)
[2023-12-04 15:27:26.932444] INFO [COORDINATOR] detect_recover (ob_failure_detector.cpp:141) [1042151][T1001_Occam][T1001][Y0-0000000000000000-0-0] [lt=21] doing detect recover operation(events_with_ops=[{event:{type:RESOURCE NOT ENOUGH, module:LOG, info:clog disk full event, level:FATAL}}])
[2023-12-04 15:27:26.933033] WDIAG [PALF] recycle_blocks_ (palf_env_impl.cpp:1016) [1042160][T1001_PalfGC][T1001][Y0-0000000000000000-0-0] [lt=6][errcode=0] there is not any block can be recycled, need verify the baselsn of PalfHandleImpl whether has been advanced(ret=0, this={IPalfEnvImpl:{IPalfEnvImpl:“Dummy”}, self:“10.10.10.147:2882”, log_dir:"/data/obcluster1/clog/tenant_1001", disk_options_wrapper:{disk_opts_for_stopping_writing:{log_disk_size(MB):4710, log_disk_utilization_threshold(%):80, log_disk_utilization_limit_threshold(%):98}, disk_opts_for_recycling_blocks:{log_disk_size(MB):4710, log_disk_utilization_threshold(%):80, log_disk_utilization_limit_threshold(%):98}, status:1}, log_alloc_mgr_:{flying_log_task:0, flying_meta_task:0}})
[2023-12-04 15:27:26.933052] ERROR try_recycle_blocks (palf_env_impl.cpp:692) [1042160][T1001_PalfGC][T1001][Y0-0000000000000000-0-0] [lt=19][errcode=-4264] Log out of disk space(msg=“log disk space is almost full”, ret=-4264, total_size(MB)=4710, used_size(MB)=4616, used_percent(%)=98, warn_size(MB)=3768, warn_percent(%)=80, limit_size(MB)=4616, limit_percent(%)=98, maximum_used_size(MB)=4616, maximum_log_stream=1, oldest_log_stream=1, oldest_scn={val:1701488296617919005})
[2023-12-04 15:27:26.934276] INFO [PALF] handle_next_submit_log_ (log_sliding_window.cpp:1016) [1042241][T1_L0_G1][T1][YB420A0A0A91-00060BA4F7834D0B-0-0] [lt=14] [PALF STAT GROUP LOG INFO](palf_id=1, self=“10.10.10.147:2882”, role=“FOLLOWER”, total_group_log_cnt=7, avg_log_batch_cnt=0, total_group_log_size=2478, avg_group_log_size=354)
[2023-12-04 15:27:26.934542] INFO [PALF] inner_append_log (palf_handle_impl.cpp:1728) [1041991][T1_IOWorker][T1][Y0-0000000000000000-0-0] [lt=17] [PALF STAT INNER APPEND LOG](this={palf_id:1, self:“10.10.10.147:2882”, has_set_deleted:false}, accum_size=2478)
[2023-12-04 15:27:26.937502] INFO [STORAGE.TRANS] generate_weak_read_timestamp_ (ob_ls_wrs_handler.cpp:175) [1042262][T1001_TenantWea][T1001][Y0-0000000000000000-0-0] [lt=7] get wrs ts(ls_id={id:1}, delta=186360227078, timestamp={val:1701488486710290466}, min_tx_service_ts={val:4611686018427387903})
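The PalfGC lines above print two thresholds for tenant 1001's clog on .147: log_disk_utilization_threshold 80% and log_disk_utilization_limit_threshold 98% of a 4710 MB log disk. A small sketch reproducing that arithmetic (80% of 4710 is the printed warn_size of 3768 MB; 98% is 4615.8 MB, printed as 4616). LogDisk is a hypothetical class, and the state names are our reading of the option names disk_opts_for_recycling_blocks and disk_opts_for_stopping_writing, not OceanBase code. As the WDIAG line says, crossing the threshold only frees space if blocks can actually be recycled, which requires the base LSN to advance.

```python
from dataclasses import dataclass


@dataclass
class LogDisk:
    """Hypothetical model of the numbers printed by the PalfGC lines above."""
    total_mb: int            # total_size(MB) / log_disk_size(MB)
    used_mb: int             # used_size(MB)
    warn_percent: int = 80   # log_disk_utilization_threshold(%)
    limit_percent: int = 98  # log_disk_utilization_limit_threshold(%)

    @property
    def used_percent(self) -> float:
        return self.used_mb * 100 / self.total_mb

    def warn_mb(self) -> int:
        return self.total_mb * self.warn_percent // 100

    def state(self) -> str:
        # Assumed semantics, read from the option names in the log:
        # above the threshold, old blocks should be recycled; at the limit, writes stop.
        if self.used_percent >= self.limit_percent:
            return "stop-writing"
        if self.used_percent >= self.warn_percent:
            return "recycle"
        return "ok"


disk = LogDisk(total_mb=4710, used_mb=4616)
print(disk.warn_mb(), round(disk.used_percent, 1), disk.state())
# 3768 98.0 stop-writing
```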

[2023-12-04 15:21:44.435676] WDIAG [CLOG] handle_submit_task_ (ob_log_replay_service.cpp:1131) [1042460][T1002_ReplaySrv][T1002][Y0-0000000000000000-0-0] [lt=10][errcode=0] no log to fetch but committed_end_lsn is not new file header(ret=0, ret=“OB_SUCCESS”, to_submit_lsn={lsn:34114611096}, committed_end_lsn={lsn:34339675050}, replay_status={ls_id_:{id:1003}, is_enabled_:true, is_submit_blocked_:false, role_:2, err_info_:{lsn_:{lsn:18446744073709551615}, scn_:{val:0}, log_type_:0, is_submit_err_:false, err_ts_:0, err_ret_:0}, ref_cnt_:2, post_barrier_lsn_:{lsn:18446744073709551615}, pending_task_count_:0, submit_log_task_:{ObReplayServiceSubmitTask:{type_:1, enqueue_ts_:1701674504435596, err_info_:{has_fatal_error_:false, fail_ts_:0, fail_cost_:30284638, ret_code_:0}}, next_to_submit_lsn_:{lsn:34114611096}, committed_end_lsn_:{lsn:34339675050}, next_to_submit_scn_:{val:1701521156110797219}, base_lsn_:{lsn:33887907840}, base_scn_:{val:1701374423446587372}, iterator_:{iterator_impl:{buf_:0x1465b5605000, next_round_pread_size:2121728, curr_read_pos:0, curr_read_buf_start_pos:0, curr_read_buf_end_pos:122, log_storage_:{IteratorStorage:{start_lsn:{lsn:34114611096}, end_lsn:{lsn:34114611218}, read_buf:{buf_len_:2125824, buf_:0x1465b5605000}, block_size:67104768, log_storage_:0x1465ef3035b0}, IteratorStorageType::“DiskIteratorStorage”}, curr_entry_is_raw_write:false, curr_entry_size:0, prev_entry_scn:{val:1701521156110797218}, curr_entry:{LogEntryHeader:{magic:19528, version:1, log_size:34, scn_:{val:1701521156110797218}, data_checksum:2124447694, flag:0}}, init_mode_version:0, accumlate_checksum:3254118622}}}})
[2023-12-04 15:21:44.439738] INFO [COORDINATOR] refresh (ob_leader_coordinator.cpp:145) [1042151][T1001_Occam][T1001][Y0-0000000000000000-0-0] [lt=17] refresh all_ls_election_reference_info success(ret=0, ret=“OB_SUCCESS”, *new_all_ls_election_reference_info=[{1,2,False,{False,},False,False,True}])
[2023-12-04 15:21:44.440231] INFO [CLOG] get_max_applied_scn (ob_log_apply_service.cpp:731) [1042434][T1002_TenantWea][T1002][Y0-0000000000000000-0-0] [lt=3] get_max_applied_scn(scn={val:18446744073709551615}, this={ls_id_:{id:1}, role_:2, proposal_id_:-1, palf_committed_end_lsn_:{lsn:0}, last_check_scn_:{val:18446744073709551615}, max_applied_cb_scn_:{val:18446744073709551615}})
[2023-12-04 15:21:44.440248] INFO [CLOG] get_min_unreplayed_log_info (ob_replay_status.cpp:1029) [1042434][T1002_TenantWea][T1002][Y0-0000000000000000-0-0] [lt=14] get_min_unreplayed_log_info(lsn={lsn:73384895335}, scn={val:1701492922869683930}, this={ls_id_:{id:1}, is_enabled_:true, is_submit_blocked_:false, role_:2, err_info_:{lsn_:{lsn:73384895269}, scn_:{val:1698905002657473466}, log_type_:0, is_submit_err_:true, err_ts_:1701492923082598, err_ret_:-4002}, ref_cnt_:3, post_barrier_lsn_:{lsn:18446744073709551615}, pending_task_count_:0, submit_log_task_:{ObReplayServiceSubmitTask:{type_:1, enqueue_ts_:1701674504439657, err_info_:{has_fatal_error_:false, fail_ts_:1701492923082598, fail_cost_:29834602, ret_code_:-4002}}, next_to_submit_lsn_:{lsn:73384895335}, committed_end_lsn_:{lsn:73929392249}, next_to_submit_scn_:{val:1701492922869683930}, base_lsn_:{lsn:73009987584}, base_scn_:{val:1701424543712035021}, iterator_:{iterator_impl:{buf_:0x1465b7205000, next_round_pread_size:2121728, curr_read_pos:0, curr_read_buf_start_pos:0, curr_read_buf_end_pos:122, log_storage_:{IteratorStorage:{start_lsn:{lsn:73384895269}, end_lsn:{lsn:73384895391}, read_buf:{buf_len_:2125824, buf_:0x1465b7205000}, block_size:67104768, log_storage_:0x1465f03f35b0}, IteratorStorageType::“DiskIteratorStorage”}, curr_entry_is_raw_write:false, curr_entry_size:66, prev_entry_scn:{val:1698905002657473466}, curr_entry:{LogEntryHeader:{magic:19528, version:1, log_size:34, scn_:{val:1698905002657473466}, data_checksum:3348837161, flag:1}}, init_mode_version:0, accumlate_checksum:3238509059}}}})
[2023-12-04 15:21:44.440283] INFO [STORAGE.TRANS] generate_weak_read_timestamp_ (ob_ls_wrs_handler.cpp:175) [1042434][T1002_TenantWea][T1002][Y0-0000000000000000-0-0] [lt=34] get wrs ts(ls_id={id:1}, delta=181581570234, timestamp={val:1701492922869683929}, min_tx_service_ts={val:4611686018427387903})
[2023-12-04 15:21:44.440290] INFO [STORAGE.TRANS] print_stat_info (ob_keep_alive_ls_handler.cpp:211) [1042434][T1002_TenantWea][T1002][Y0-0000000000000000-0-0] [lt=4] [Keep Alive Stat] LS Keep Alive Info(tenant_id=1002, LS_ID={id:1}, Not_Master_Cnt=1, Near_To_GTS_Cnt=0, Other_Error_Cnt=0, Submit_Succ_Cnt=0, last_scn="{val:1701492922869683929}", last_lsn={lsn:73384895203}, last_gts={val:0}, min_start_scn="{val:0}", min_start_status=1)
[2023-12-04 15:21:44.440300] INFO [CLOG] get_max_applied_scn (ob_log_apply_service.cpp:731) [1042434][T1002_TenantWea][T1002][Y0-0000000000000000-0-0] [lt=6] get_max_applied_scn(scn={val:18446744073709551615}, this={ls_id_:{id:1001}, role_:2, proposal_id_:-1, palf_committed_end_lsn_:{lsn:0}, last_check_scn_:{val:18446744073709551615}, max_applied_cb_scn_:{val:18446744073709551615}})
[2023-12-04 15:21:44.440306] INFO [CLOG] get_min_unreplayed_log_info (ob_replay_status.cpp:1029) [1042434][T1002_TenantWea][T1002][Y0-0000000000000000-0-0] [lt=4] get_min_unreplayed_log_info(lsn={lsn:60412604737}, scn={val:1701560096988684470}, this={ls_id_:{id:1001}, is_enabled_:true, is_submit_blocked_:false, role_:2, err_info_:{lsn_:{lsn:60412604671}, scn_:{val:1698974409018970461}, log_type_:0, is_submit_err_:true, err_ts_:1701560097210601, err_ret_:-4002}, ref_cnt_:3, post_barrier_lsn_:{lsn:18446744073709551615}, pending_task_count_:0, submit_log_task_:{ObReplayServiceSubmitTask:{type_:1, enqueue_ts_:1701674504439360, err_info_:{has_fatal_error_:false, fail_ts_:1701560097210601, fail_cost_:35435403, ret_code_:-4002}}, next_to_submit_lsn_:{lsn:60412604737}, committed_end_lsn_:{lsn:60932966934}, next_to_submit_scn_:{val:1701560096988684470}, base_lsn_:{lsn:59521929216}, base_scn_:{val:1701374432191328301}, iterator_:{iterator_impl:{buf_:0x1465b6e05000, next_round_pread_size:2121728, curr_read_pos:0, curr_read_buf_start_pos:0, curr_read_buf_end_pos:122, log_storage_:{IteratorStorage:{start_lsn:{lsn:60412604671}, end_lsn:{lsn:60412604793}, read_buf:{buf_len_:2125824, buf_:0x1465b6e05000}, block_size:67104768, log_storage_:0x1465ef3695b0}, IteratorStorageType::“DiskIteratorStorage”}, curr_entry_is_raw_write:false, curr_entry_size:66, prev_entry_scn:{val:1698974409018970461}, curr_entry:{LogEntryHeader:{magic:19528, version:1, log_size:34, scn_:{val:1698974409018970461}, data_checksum:3348837161, flag:1}}, init_mode_version:0, accumlate_checksum:1346075989}}}})
[2023-12-04 15:21:44.440334] INFO [STORAGE.TRANS] generate_weak_read_timestamp_ (ob_ls_wrs_handler.cpp:175) [1042434][T1002_TenantWea][T1002][Y0-0000000000000000-0-0] [lt=27] get wrs ts(ls_id={id:1001}, delta=114407451233, timestamp={val:1701560096988684469}, min_tx_service_ts={val:4611686018427387903})
[2023-12-04 15:21:44.440338] INFO [STORAGE.TRANS] print_stat_info (ob_keep_alive_ls_handler.cpp:211) [1042434][T1002_TenantWea][T1002][Y0-0000000000000000-0-0] [lt=3] [Keep Alive Stat] LS Keep Alive Info(tenant_id=1002, LS_ID={id:1001}, Not_Master_Cnt=1, Near_To_GTS_Cnt=0, Other_Error_Cnt=0, Submit_Succ_Cnt=0, last_scn="{val:1701560096988684469}", last_lsn={lsn:60412604605}, last_gts={val:0}, min_start_scn="{val:0}", min_start_status=1)
[2023-12-04 15:21:44.440364] INFO [CLOG] get_max_applied_scn (ob_log_apply_service.cpp:731) [1042434][T1002_TenantWea][T1002][Y0-0000000000000000-0-0] [lt=23] get_max_applied_scn(scn={val:18446744073709551615}, this={ls_id_:{id:1002}, role_:2, proposal_id_:-1, palf_committed_end_lsn_:{lsn:0}, last_check_scn_:{val:18446744073709551615}, max_applied_cb_scn_:{val:18446744073709551615}})
[2023-12-04 15:21:44.440372] INFO [CLOG] get_min_unreplayed_log_info (ob_replay_status.cpp:1029) [1042434][T1002_TenantWea][T1002][Y0-0000000000000000-0-0] [lt=5] get_min_unreplayed_log_info(lsn={lsn:122224518281}, scn={val:1701674504235566964}, this={ls_id_:{id:1002}, is_enabled_:true, is_submit_blocked_:false, role_:2, err_info_:{lsn_:{lsn:18446744073709551615}, scn_:{val:0}, log_type_:0, is_submit_err_:false, err_ts_:0, err_ret_:0}, ref_cnt_:2, post_barrier_lsn_:{lsn:18446744073709551615}, pending_task_count_:0, submit_log_task_:{ObReplayServiceSubmitTask:{type_:1, enqueue_ts_:1701674504340349, err_info_:{has_fatal_error_:false, fail_ts_:0, fail_cost_:33805632, ret_code_:0}}, next_to_submit_lsn_:{lsn:122224518281}, committed_end_lsn_:{lsn:122224518281}, next_to_submit_scn_:{val:1701674504235566964}, base_lsn_:{lsn:120520163328}, base_scn_:{val:1701374564450288084}, iterator_:{iterator_impl:{buf_:0x1465b6205000, next_round_pread_size:2121728, curr_read_pos:122, curr_read_buf_start_pos:0, curr_read_buf_end_pos:122, log_storage_:{IteratorStorage:{start_lsn:{lsn:122224518159}, end_lsn:{lsn:122224518281}, read_buf:{buf_len_:2125824, buf_:0x1465b6205000}, block_size:67104768, log_storage_:0x1465ef3d15b0}, IteratorStorageType::“DiskIteratorStorage”}, curr_entry_is_raw_write:false, curr_entry_size:0, prev_entry_scn:{val:1701674504235566963}, curr_entry:{LogEntryHeader:{magic:19528, version:1, log_size:34, scn_:{val:1701674504235566963}, data_checksum:3348837161, flag:0}}, init_mode_version:0, accumlate_checksum:528919068}}}})
[2023-12-04 15:21:44.440393] INFO [STORAGE.TRANS] generate_weak_read_timestamp_ (ob_ls_wrs_handler.cpp:175) [1042434][T1002_TenantWea][T1002][Y0-0000000000000000-0-0] [lt=14] get wrs ts(ls_id={id:1003}, delta=153348329120, timestamp={val:1701521156110797218}, min_tx_service_ts={val:4611686018427387903})
[2023-12-04 15:21:44.440401] INFO [STORAGE.TRANS] print_stat_info (ob_keep_alive_ls_handler.cpp:211) [1042434][T1002_TenantWea][T1002][Y0-0000000000000000-0-0] [lt=8] [Keep Alive Stat] LS Keep Alive Info(tenant_id=1002, LS_ID={id:1003}, Not_Master_Cnt=1, Near_To_GTS_Cnt=0, Other_Error_Cnt=0, Submit_Succ_Cnt=0, last_scn="{val:1701521156110797218}", last_lsn={lsn:34114611030}, last_gts={val:0}, min_start_scn="{val:0}", min_start_status=1)
[2023-12-04 15:21:44.440666] INFO [STORAGE.REDO] notify_flush (ob_storage_log_writer.cpp:552) [1042401][T1002_OB_SLOG][T1002][Y0-0000000000000000-0-0] [lt=9] Successfully flush(log_item={start_cursor:ObLogCursor{file_id=352, log_id=61300720, offset=5846141}, end_cursor:ObLogCursor{file_id=352, log_id=61300721, offset=5846370}, is_inited:true, is_local:false, buf_size:524288, buf:0x14665bf08050, len:2947, log_data_len:229, seq:61300720, flush_finish:false, flush_ret:0})
[2023-12-04 15:21:44.440763] INFO [STORAGE.REDO] notify_flush (ob_storage_log_writer.cpp:552) [1042401][T1002_OB_SLOG][T1002][Y0-0000000000000000-0-0] [lt=19] Successfully flush(log_item={start_cursor:ObLogCursor{file_id=352, log_id=61300721, offset=5846370}, end_cursor:ObLogCursor{file_id=352, log_id=61300722, offset=5846599}, is_inited:true, is_local:false, buf_size:524288, buf:0x146658508050, len:2718, log_data_len:229, seq:61300721, flush_finish:false, flush_ret:0})
[2023-12-04 15:21:44.440846] INFO [STORAGE.REDO] notify_flush (ob_storage_log_writer.cpp:552) [1042401][T1002_OB_SLOG][T1002][Y0-0000000000000000-0-0] [lt=11] Successfully flush(log_item={start_cursor:ObLogCursor{file_id=352, log_id=61300722, offset=5846599}, end_cursor:ObLogCursor{file_id=352, log_id=61300723, offset=5846828}, is_inited:true, is_local:false, buf_size:524288, buf:0x14665b508050, len:2489, log_data_len:229, seq:61300722, flush_finish:false, flush_ret:0})
[2023-12-04 15:21:44.440925] INFO [STORAGE.REDO] notify_flush (ob_storage_log_writer.cpp:552) [1042401][T1002_OB_SLOG][T1002][Y0-0000000000000000-0-0] [lt=10] Successfully flush(log_item={start_cursor:ObLogCursor{file_id=352, log_id=61300723, offset=5846828}, end_cursor:ObLogCursor{file_id=352, log_id=61300724, offset=5847057}, is_inited:true, is_local:false, buf_size:524288, buf:0x14665a004050, len:2260, log_data_len:229, seq:61300723, flush_finish:false, flush_ret:0})
[2023-12-04 15:21:44.440980] ERROR issue_dba_error (ob_log.cpp:1792) [1042463][T1002_ReplaySrv][T1002][Y0-0000000000000000-0-0] [lt=22][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4016, file=“log_iterator_impl.h”, line_no=190, info=“parse LogMetaEntry failed, unexpected error”)
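In the tenant 1002 get_min_unreplayed_log_info lines, ls 1 and ls 1001 carry is_submit_err_:true with err_ret_:-4002, while ls 1002 has no error and next_to_submit_lsn_ equal to committed_end_lsn_. One rough way to quantify how far replay is behind is committed_end_lsn_ minus next_to_submit_lsn_. Treating that gap as committed-but-not-yet-replayed bytes is our assumption, and replay_backlog is a hypothetical helper; the field names are copied from the log.

```python
import re

# Field patterns copied from the get_min_unreplayed_log_info lines above.
# \b keeps "committed_end_lsn_" from matching inside "palf_committed_end_lsn_".
FIELDS = {
    "ls_id": re.compile(r"\bls_id_:\{id:(\d+)\}"),
    "next_to_submit_lsn": re.compile(r"\bnext_to_submit_lsn_:\{lsn:(\d+)\}"),
    "committed_end_lsn": re.compile(r"\bcommitted_end_lsn_:\{lsn:(\d+)\}"),
    "is_submit_err": re.compile(r"\bis_submit_err_:(true|false)"),
    "err_ret": re.compile(r"\berr_ret_:(-?\d+)"),
}


def replay_backlog(line):
    """Hypothetical helper: extract replay fields and the assumed unreplayed byte gap."""
    raw = {}
    for name, pattern in FIELDS.items():
        m = pattern.search(line)
        if m is None:
            raise ValueError(f"{name} not found in line")
        raw[name] = m.group(1)
    return {
        "ls_id": int(raw["ls_id"]),
        "backlog_bytes": int(raw["committed_end_lsn"]) - int(raw["next_to_submit_lsn"]),
        "submit_error": raw["is_submit_err"] == "true",
        "err_ret": int(raw["err_ret"]),
    }


# Trimmed from the tenant 1002, ls 1 line above.
SAMPLE = (
    "this={ls_id_:{id:1}, is_enabled_:true, is_submit_blocked_:false, role_:2, "
    "err_info_:{lsn_:{lsn:73384895269}, scn_:{val:1698905002657473466}, log_type_:0, "
    "is_submit_err_:true, err_ts_:1701492923082598, err_ret_:-4002}, ref_cnt_:3, "
    "next_to_submit_lsn_:{lsn:73384895335}, committed_end_lsn_:{lsn:73929392249}}"
)

info = replay_backlog(SAMPLE)
print(info["ls_id"], info["backlog_bytes"], f"{info['backlog_bytes'] / 2**20:.1f} MiB", info["err_ret"])
# 1 544496914 519.3 MiB -4002
```

The same computation on the ls 1001 line gives 60932966934 - 60412604737 = 520362197 bytes, and 0 for ls 1002, which matches the pattern of two stuck log streams and one healthy one.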

select svr_ip,zone,with_rootserver,status,block_migrate_in_time,start_service_time,stop_time,build_version from oceanbase.__all_server order by zone;


select tenant_id,tenant_name,primary_zone,compatibility_mode from oceanbase.__all_tenant;


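The problem was a full clog disk: log replay got stuck, so the major compaction could not proceed. Restarting resolved it. The root cause was that the RAID was configured in write-back mode; this is the same type of problem as https://ask.oceanbase.com/t/topic/35603477/10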
