uos虚拟机重启之后,单节点测试ob重启失败

【 使用环境 】测试环境
【 OB or 其他组件 】clog
【 使用版本 】v4.3
【问题描述】obd restart命令重启失败,查看observer.log,有以下日志
【复现路径】看起来是clog文件出问题,有无效的数据,但不知道怎么解决
【附件及日志】
[2024-09-23 00:20:34.932252] WDIAG [PALF] check_integrity (log_entry_header.cpp:186) [336669][observer][T1][Y0-0000000000000000-0-0] [lt=25][errcode=-4016] data checksum mismatch(data_checksum=2969070358, tmp_data_checksum=1450560999, data_len=452, valid_data_len=452, this={magic:19528, version:1, log_size:452, scn_:{val:1726881941807037552, v:0}, data_checksum:2969070358, flag:0})

[2024-09-23 00:20:34.932289] WDIAG [PALF] check_integrity (log_group_entry_header.cpp:295) [336669][observer][T1][Y0-0000000000000000-0-0] [lt=36][errcode=-4000] check data checksum failed(*buf=L, buf_len=484, *this={magic:18258, version:1, group_size:484, proposal_id:508, committed_lsn:{lsn:26229468783}, max_scn:{val:1726881941807037552, v:0}, accumulated_checksum:4192272670, log_id:71634296, flag:0})

[2024-09-23 00:20:34.932297] WDIAG [PALF] verify_accum_checksum_ (log_iterator_impl.h:774) [336669][observer][T1][Y0-0000000000000000-0-0] [lt=8][errcode=-4070] invalid data(ret=-4070, this={buf_:0x7f1a61805000, next_round_pread_size:2121728, curr_read_pos:1357503, curr_read_buf_start_pos:0, curr_read_buf_end_pos:2121728, log_storage_:{IteratorStorage:{start_lsn:{lsn:26228111280}, end_lsn:{lsn:26230233008}, read_buf:{buf_len_:2125824, buf_:0x7f1a61805000}, block_size:67104768, log_storage_:0x7f1a63de3bf0}, IteratorStorageType::“DiskIteratorStorage”}, curr_entry_is_raw_write:false, curr_entry_size:0, prev_entry_scn:{val:1726881941708536379, v:0}, curr_entry:{LogGroupEntryHeader:{magic:18258, version:1, group_size:484, proposal_id:508, committed_lsn:{lsn:26229468783}, max_scn:{val:1726881941807037552, v:0}, accumulated_checksum:4192272670, log_id:71634296, flag:0}}, init_mode_version:0, accumulate_checksum:3744452823, curr_entry_is_padding:0, padding_entry_size:66, padding_entry_scn:{val:1726881941708536379, v:0}}, entry={LogGroupEntryHeader:{magic:18258, version:1, group_size:484, proposal_id:508, committed_lsn:{lsn:26229468783}, max_scn:{val:1726881941807037552, v:0}, accumulated_checksum:4192272670, log_id:71634296, flag:0}})

[2024-09-23 00:20:34.932310] WDIAG [PALF] handle_each_log_group_entry_ (log_iterator_impl.h:314) [336669][observer][T1][Y0-0000000000000000-0-0] [lt=13][errcode=-4070] verify_accum_checksum_ failed(ret=-4070, this={buf_:0x7f1a61805000, next_round_pread_size:2121728, curr_read_pos:1357503, curr_read_buf_start_pos:0, curr_read_buf_end_pos:2121728, log_storage_:{IteratorStorage:{start_lsn:{lsn:26228111280}, end_lsn:{lsn:26230233008}, read_buf:{buf_len_:2125824, buf_:0x7f1a61805000}, block_size:67104768, log_storage_:0x7f1a63de3bf0}, IteratorStorageType::“DiskIteratorStorage”}, curr_entry_is_raw_write:false, curr_entry_size:0, prev_entry_scn:{val:1726881941708536379, v:0}, curr_entry:{LogGroupEntryHeader:{magic:18258, version:1, group_size:484, proposal_id:508, committed_lsn:{lsn:26229468783}, max_scn:{val:1726881941807037552, v:0}, accumulated_checksum:4192272670, log_id:71634296, flag:0}}, init_mode_version:0, accumulate_checksum:3744452823, curr_entry_is_padding:0, padding_entry_size:66, padding_entry_scn:{val:1726881941708536379, v:0}}, entry={LogGroupEntryHeader:{magic:18258, version:1, group_size:484, proposal_id:508, committed_lsn:{lsn:26229468783}, max_scn:{val:1726881941807037552, v:0}, accumulated_checksum:4192272670, log_id:71634296, flag:0}})

[2024-09-23 00:20:34.932316] WDIAG [PALF] parse_log_group_entry_ (log_iterator_impl.h:253) [336669][observer][T1][Y0-0000000000000000-0-0] [lt=6][errcode=-4070] handle_each_log_group_entry_ failed(this={buf_:0x7f1a61805000, next_round_pread_size:2121728, curr_read_pos:1357503, curr_read_buf_start_pos:0, curr_read_buf_end_pos:2121728, log_storage_:{IteratorStorage:{start_lsn:{lsn:26228111280}, end_lsn:{lsn:26230233008}, read_buf:{buf_len_:2125824, buf_:0x7f1a61805000}, block_size:67104768, log_storage_:0x7f1a63de3bf0}, IteratorStorageType::“DiskIteratorStorage”}, curr_entry_is_raw_write:false, curr_entry_size:0, prev_entry_scn:{val:1726881941708536379, v:0}, curr_entry:{LogGroupEntryHeader:{magic:18258, version:1, group_size:484, proposal_id:508, committed_lsn:{lsn:26229468783}, max_scn:{val:1726881941807037552, v:0}, accumulated_checksum:4192272670, log_id:71634296, flag:0}}, init_mode_version:0, accumulate_checksum:3744452823, curr_entry_is_padding:0, padding_entry_size:66, padding_entry_scn:{val:1726881941708536379, v:0}}, info={reason:4, log_scn:{val:18446744073709551615, v:3}}, replayable_point_scn={val:4611686018427387903, v:0})

[2024-09-23 00:20:34.932323] WDIAG [PALF] next (log_iterator_impl.h:662) [336669][observer][T1][Y0-0000000000000000-0-0] [lt=7][errcode=-4070] read invalid data, need clean cache, maybe storage device cann’t guarantee linear consistency reading(ret=-4070, this={buf_:0x7f1a61805000, next_round_pread_size:2121728, curr_read_pos:1357503, curr_read_buf_start_pos:0, curr_read_buf_end_pos:2121728, log_storage_:{IteratorStorage:{start_lsn:{lsn:26228111280}, end_lsn:{lsn:26230233008}, read_buf:{buf_len_:2125824, buf_:0x7f1a61805000}, block_size:67104768, log_storage_:0x7f1a63de3bf0}, IteratorStorageType::“DiskIteratorStorage”}, curr_entry_is_raw_write:false, curr_entry_size:0, prev_entry_scn:{val:1726881941708536379, v:0}, curr_entry:{LogGroupEntryHeader:{magic:18258, version:1, group_size:484, proposal_id:508, committed_lsn:{lsn:26229468783}, max_scn:{val:1726881941807037552, v:0}, accumulated_checksum:4192272670, log_id:71634296, flag:0}}, init_mode_version:0, accumulate_checksum:3744452823, curr_entry_is_padding:0, padding_entry_size:66, padding_entry_scn:{val:1726881941708536379, v:0}})

[2024-09-23 00:20:34.932329] WDIAG [PALF] next (log_iterator_impl.h:669) [336669][observer][T1][Y0-0000000000000000-0-0] [lt=6][errcode=-4070] get_next_entry_ failed(ret=-4070, this={buf_:0x7f1a61805000, next_round_pread_size:2121728, curr_read_pos:0, curr_read_buf_start_pos:0, curr_read_buf_end_pos:0, log_storage_:{IteratorStorage:{start_lsn:{lsn:26229468783}, end_lsn:{lsn:26229468783}, read_buf:{buf_len_:2125824, buf_:0x7f1a61805000}, block_size:67104768, log_storage_:0x7f1a63de3bf0}, IteratorStorageType::“DiskIteratorStorage”}, curr_entry_is_raw_write:false, curr_entry_size:0, prev_entry_scn:{val:1726881941708536379, v:0}, curr_entry:{LogGroupEntryHeader:{magic:18258, version:1, group_size:484, proposal_id:508, committed_lsn:{lsn:26229468783}, max_scn:{val:1726881941807037552, v:0}, accumulated_checksum:4192272670, log_id:71634296, flag:0}}, init_mode_version:0, accumulate_checksum:3744452823, curr_entry_is_padding:0, padding_entry_size:66, padding_entry_scn:{val:1726881941708536379, v:0}})

[2024-09-23 00:20:34.932334] WDIAG [PALF] next (palf_iterator.h:160) [336669][observer][T1][Y0-0000000000000000-0-0] [lt=5][errcode=-4070] PalfIterator next failed(ret=-4070, this={iterator_impl:{buf_:0x7f1a61805000, next_round_pread_size:2121728, curr_read_pos:0, curr_read_buf_start_pos:0, curr_read_buf_end_pos:0, log_storage_:{IteratorStorage:{start_lsn:{lsn:26229468783}, end_lsn:{lsn:26229468783}, read_buf:{buf_len_:2125824, buf_:0x7f1a61805000}, block_size:67104768, log_storage_:0x7f1a63de3bf0}, IteratorStorageType::“DiskIteratorStorage”}, curr_entry_is_raw_write:false, curr_entry_size:0, prev_entry_scn:{val:1726881941708536379, v:0}, curr_entry:{LogGroupEntryHeader:{magic:18258, version:1, group_size:484, proposal_id:508, committed_lsn:{lsn:26229468783}, max_scn:{val:1726881941807037552, v:0}, accumulated_checksum:4192272670, log_id:71634296, flag:0}}, init_mode_version:0, accumulate_checksum:3744452823, curr_entry_is_padding:0, padding_entry_size:66, padding_entry_scn:{val:1726881941708536379, v:0}}})

[2024-09-23 00:20:34.932553] ERROR issue_dba_error (ob_log.cpp:1875) [336669][observer][T1][Y0-0000000000000000-0-0] [lt=5][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4070, file=“log_iterator_impl.h”, line_no=918, info=“the block has been corrupted!!!”)

[2024-09-23 00:20:34.932570] EDIAG [PALF] check_is_the_last_entry (log_iterator_impl.h:918) [336669][observer][T1][Y0-0000000000000000-0-0] [lt=17][errcode=-4070] the block has been corrupted!!!(ret=-4070, this={buf_:0x7f1a61805000, next_round_pread_size:2121728, curr_read_pos:450, curr_read_buf_start_pos:0, curr_read_buf_end_pos:2121728, log_storage_:{IteratorStorage:{start_lsn:{lsn:26229468784}, end_lsn:{lsn:26231590512}, read_buf:{buf_len_:2125824, buf_:0x7f1a61805000}, block_size:67104768, log_storage_:0x7f1a63de3bf0}, IteratorStorageType::“DiskIteratorStorage”}, curr_entry_is_raw_write:false, curr_entry_size:0, prev_entry_scn:{val:1726881941708536379, v:0}, curr_entry:{LogGroupEntryHeader:{magic:18258, version:1, group_size:484, proposal_id:508, committed_lsn:{lsn:26229468783}, max_scn:{val:1726881941807037552, v:0}, accumulated_checksum:4192272670, log_id:71634296, flag:0}}, init_mode_version:0, accumulate_checksum:3744452823, curr_entry_is_padding:0, padding_entry_size:66, padding_entry_scn:{val:1726881941708536379, v:0}}, header={magic:18258, version:1, group_size:488, proposal_id:489, committed_lsn:{lsn:52064804914}, max_scn:{val:1726679641344216186, v:0}, accumulated_checksum:2416313408, log_id:99509942, flag:0}) BACKTRACE:0x11c0c4ed 0x7146b12 0x71465d4 0x7146264 0x713357a 0x72990b2 0x72127a4 0x7358ed8 0x71c544b 0x7344a36 0x750732c 0x1061f949 0x9a27fe5 0xdfa0ee3 0xa3a60fc 0x71166fb 0x7f1a94472687 0x5143dcc

[2024-09-23 00:20:34.932650] ERROR issue_dba_error (ob_log.cpp:1875) [336669][observer][T1][Y0-0000000000000000-0-0] [lt=79][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4070, file=“log_storage.h”, line_no=341, info=“locate_log_tail_and_last_valid_entry_header_ failed”)

[2024-09-23 00:20:34.932653] EDIAG [PALF] locate_log_tail_and_last_valid_entry_header_ (log_storage.h:341) [336669][observer][T1][Y0-0000000000000000-0-0] [lt=3][errcode=-4070] locate_log_tail_and_last_valid_entry_header_ failed(ret=-4070, curr_entry={LogGroupEntryHeader:{magic:18258, version:1, group_size:66, proposal_id:508, committed_lsn:{lsn:26229468661}, max_scn:{val:1726881941708536379, v:0}, accumulated_checksum:3744452823, log_id:71634295, flag:0}}, iterator={iterator_impl:{buf_:0x7f1a61805000, next_round_pread_size:2121728, curr_read_pos:450, curr_read_buf_start_pos:0, curr_read_buf_end_pos:2121728, log_storage_:{IteratorStorage:{start_lsn:{lsn:26229468784}, end_lsn:{lsn:26231590512}, read_buf:{buf_len_:2125824, buf_:0x7f1a61805000}, block_size:67104768, log_storage_:0x7f1a63de3bf0}, IteratorStorageType::“DiskIteratorStorage”}, curr_entry_is_raw_write:false, curr_entry_size:0, prev_entry_scn:{val:1726881941708536379, v:0}, curr_entry:{LogGroupEntryHeader:{magic:18258, version:1, group_size:484, proposal_id:508, committed_lsn:{lsn:26229468783}, max_scn:{val:1726881941807037552, v:0}, accumulated_checksum:4192272670, log_id:71634296, flag:0}}, init_mode_version:0, accumulate_checksum:3744452823, curr_entry_is_padding:0, padding_entry_size:66, padding_entry_scn:{val:1726881941708536379, v:0}}}) BACKTRACE:0x11c0c4ed 0x7146b12 0x71465d4 0x7146264 0x713357a 0x729249b 0x721290b 0x7358ed8 0x71c544b 0x7344a36 0x750732c 0x1061f949 0x9a27fe5 0xdfa0ee3 0xa3a60fc 0x71166fb 0x7f1a94472687 0x5143dcc

[2024-09-23 00:20:34.932752] WDIAG [PALF] load (log_storage.h:273) [336669][observer][T1][Y0-0000000000000000-0-0] [lt=25][errcode=-4070] locate_log_tail_and_last_valid_entry_header_ failed(ret=-4070, ret=“OB_INVALID_DATA”, this={log_tail:{lsn:26237964288}, readable_log_tail:{lsn:26237964288}, log_block_header:{magic:18754, version:1, min_lsn:{lsn:18446744073709551615}, min_scn:{val:18446744073709551615, v:3}, curr_block_id:390, palf_id:1}, block_mgr:{log_dir:"/root/xt_test/oceanbase/store/clog/tenant_1/1/log", dir_fd:129, min_block_id:367, max_block_id:391, curr_writable_block_id:274894685184}, logical_block_size_:67104768, curr_block_writable_size_:0, block_header_serialize_buf_:0x7f1a63de4838, flashback_version:0})

[2024-09-23 00:20:34.932761] INFO [PALF] load (log_storage.h:278) [336669][observer][T1][Y0-0000000000000000-0-0] [lt=9] LogStorage load finish(ret=-4070, ret=“OB_INVALID_DATA”, this={log_tail:{lsn:26237964288}, readable_log_tail:{lsn:26237964288}, log_block_header:{magic:18754, version:1, min_lsn:{lsn:18446744073709551615}, min_scn:{val:18446744073709551615, v:3}, curr_block_id:390, palf_id:1}, block_mgr:{log_dir:"/root/xt_test/oceanbase/store/clog/tenant_1/1/log", dir_fd:129, min_block_id:367, max_block_id:391, curr_writable_block_id:274894685184}, logical_block_size_:67104768, curr_block_writable_size_:0, block_header_serialize_buf_:0x7f1a63de4838, flashback_version:0}, min_block_id=367, max_block_id=390)

[2024-09-23 00:20:34.932768] ERROR issue_dba_error (ob_log.cpp:1875) [336669][observer][T1][Y0-0000000000000000-0-0] [lt=6][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4070, file=“log_engine.cpp”, line_no=265, info=“LogStorage load failed”)

[2024-09-23 00:20:34.932770] EDIAG [PALF] load (log_engine.cpp:265) [336669][observer][T1][Y0-0000000000000000-0-0] [lt=2][errcode=-4070] LogStorage load failed(ret=-4070, palf_id=1, base_dir="/root/xt_test/oceanbase/store/clog/tenant_1/1") BACKTRACE:0x11c0c4ed 0x7146b12 0x71465d4 0x7146264 0x713357a 0x721451d 0x720f6ec 0x7358ed8 0x71c544b 0x7344a36 0x750732c 0x1061f949 0x9a27fe5 0xdfa0ee3 0xa3a60fc 0x71166fb 0x7f1a94472687 0x5143dcc

[2024-09-23 00:20:34.932784] WDIAG [LIB] ~ObTimeGuard (utility.h:857) [336669][observer][T1][Y0-0000000000000000-0-0] [lt=13][errcode=-4389] destruct(*this=time guard ‘load’ cost too much time, used=214520, time_dist: load log_meta_storage=178002)

[2024-09-23 00:20:34.932789] WDIAG [PALF] load (palf_handle_impl.cpp:208) [336669][observer][T1][Y0-0000000000000000-0-0] [lt=4][errcode=-4070] LogEngine load failed(ret=-4070, palf_id=1)

[2024-09-23 00:20:34.932792] ERROR issue_dba_error (ob_log.cpp:1875) [336669][observer][T1][Y0-0000000000000000-0-0] [lt=2][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4070, file=“palf_env_impl.cpp”, line_no=1059, info=“PalfHandleImpl init failed”)

[2024-09-23 00:20:34.932794] EDIAG [PALF] reload_palf_handle_impl_ (palf_env_impl.cpp:1059) [336669][observer][T1][Y0-0000000000000000-0-0] [lt=2][errcode=-4070] PalfHandleImpl init failed(ret=-4070, palf_id=1) BACKTRACE:0x11c0c4ed 0x7145832 0x71452f4 0x7144f82 0x712421b 0x735a524 0x7358d5b 0x71c544b 0x7344a36 0x750732c 0x1061f949 0x9a27fe5 0xdfa0ee3 0xa3a60fc 0x71166fb 0x7f1a94472687 0x5143dcc

[2024-09-23 00:20:34.932809] ERROR issue_dba_error (ob_log.cpp:1875) [336669][observer][T1][Y0-0000000000000000-0-0] [lt=15][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4070, file=“palf_env_impl.cpp”, line_no=1074, info=“reload_palf_handle_impl_ failed, need free tmp_palf_handle_impl”)

[2024-09-23 00:20:34.932812] EDIAG [PALF] reload_palf_handle_impl_ (palf_env_impl.cpp:1074) [336669][observer][T1][Y0-0000000000000000-0-0] [lt=3][errcode=-4070] reload_palf_handle_impl_ failed, need free tmp_palf_handle_impl(ret=-4070, tmp_palf_handle_impl=0x7f1a63ddc030) BACKTRACE:0x11c0c4ed 0x7145832 0x71452f4 0x7144f82 0x712421b 0x735a984 0x73586b6 0x71c544b 0x7344a36 0x750732c 0x1061f949 0x9a27fe5 0xdfa0ee3 0xa3a60fc 0x71166fb 0x7f1a94472687 0x5143dcc

推荐使用OceanBase敏捷诊断工具obdiag收集诊断信息,详情参见链接(右键跳转查看):

【SOP系列 22 】——故障诊断第一步(自助诊断和诊断信息收集)

【备注】基于 LLM 和开源文档 RAG 的论坛小助手已开放测试,在发帖时输入 [@论坛小助手] 即可召唤小助手,欢迎试用!

用户问题: “使用 obd 重启 OceanBase 单节点失败,如何解决 clog 文件出现无效数据导致的重启失败问题?”

推荐使用 obdiag 工具进行日志采集,并且将采集到的日志提供给我们以便确认问题所在。

日志采集场景: observer无故重启
日志采集命令: obdiag gather scene run --scene=observer.restart

上述命令将会帮助我们收集与您的问题相关的日志信息,以便于进一步分析为何重启会失败。同时,为了更准确地定位问题,请您提供以下信息:

  1. 在重启失败前后是否有进行过大量的写入操作或执行过大型的DDL语句?
  2. 是否观察到了 clog 空间使用率异常增高或者其他存储相关的问题?
  3. 当前集群的状态如何?是否还有其他节点也遇到了类似的问题?

请您执行上述命令并将生成的日志压缩包发送给我们,以便我们能够更深入地了解问题的具体原因。

详情请参考:

  1. 日志同步问题
  2. 集群管理 FAQ
  3. 600.6000-6999-of-mysql-mode.md
  4. 4500 ~ 4999
  5. SYS 租户/RS 服务问题

附上敏捷诊断工具 obdiag 使用帮助链接

clog损坏,目前是无法修复的。
建议虚拟机集群重启严格执行先停止集群再关闭虚拟机操作。