observer fails to start after the machine lost power and rebooted

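[Environment] Test environment
[observer]
[Version] 4.2.1.5 Community Edition
[Problem description] After an abnormal power loss on the machine, observer cannot be started.
[Steps to reproduce] Relevant operations before and after the problem occurred
[Attachments and logs] We recommend using obdiag, the OceanBase agile diagnostic tool, to collect diagnostic information. For details, see the link (right-click to open):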

[SOP Series 22] Fault diagnosis, step one (self-diagnosis and collecting diagnostic information)
observer.log (1.6 MB)

When memory is this small, many kinds of errors can occur.
Parameters: memory_limit=6GB, system_memory=2G

If memory_limit is only 6GB, set system_memory to 1G.
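
To see why the smaller value helps, here is a rough sketch of the arithmetic. It assumes that the memory left for tenants is approximately memory_limit minus system_memory; the helper names `parse_size` and `tenant_memory_gib` are made up for this illustration and are not part of any OceanBase tool.

```python
import re

# Binary multipliers for the unit suffixes handled by this sketch.
_UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30, "T": 1 << 40}


def parse_size(text: str) -> int:
    """Parse strings such as '6GB', '2G' or '512M' into a byte count."""
    m = re.fullmatch(r"\s*(\d+)\s*([KMGT])B?\s*", text.upper())
    if m is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(m.group(1)) * _UNITS[m.group(2)]


def tenant_memory_gib(memory_limit: str, system_memory: str) -> float:
    """Memory left for tenants in GiB.

    Assumption: tenants get roughly memory_limit - system_memory.
    """
    remaining = parse_size(memory_limit) - parse_size(system_memory)
    return remaining / _UNITS["G"]


print(tenant_memory_gib("6GB", "2G"))  # settings from the original post
print(tenant_memory_gib("6GB", "1G"))  # settings after the suggested change
```

Under that assumption, lowering system_memory from 2G to 1G moves one more GiB to the tenants on a 6GB node.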

I have changed system_memory to 1G, but observer still fails to start. Could you please take another look?
observer.log (1.6 MB)

The observer.log you provided contains the following entries:
[2024-06-29 12:14:36.292156] ERROR issue_dba_error (ob_log.cpp:1868) [9674][observer][T1][Y0-0000000000000000-0-0] [lt=6][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4070, file=“log_iterator_impl.h”, line_no=918, info=“the block has been corrupted!!!”)
[2024-06-29 12:14:36.292171] EDIAG [PALF] check_is_the_last_entry (log_iterator_impl.h:918) [9674][observer][T1][Y0-0000000000000000-0-0] [lt=14][errcode=-4070] the block has been corrupted!!!(ret=-4070, this={buf_:0x2b3d8f205000, next_round_pread_size:2121728, curr_read_pos:516, curr_read_buf_start_pos:0, curr_read_buf_end_pos:2121728, log_storage_:{IteratorStorage:{start_lsn:{lsn:10259660288}, end_lsn:{lsn:10261782016}, read_buf:{buf_len_:2125824, buf_:0x2b3d8f205000}, block_size:67104768, log_storage_:0x2b3d8e9f94f0}, IteratorStorageType::“DiskIteratorStorage”}, curr_entry_is_raw_write:false, curr_entry_size:0, prev_entry_scn:{val:1719613717977266387, v:0}, curr_entry:{LogGroupEntryHeader:{magic:18258, version:1, group_size:485, proposal_id:23, committed_lsn:{lsn:10259660287}, max_scn:{val:1719613718177645630, v:0}, accumulated_checksum:2284653261, log_id:27748859, flag:0}}, init_mode_version:0, accumulate_checksum:1650252347, curr_entry_is_padding:0, padding_entry_size:66, padding_entry_scn:{val:1719613717977266387, v:0}}, header={magic:18258, version:1, group_size:443, proposal_id:23, committed_lsn:{lsn:19989852164}, max_scn:{val:1719007320174696953, v:0}, accumulated_checksum:3794494559, log_id:35602855, flag:0}) BACKTRACE:0x113b7e05 0x6ae0337 0x6adff04 0x6adfb92 0x6accd8a 0x6c5688a 0x6ba53f7 0x6cf7e39 0x6b5b88b 0x6ce4268 0x8748ddb 0xfebe349 0x950f2f2 0xd8b8653 0x9ce090b 0x6ab20fb 0x2b3d6498c555 0x4e56560
[2024-06-29 12:14:36.292236] ERROR issue_dba_error (ob_log.cpp:1868) [9674][observer][T1][Y0-0000000000000000-0-0] [lt=65][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4070, file=“log_storage.h”, line_no=336, info=“locate_log_tail_and_last_valid_entry_header_ failed”)
[2024-06-29 12:14:36.292239] EDIAG [PALF] locate_log_tail_and_last_valid_entry_header_ (log_storage.h:336) [9674][observer][T1][Y0-0000000000000000-0-0] [lt=3][errcode=-4070] locate_log_tail_and_last_valid_entry_header_ failed(ret=-4070, curr_entry={LogGroupEntryHeader:{magic:18258, version:1, group_size:66, proposal_id:23, committed_lsn:{lsn:10259660165}, max_scn:{val:1719613717977266387, v:0}, accumulated_checksum:1650252347, log_id:27748858, flag:1}}, iterator={iterator_impl:{buf_:0x2b3d8f205000, next_round_pread_size:2121728, curr_read_pos:516, curr_read_buf_start_pos:0, curr_read_buf_end_pos:2121728, log_storage_:{IteratorStorage:{start_lsn:{lsn:10259660288}, end_lsn:{lsn:10261782016}, read_buf:{buf_len_:2125824, buf_:0x2b3d8f205000}, block_size:67104768, log_storage_:0x2b3d8e9f94f0}, IteratorStorageType::“DiskIteratorStorage”}, curr_entry_is_raw_write:false, curr_entry_size:0, prev_entry_scn:{val:1719613717977266387, v:0}, curr_entry:{LogGroupEntryHeader:{magic:18258, version:1, group_size:485, proposal_id:23, committed_lsn:{lsn:10259660287}, max_scn:{val:1719613718177645630, v:0}, accumulated_checksum:2284653261, log_id:27748859, flag:0}}, init_mode_version:0, accumulate_checksum:1650252347, curr_entry_is_padding:0, padding_entry_size:66, padding_entry_scn:{val:1719613717977266387, v:0}}}) BACKTRACE:0x113b7e05 0x6ae0337 0x6adff04 0x6adfb92 0x6accd8a 0x6c4fb17 0x6ba5565 0x6cf7e39 0x6b5b88b 0x6ce4268 0x8748ddb 0xfebe349 0x950f2f2 0xd8b8653 0x9ce090b 0x6ab20fb 0x2b3d6498c555 0x4e56560
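
Entries like these are easy to miss in a large observer.log. The following is a minimal sketch for summarizing them. The regular expression is inferred only from the lines quoted above, so it is an assumption about the log layout rather than an official format description, and `summarize_errors` is a hypothetical helper name.

```python
import re
from collections import Counter

# Layout inferred from the quoted lines:
# [timestamp] LEVEL [MODULE]? function (file:line) ... [errcode=N] message
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z]+)\s+"
    r"(?:\[(?P<module>[^\]]+)\]\s+)?"
    r"(?P<func>\S+)\s+\((?P<src>[^)]+)\)"
    r".*?\[errcode=(?P<errcode>-?\d+)\]"
)


def summarize_errors(lines):
    """Count ERROR/EDIAG entries, keyed by (errcode, function, file:line)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m and m.group("level") in {"ERROR", "EDIAG"}:
            key = (int(m.group("errcode")), m.group("func"), m.group("src"))
            counts[key] += 1
    return counts
```

A file object can be passed directly, for example `summarize_errors(open("observer.log", errors="replace"))`, and `.most_common(10)` on the result lists the most frequent error sites first.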

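This looks like clog corruption caused by the host's abnormal power loss. When observer restarts, it has to replay the clog to recover data that was in memory and had not yet been flushed to disk. Because the clog is corrupted (errcode=-4070, "the block has been corrupted!!!"), observer cannot start normally.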
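
As a toy illustration of why a damaged log entry stops recovery, here is a sketch of a checksummed write-ahead log. It is not the PALF on-disk format: the header layout (length plus CRC32) and all names here are invented for the example.

```python
import struct
import zlib

# Toy entry header: payload length and CRC32 of the payload (little-endian).
HEADER = struct.Struct("<II")


class CorruptedLogError(Exception):
    """Raised when an entry fails validation during replay."""


def append_entry(log: bytearray, payload: bytes) -> None:
    """Append one entry (header + payload) to the in-memory log."""
    log.extend(HEADER.pack(len(payload), zlib.crc32(payload)))
    log.extend(payload)


def replay(log: bytes) -> list:
    """Return all payloads in order; stop with an error on the first bad entry."""
    entries = []
    pos = 0
    while pos < len(log):
        if pos + HEADER.size > len(log):
            raise CorruptedLogError(f"truncated header at offset {pos}")
        length, crc = HEADER.unpack_from(log, pos)
        start = pos + HEADER.size
        payload = bytes(log[start:start + length])
        if len(payload) != length or zlib.crc32(payload) != crc:
            raise CorruptedLogError(f"bad entry at offset {pos}")
        entries.append(payload)
        pos = start + length
    return entries
```

If a power loss leaves an entry half-written, its checksum no longer matches, and replay cannot safely continue past that point. A real system has to decide whether the bad entry is merely the torn tail of the log or genuine corruption in the middle, which is roughly what function names such as check_is_the_last_entry in the log above suggest.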

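It is the same principle as in an Oracle database: if the redo log is corrupted after an abnormal power loss, the Oracle instance cannot start either.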