Alert details: [OBServer compaction failed]

【 Environment 】Test environment
【 OB or other component 】OB
【 Version 】4.3.4
【 Problem description 】Alert details: [OBServer compaction failed] cluster: sgkyocp, host: 10...*, log type: observer, log file: /root/oceanbase/log/observer.log, log level: WDIAG, keyword=failed to merge partition, error code=4034, log details=[2025-03-05 14:47:48.117625] WDIAG [STORAGE] process (ob_tablet_merge_task.cpp:1604) [176083][T1004_MINOR_EXE][T1004][YB420A0B093E-00062F90BFEFD493-0-0] [lt=3][errcode=-4034] failed to merge partition(ret=-4034).

Log file contents:
[2025-03-05 14:27:22.675723] ERROR issue_dba_error (ob_log.cpp:1875) [176014][T1004_DiskCB][T1004][YB420A0B093E-00062F90C11FDCAB-0-0] [lt=40][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4016, file="ob_micro_block_cache.cpp", line_no=342, info="Fail to deserialize record header")
[2025-03-05 14:27:22.675737] EDIAG [STORAGE] process_block (ob_micro_block_cache.cpp:342) [176014][T1004_DiskCB][T1004][YB420A0B093E-00062F90C11FDCAB-0-0] [lt=14][errcode=-4016] Fail to deserialize record header(ret=-4016, block_id=197, offset=1889087) BACKTRACE:0x12435d5c 0x5069065 0x51af008 0x51aeb1b 0x51aea5e 0x51ae886 0xe92595f 0x5082f33 0x5081d05 0x127090f1 0x127054b0 0x7f0e69c081ca 0x7f0e698398d3
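
The log lines quoted so far share a fixed prefix (timestamp, level, optional module, function, source location, thread and tenant tags, trace ID, error code). Below is a minimal Python sketch for pulling those fields apart and tallying error codes. The regular expression is an assumption inferred only from the sample lines in this thread, not from OceanBase documentation, and `parse_line` and `count_errcodes` are hypothetical helper names. It is not how obdiag works internally.

```python
import re
from collections import Counter

# Assumed layout, inferred from the sample lines in this thread:
# [time] LEVEL [MODULE] func (file:line) [tid][thread][tenant][trace_id] [lt=N][errcode=N] message
# The [MODULE] tag is optional (the ERROR line above has none).
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z]+)\s+"
    r"(?:\[(?P<module>[^\]]+)\]\s+)?"
    r"(?P<func>\S+)\s+\((?P<file>[^:()]+):(?P<line>\d+)\)\s+"
    r"\[(?P<tid>\d+)\]\[(?P<thread>[^\]]*)\]\[(?P<tenant>[^\]]*)\]\[(?P<trace_id>[^\]]*)\]\s+"
    r"\[lt=(?P<lt>\d+)\](?:\[errcode=(?P<errcode>-?\d+)\])?\s*(?P<msg>.*)$"
)


def parse_line(line):
    """Return a dict of fields, or None if the line does not match the assumed layout."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    if fields["errcode"] is not None:
        fields["errcode"] = int(fields["errcode"])
    return fields


def count_errcodes(lines):
    """Count how many parsed lines carry each error code."""
    counts = Counter()
    for line in lines:
        fields = parse_line(line)
        if fields is not None and fields["errcode"] is not None:
            counts[fields["errcode"]] += 1
    return counts


# Two lines copied from this thread.
SAMPLE = [
    "[2025-03-05 14:47:48.117625] WDIAG [STORAGE] process (ob_tablet_merge_task.cpp:1604) "
    "[176083][T1004_MINOR_EXE][T1004][YB420A0B093E-00062F90BFEFD493-0-0] "
    "[lt=3][errcode=-4034] failed to merge partition(ret=-4034)",
    "[2025-03-05 14:27:22.675737] EDIAG [STORAGE] process_block (ob_micro_block_cache.cpp:342) "
    "[176014][T1004_DiskCB][T1004][YB420A0B093E-00062F90C11FDCAB-0-0] "
    "[lt=14][errcode=-4016] Fail to deserialize record header"
    "(ret=-4016, block_id=197, offset=1889087)",
]

if __name__ == "__main__":
    for entry in SAMPLE:
        fields = parse_line(entry)
        print(fields["ts"], fields["thread"], fields["tenant"], fields["errcode"])
    print(dict(count_errcodes(SAMPLE)))
```

Grouping by the `thread` and `tenant` fields shows at a glance that both errors come from tenant T1004, one from a `MINOR_EXE` thread and one from a `DiskCB` thread.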

Please analyze the log with obdiag first and post the result. Then also post a copy of that log itself.
obdiag analyze log --files observer.log.xxxxxxxxx

select * from dba_ob_major_compaction;
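select * from gv$ob_tablet_compaction_progress;
Look at this from the svr_ip angle and from the table angle to tell whether the problem is on a single observer or whether compaction is stuck on a particular table. You can use create_time to see which tablet_id has been taking the longest.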

select svr_ip,count(1) from gv$ob_tablet_compaction_progress group by svr_ip;
select tenant_id,ls_id,tablet_id,now() - start_time from gv$ob_tablet_compaction_progress order by 4 desc;
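
The same two views (per server, and slowest tablet first) can also be computed client-side once the rows have been fetched. A minimal Python sketch follows, assuming each row is a dict with the `svr_ip`, `tenant_id`, `ls_id`, `tablet_id`, and `start_time` columns used in the queries above. `summarize_progress` is a hypothetical helper, and the rows in `ROWS` are made-up illustration data, not output from this cluster.

```python
from collections import Counter
from datetime import datetime


def summarize_progress(rows, now, top_n=3):
    """Return (tasks per svr_ip, the top_n longest-running tablets)."""
    # Server angle: how many in-progress compaction tasks each observer holds.
    per_server = Counter(row["svr_ip"] for row in rows)
    # Tablet angle: sort by elapsed time since start_time, longest first.
    ordered = sorted(rows, key=lambda row: now - row["start_time"], reverse=True)
    slowest = [
        (row["tenant_id"], row["ls_id"], row["tablet_id"], now - row["start_time"])
        for row in ordered[:top_n]
    ]
    return per_server, slowest


# Made-up rows for illustration only (documentation-range IPs, invented tablet IDs).
NOW = datetime(2024, 1, 1, 15, 0, 0)
ROWS = [
    {"svr_ip": "192.0.2.11", "tenant_id": 1004, "ls_id": 1001,
     "tablet_id": 200001, "start_time": datetime(2024, 1, 1, 14, 0, 0)},
    {"svr_ip": "192.0.2.11", "tenant_id": 1004, "ls_id": 1001,
     "tablet_id": 200002, "start_time": datetime(2024, 1, 1, 14, 40, 0)},
    {"svr_ip": "192.0.2.12", "tenant_id": 1004, "ls_id": 1001,
     "tablet_id": 200003, "start_time": datetime(2024, 1, 1, 14, 45, 0)},
]

if __name__ == "__main__":
    per_server, slowest = summarize_progress(ROWS, NOW)
    for svr_ip, task_count in per_server.most_common():
        print(svr_ip, task_count)
    for tenant_id, ls_id, tablet_id, elapsed in slowest:
        print(tenant_id, ls_id, tablet_id, elapsed)
```

If one svr_ip holds nearly all the tasks, the observer is the likely suspect; if a single tablet_id dominates the elapsed time on every server, the table behind it is.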

Analyze OceanBase Offline Log Summary:
+-----------+-----------+---------------------------------------------------------------------------------+----------------------------+-------------+-------------------+---------+
| Node      | Status    | FileName                                                                        | First Found Time           | ErrorCode   | Message           | Count   |
+===========+===========+=================================================================================+============================+=============+===================+=========+
| 127.0.0.1 | Completed | /root/obdiag_analyze_pack_20250305152413/local/_root_oceanbase_log_observer.log | 2025-03-05 15:22:08.232304 | -4016       | Internal error    | 29      |
+-----------+-----------+---------------------------------------------------------------------------------+----------------------------+-------------+-------------------+---------+
| 127.0.0.1 | Completed | /root/obdiag_analyze_pack_20250305152413/local/_root_oceanbase_log_observer.log | 2025-03-05 15:22:20.427200 | -4034       | Deserialize error | 6       |
+-----------+-----------+---------------------------------------------------------------------------------+----------------------------+-------------+-------------------+---------+

Details:

Node: 127.0.0.1
Status: Completed
FileName: /root/obdiag_analyze_pack_20250305152413/local/_root_oceanbase_log_observer.log
First Found Time: 2025-03-05 15:22:08.232304
ErrorCode: -4016
Message: Internal error
Count: 29
Last Found Time: 2025-03-05 15:24:13.238316
Cause: Internal Error
Solution: Contact OceanBase Support
Trace_IDS: ['B420A0B093E-00062F90BFEFD5D4-0-0', 'B420A0B093E-00062F90BFEFD5D5-0-0', 'B420A0B093E-00062F90BFEFD5D6-0-0', 'B420A0B093E-00062F90BFEFD5D7-0-0', 'B420A0B093E-00062F90BFEFD5D8-0-0', 'B420A0B093E-00062F90BFEFD5D9-0-0', 'B420A0B093E-00062F90BFEFD5DA-0-0', 'B420A0B093E-00062F90BFEFD5DB-0-0', 'B420A0B093E-00062F90BFEFD5DC-0-0', 'B420A0B093E-00062F90BFEFD5DD-0-0', 'B420A0B093E-00062F90BFEFD5DE-0-0', 'B420A0B093E-00062F90BFEFD5DF-0-0', 'B420A0B093E-00062F90BFEFD5E0-0-0', 'B420A0B093E-00062F90BFEFD5E1-0-0', 'B420A0B093E-00062F90C11FE54F-0-0', 'B420A0B093E-00062F90C11FE550-0-0', 'B420A0B093E-00062F90BFEFD5E2-0-0', 'B420A0B0940-00062F90BE22DC50-0-0', 'B420A0B093E-00062F90BFEFD5E3-0-0', 'B420A0B093E-00062F90BFEFD5E4-0-0', 'B420A0B093E-00062F90BFEFD5E5-0-0', 'B420A0B093E-00062F90BFEFD5E6-0-0', 'B420A0B093E-00062F90BFEFD5E7-0-0', 'B420A0B093E-00062F90BFEFD5E9-0-0', 'B420A0B093E-00062F90BFEFD5EA-0-0', 'B420A0B093E-00062F90BFEFD5EB-0-0', 'B420A0B093E-00062F90BFEFD5EC-0-0', 'B420A0B093E-00062F90BFEFD5ED-0-0', 'B420A0B093E-00062F90BFEFD5EE-0-0']

Node: 127.0.0.1
Status: Completed
FileName: /root/obdiag_analyze_pack_20250305152413/local/_root_oceanbase_log_observer.log
First Found Time: 2025-03-05 15:22:20.427200
ErrorCode: -4034
Message: Deserialize error
Count: 6
Last Found Time: 2025-03-05 15:23:50.460939
Cause: Internal Error
Solution: Contact OceanBase Support
Trace_IDS: ['0-0000000000000000-0-0', 'B420A0B093E-00062F90BFEFD5E8-0-0']

Please attach a complete copy of the observer log.
To see which tenant's compaction failed, run: select * from CDB_OB_MAJOR_COMPACTION;

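Has the disk had any problems, for example a disk hang? Check with dmesg -T.
You can also run an obdiag inspection, then package the results and send them over.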

obdiag check run
https://www.oceanbase.com/docs/common-obdiag-cn-1000000002200479
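
For the `dmesg -T` check suggested above, a small filter can help separate disk-related kernel messages from the rest. The Python sketch below is only an illustration: the pattern list is an assumption covering a few common Linux kernel messages (I/O errors and hung-task warnings) and should be extended for the storage stack in use, `find_disk_issues` is a hypothetical helper, and the lines in `SAMPLE_DMESG` are made up, not taken from this host.

```python
import re

# Assumed patterns for disk trouble in kernel messages; not exhaustive.
DISK_PATTERNS = [
    re.compile(r"I/O error", re.IGNORECASE),
    re.compile(r"blocked for more than \d+ seconds"),
    re.compile(r"medium error", re.IGNORECASE),
]


def find_disk_issues(dmesg_text):
    """Return the lines of dmesg output that match any disk-related pattern."""
    hits = []
    for line in dmesg_text.splitlines():
        if any(pattern.search(line) for pattern in DISK_PATTERNS):
            hits.append(line)
    return hits


# Made-up example output, for illustration only.
SAMPLE_DMESG = """\
[Mon Jan  1 00:00:01 2024] EXT4-fs (sda1): mounted filesystem with ordered data mode
[Mon Jan  1 00:05:10 2024] blk_update_request: I/O error, dev sdb, sector 123456
[Mon Jan  1 00:07:42 2024] INFO: task observer:12345 blocked for more than 120 seconds.
"""

if __name__ == "__main__":
    for hit in find_disk_issues(SAMPLE_DMESG):
        print(hit)
```

On a real host, the text would come from saving the output of `dmesg -T` to a file and reading it in. An empty result does not rule out a disk fault; it only means none of the assumed patterns matched.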

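I see you accepted the answer. Has the compaction issue been resolved now? Could you share how it was solved?

This warning stopped appearing later.

Most likely the compaction was stuck and running long, and it has since recovered.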