Observer node load is very high: how do I troubleshoot it?

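[Environment] Test environment
[OB or other component] OB
[Version] 4.3.1
[Problem description] The load on the observer nodes is very high. How do I troubleshoot it?
The observer threads look like this: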

Test environment

Hardware: 6 machines, each with 16 cores and 32 GB of memory
OS version: CentOS Linux release 7.9.2009 (Core)

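The cluster has 3 zones with 2 observers per zone. Right now the load on the first node of each zone is not high, but the load on the second node of each zone is high.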
observer.log.zip (4.1 MB)

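[Reproduction path] Operations performed before and after the problem occurred
[Attachments and logs] We recommend collecting diagnostic information with obdiag, the OceanBase agile diagnostic tool. For details, see the link (right-click to open):

[SOP Series 22]: The First Step in Fault Diagnosis (Self-Service Diagnosis and Diagnostic Information Collection)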

Was this deployed with obd? What do you mean by high load: CPU, memory, or IO?

Yes, it was deployed with OBD. By high load I mean the CPU load is high; memory and IO are fairly stable.
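To make "CPU load is high" comparable across nodes, one option is to divide the load average by the core count. On Linux the load average counts runnable tasks plus tasks in uninterruptible sleep, so a high value does not by itself prove the CPU is the bottleneck. The following is a minimal sketch, not part of obdiag; the function names and the threshold of 1.0 per core are assumptions chosen for illustration.

```python
import os


def load_per_core(load_avg: float, cores: int) -> float:
    """Return the load average normalised by the number of CPU cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return load_avg / cores


def classify(load_avg: float, cores: int, threshold: float = 1.0) -> str:
    """Label a node 'overloaded' when load per core exceeds the (assumed) threshold."""
    return "overloaded" if load_per_core(load_avg, cores) > threshold else "ok"


if __name__ == "__main__":
    # os.getloadavg() returns the 1-, 5-, and 15-minute load averages (Unix only).
    one_min, five_min, fifteen_min = os.getloadavg()
    cores = os.cpu_count() or 1
    print(f"1-min load {one_min:.2f} on {cores} cores: {classify(one_min, cores)}")
```

For the machines in this thread (16 cores), a load of 100 works out to 6.25 per core, far above what 16 cores can run at once.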

Is this cluster running any tests or business workloads?

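Run an inspection with obdiag and see: [SOP Series 22]: The First Step in Fault Diagnosis (Self-Service Diagnosis and Diagnostic Information Collection)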

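No business workload is running. This morning I only used sysbench to load 10 tables with 1 million rows each, and afterwards I deleted them all. The OCP Express monitoring data is not loading now, so I can't tell whether the load was already this high before I did that.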

Run an inspection with the obdiag tool mentioned above to check the cluster.

Also, what is tenant 1003?

It is the meta tenant of tenant 1004.

The generated files are too large. Even after uploading them to Aliyun Drive, I can't share them.

Aliyun Drive link: gather_pack_20240611155731
https://www.alipan.com/s/UucunmV8oiW

Gather Ob Log Summary:
+------------+-----------+----------+--------+-------------------------------------------------------------------------------------+
| Node       | Status    | Size     | Time   | PackPath                                                                            |
+============+===========+==========+========+=====================================================================================+
| 10.0.22.41 | Completed | 116.236M | 26 s   | /tmp/gather_pack_20240611155731/ob_log_10.0.22.41_20240611150000_20240611153000.zip |
+------------+-----------+----------+--------+-------------------------------------------------------------------------------------+
| 10.0.22.42 | Completed | 73.299M  | 16 s   | /tmp/gather_pack_20240611155731/ob_log_10.0.22.42_20240611150000_20240611153000.zip |
+------------+-----------+----------+--------+-------------------------------------------------------------------------------------+
| 10.0.22.43 | Completed | 73.264M  | 22 s   | /tmp/gather_pack_20240611155731/ob_log_10.0.22.43_20240611150000_20240611153000.zip |
+------------+-----------+----------+--------+-------------------------------------------------------------------------------------+
| 10.0.22.45 | Completed | 110.961M | 303 s  | /tmp/gather_pack_20240611155731/ob_log_10.0.22.45_20240611150000_20240611153000.zip |
+------------+-----------+----------+--------+-------------------------------------------------------------------------------------+
| 10.0.22.46 | Completed | 134.011M | 536 s  | /tmp/gather_pack_20240611155731/ob_log_10.0.22.46_20240611150000_20240611153000.zip |
+------------+-----------+----------+--------+-------------------------------------------------------------------------------------+
| 10.0.22.47 | Completed | 78.651M  | 64 s   | /tmp/gather_pack_20240611155731/ob_log_10.0.22.47_20240611150000_20240611153000.zip |
+------------+-----------+----------+--------+-------------------------------------------------------------------------------------+
Gather Sysstat Summary:
+------------+-----------+---------+--------+-----------------------------------------------------------------------+
| Node       | Status    | Size    | Time   | PackPath                                                              |
+============+===========+=========+========+=======================================================================+
| 10.0.22.41 | Completed | 38.202K | 0 s    | /tmp/gather_pack_20240611155731/sysstat_10.0.22.41_20240611161341.zip |
+------------+-----------+---------+--------+-----------------------------------------------------------------------+
| 10.0.22.42 | Completed | 38.239K | 0 s    | /tmp/gather_pack_20240611155731/sysstat_10.0.22.42_20240611161342.zip |
+------------+-----------+---------+--------+-----------------------------------------------------------------------+
| 10.0.22.43 | Completed | 38.607K | 0 s    | /tmp/gather_pack_20240611155731/sysstat_10.0.22.43_20240611161343.zip |
+------------+-----------+---------+--------+-----------------------------------------------------------------------+
| 10.0.22.45 | Completed | 38.119K | 8 s    | /tmp/gather_pack_20240611155731/sysstat_10.0.22.45_20240611161344.zip |
+------------+-----------+---------+--------+-----------------------------------------------------------------------+
| 10.0.22.46 | Completed | 38.124K | 13 s   | /tmp/gather_pack_20240611155731/sysstat_10.0.22.46_20240611161352.zip |
+------------+-----------+---------+--------+-----------------------------------------------------------------------+
| 10.0.22.47 | Completed | 38.107K | 4 s    | /tmp/gather_pack_20240611155731/sysstat_10.0.22.47_20240611161405.zip |
+------------+-----------+---------+--------+-----------------------------------------------------------------------+
Gather Perf Summary:
+------------+-----------+---------+--------+--------------------------------------------------------------------+
| Node       | Status    | Size    | Time   | PackPath                                                           |
+============+===========+=========+========+====================================================================+
| 10.0.22.41 | Completed | 9.226K  | 0 s    | /tmp/gather_pack_20240611155731/perf_10.0.22.41_20240611161409.zip |
+------------+-----------+---------+--------+--------------------------------------------------------------------+
| 10.0.22.42 | Completed | 8.573K  | 0 s    | /tmp/gather_pack_20240611155731/perf_10.0.22.42_20240611161410.zip |
+------------+-----------+---------+--------+--------------------------------------------------------------------+
| 10.0.22.43 | Completed | 8.812K  | 1 s    | /tmp/gather_pack_20240611155731/perf_10.0.22.43_20240611161411.zip |
+------------+-----------+---------+--------+--------------------------------------------------------------------+
| 10.0.22.45 | Completed | 10.163K | 12 s   | /tmp/gather_pack_20240611155731/perf_10.0.22.45_20240611161413.zip |
+------------+-----------+---------+--------+--------------------------------------------------------------------+
| 10.0.22.46 | Completed | 10.328K | 17 s   | /tmp/gather_pack_20240611155731/perf_10.0.22.46_20240611161425.zip |
+------------+-----------+---------+--------+--------------------------------------------------------------------+
| 10.0.22.47 | Completed | 9.656K  | 5 s    | /tmp/gather_pack_20240611155731/perf_10.0.22.47_20240611161442.zip |
+------------+-----------+---------+--------+--------------------------------------------------------------------+
Gather Ob stack Summary:
+------------+-----------+---------+--------+------------------------------------------------------------------------+
| Node       | Status    | Size    | Time   | PackPath                                                               |
+============+===========+=========+========+========================================================================+
| 10.0.22.41 | Completed | 12.928K | 3 s    | /tmp/gather_pack_20240611155731/obstack2_10.0.22.41_20240611161448.zip |
+------------+-----------+---------+--------+------------------------------------------------------------------------+
| 10.0.22.42 | Completed | 11.584K | 2 s    | /tmp/gather_pack_20240611155731/obstack2_10.0.22.42_20240611161451.zip |
+------------+-----------+---------+--------+------------------------------------------------------------------------+
| 10.0.22.43 | Completed | 11.423K | 4 s    | /tmp/gather_pack_20240611155731/obstack2_10.0.22.43_20240611161454.zip |
+------------+-----------+---------+--------+------------------------------------------------------------------------+
| 10.0.22.45 | Completed | 12.479K | 44 s   | /tmp/gather_pack_20240611155731/obstack2_10.0.22.45_20240611161458.zip |
+------------+-----------+---------+--------+------------------------------------------------------------------------+
| 10.0.22.46 | Completed | 13.274K | 92 s   | /tmp/gather_pack_20240611155731/obstack2_10.0.22.46_20240611161543.zip |
+------------+-----------+---------+--------+------------------------------------------------------------------------+
| 10.0.22.47 | Completed | 11.724K | 17 s   | /tmp/gather_pack_20240611155731/obstack2_10.0.22.47_20240611161716.zip |
+------------+-----------+---------+--------+------------------------------------------------------------------------+
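In all four summaries above, nodes 10.0.22.45, 46, and 47 take noticeably longer to collect than 41, 42, and 43, for example 303 s and 536 s versus 16 to 26 s for the log gather. Slow collection is only a hint that those hosts are struggling, not proof, since archive sizes also differ. A minimal sketch for pulling the slow nodes out of such a summary is shown below; the regular expression, the function name `slow_nodes`, and the 60-second cutoff are assumptions made for illustration and are not part of obdiag.

```python
import re

# Matches a data row such as:
# | 10.0.22.46 | Completed | 134.011M | 536 s | /tmp/... |
ROW = re.compile(
    r"^\|\s*(\d+\.\d+\.\d+\.\d+)\s*\|\s*(\w+)\s*\|\s*([\d.]+[KMG])\s*\|\s*(\d+)\s*s\s*\|"
)


def slow_nodes(summary_text, min_seconds=60):
    """Return (node, seconds) pairs at or above min_seconds, slowest first."""
    rows = []
    for line in summary_text.splitlines():
        match = ROW.match(line.strip())
        if match:
            node, _status, _size, seconds = match.groups()
            rows.append((node, int(seconds)))
    slow = [row for row in rows if row[1] >= min_seconds]
    return sorted(slow, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    sample = (
        "| 10.0.22.41 | Completed | 116.236M | 26 s | /tmp/a.zip |\n"
        "| 10.0.22.46 | Completed | 134.011M | 536 s | /tmp/b.zip |\n"
    )
    for node, seconds in slow_nodes(sample):
        print(f"{node}: {seconds} s")
```

Border and header lines do not match the pattern, so the whole pasted summary can be passed in unchanged.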

result_summary.txt (7.2 KB)

sysstat_10.0.22.45_20240611161344.zip (38.1 KB)
obstack2_10.0.22.45_20240611161458.zip (12.5 KB)
perf_10.0.22.45_20240611161413.zip (10.2 KB)

I analyzed the OB logs with obd obdiag analyze log obdemo. The results are as follows:

[root@ob01 analyze_pack_20240611174116]# cat result_details.txt

Analyze OceanBase Online Log Summary:
+------------+----------+------------+-------------+-----------+---------+
| Node       | Status   | FileName   | ErrorCode   | Message   | Count   |
+============+==========+============+=============+===========+=========+
| 10.0.22.41 | PASS     |            |             |           |         |
+------------+----------+------------+-------------+-----------+---------+
| 10.0.22.42 | PASS     |            |             |           |         |
+------------+----------+------------+-------------+-----------+---------+
| 10.0.22.43 | PASS     |            |             |           |         |
+------------+----------+------------+-------------+-----------+---------+
| 10.0.22.45 | PASS     |            |             |           |         |
+------------+----------+------------+-------------+-----------+---------+
| 10.0.22.46 | PASS     |            |             |           |         |
+------------+----------+------------+-------------+-----------+---------+
| 10.0.22.47 | PASS     |            |             |           |         |
+------------+----------+------------+-------------+-----------+---------+

Details:

Node: 10.0.22.41
Status: PASS
FileName: None
ErrorCode: None
Message: None
Count: None
Cause: None
Solution: None
First Found Time: None
Last Found Time: None
Trace_IDS: None

Node: 10.0.22.42
Status: PASS
FileName: None
ErrorCode: None
Message: None
Count: None
Cause: None
Solution: None
First Found Time: None
Last Found Time: None
Trace_IDS: None

Node: 10.0.22.43
Status: PASS
FileName: None
ErrorCode: None
Message: None
Count: None
Cause: None
Solution: None
First Found Time: None
Last Found Time: None
Trace_IDS: None

Node: 10.0.22.45
Status: PASS
FileName: None
ErrorCode: None
Message: None
Count: None
Cause: None
Solution: None
First Found Time: None
Last Found Time: None
Trace_IDS: None

Node: 10.0.22.46
Status: PASS
FileName: None
ErrorCode: None
Message: None
Count: None
Cause: None
Solution: None
First Found Time: None
Last Found Time: None
Trace_IDS: None

Node: 10.0.22.47
Status: PASS
FileName: None
ErrorCode: None
Message: None
Count: None
Cause: None
Solution: None
First Found Time: None
Last Found Time: None
Trace_IDS: None

Which nodes specifically have the high load?

These three nodes: 10.0.22.45, 46, and 47.

From the inspection results, the top output for node 45 looks fine? The load isn't very high.

How heavy was the sysbench stress test?

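The other two nodes both have a load of 100+. At this point all of the OCP Express monitoring data is very slow to load, and even a simple set global parameter change takes 20+ seconds, which is definitely not normal.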
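To turn "takes 20+ seconds" into a repeatable measurement, one option is to time the same statement several times and look at the median. This is a minimal sketch: `run` is a hypothetical callable that executes one statement through whatever client you use, and the 1-second limit is an assumed threshold, not an OceanBase recommendation.

```python
import statistics
import time
from typing import Callable, List


def sample_latency(run: Callable[[], object], repeats: int = 5) -> List[float]:
    """Call run() `repeats` times and return each call's wall-clock seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        samples.append(time.perf_counter() - start)
    return samples


def is_abnormal(samples: List[float], limit_seconds: float = 1.0) -> bool:
    """Flag the statement when its median latency exceeds the (assumed) limit."""
    return statistics.median(samples) > limit_seconds


if __name__ == "__main__":
    # Stand-in workload; replace the lambda with a real call through your client.
    samples = sample_latency(lambda: time.sleep(0.01), repeats=3)
    print(f"median {statistics.median(samples):.3f} s, abnormal: {is_abnormal(samples)}")
```

Running the same measurement against a node in each zone would show whether the slowness follows the high-load nodes.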