3-node cluster deployed with obd: observers crash one after another after running for a few hours (CRASH ERROR!!!)

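【Environment】Test environment deployed with obd
【OB】observer
【Version】4.2.1.0
【Problem description】After running for a few hours, the observer nodes crash one after another
【Reproduction steps】No operations were performed
【Symptoms and impact】
All other services keep running normally; only observer stops running
【Attachments】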


Crash error logs

Node 227, 19:24:19
CRASH ERROR!!! IP=55c74271a704, RBP=7fab4fd51180, sig=11, sig_code=128, sig_addr=0, RLIMIT_CORE=unlimited, timestamp=1700568721714087, tid=29770, tname=RootBalance, trace_id=12381328008163-1700563206626540-0-0, extra_info=((null)), lbt=0x11671ea8 0x11657478 0x7fab6e14862f 0x4995704 0x7d28bc9 0x7b2535f 0x7c98fa9 0x7c98407 0x7c97da2 0x1165e643 0x1165df1d 0x11669430 0x116659f1 0x7fab6e140ea4 0x7fab6de69b0c, SQL=
Node 228, 20:12:01
CRASH ERROR!!! IP=557726d7120e, RBP=7f24cab4b550, sig=11, sig_code=128, sig_addr=0, RLIMIT_CORE=unlimited, timestamp=1700571032059303, tid=13874, tname=T1_L0_G100, trace_id=12381328008164-1700563201486885-0-0, extra_info=((null)), lbt=0x11671ea8 0x11657478 0x7f258033462f 0x1165520e 0x49cde93 0x49cdd31 0x4c425a4 0xf56a135 0xf569f2f 0x95a7f4e 0x95a748d 0x95ae9c6 0x95b0666 0x4c74bdc 0x4c7476b 0x4c737d3 0x4c735d1 0x4c73560 0x4acb764 0xb514475 0xb586d44 0x4cb01fb 0x4acb764 0xb94920e, SQL=select /* MONITOR_AGENT */ case when cnt is null then 0 else cnt end as cnt, tenant_name, tenant_id from (select DBA_OB_TENANTS.tenant_name, DBA_OB_TENANTS.tenant_id, cnt from DBA_OB_TENANTS left join (select count(state='ACTIVE' OR NULL) as cnt, tenant as tenant_name from GV$OB_PROCESSLIST where svr_ip = '192.168.99.228' and svr_port = 2882 group by tenant) t1 on DBA_OB_TENANTS.tenant_name = t1.tenant_name where DBA_OB_TENANTS.tenant_type<>'META') t2
Node 229, 20:50:32
CRASH ERROR!!! IP=5583f3bd0dbf, RBP=7f7b20152390, sig=11, sig_code=1, sig_addr=f0bf10, RLIMIT_CORE=unlimited, timestamp=1700565859008382, tid=5207, tname=T1002_L0_G0, trace_id=12381328008164-1700563193126794-0-0, extra_info=((null)), lbt=0x11671ea8 0x11657478 0x7f7b8f5c262f 0xa1c9dbf 0x48a85eb 0x9781616 0x488c43f 0x4888594 0x4883943 0x92a13b3 0x116659be 0x7f7b8f5baea4 0x7f7b8f2e396c, SQL=
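
For comparing the three crashes side by side, the fields of a CRASH ERROR line can be pulled apart with a short script. The following is only a sketch: the function name `parse_crash_line` is hypothetical, and it assumes that the `timestamp` field is microseconds since the Unix epoch and that the nodes' wall clocks are in UTC+8, neither of which is stated in this thread.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: node clocks are in UTC+8 (not stated in the thread).
LOCAL_TZ = timezone(timedelta(hours=8))

SCALAR_KEYS = ("IP", "RBP", "sig", "sig_code", "sig_addr",
               "timestamp", "tid", "tname")

def parse_crash_line(line):
    """Extract selected key=value fields from an observer CRASH ERROR line."""
    fields = {}
    for key in SCALAR_KEYS:
        match = re.search(r"\b" + key + r"=([^,]*)", line)
        if match:
            fields[key] = match.group(1).strip()
    # lbt is a space-separated list of hex addresses.
    match = re.search(r"\blbt=((?:0x[0-9a-fA-F]+\s*)+)", line)
    fields["lbt"] = match.group(1).split() if match else []
    if "timestamp" in fields:
        # Assumption: the value is microseconds since the Unix epoch.
        micros = int(fields["timestamp"])
        fields["time"] = (datetime.fromtimestamp(micros // 1_000_000, LOCAL_TZ)
                          + timedelta(microseconds=micros % 1_000_000))
    return fields
```

On Linux, `sig=11` is SIGSEGV. If the two assumptions hold, the decoded `time` value can be checked against the time written above each log line.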

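Could you please post the stack trace from the core file? To get it, go to the observer installation directory and check whether there are any files whose names start with "core-", then run gdb ./bin/observer ${core_file_name}. Once gdb has finished loading, type bt and press Enter to get the corresponding stack trace.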

Please run cat /etc/security/limits.conf and confirm the settings.
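
limits.conf only shows the configured values; the core-size limit that a process actually gets can also be read at runtime. A minimal sketch using Python's standard `resource` module follows. The helper name `describe_core_limit` is hypothetical, and the script reports the limit of the Python process itself, so it is only a proxy for what observer inherits when it is run by the same user from the same shell.

```python
import resource

def describe_core_limit():
    """Return the soft and hard core-file size limits of the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)

    def fmt(value):
        return "unlimited" if value == resource.RLIM_INFINITY else str(value)

    return f"soft={fmt(soft)} hard={fmt(hard)}"

print(describe_core_limit())
```

The crash logs above already report RLIMIT_CORE=unlimited for the observer processes, so this is mainly a cross-check.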

(gdb) bt
#0 0x00007f72f278a4fb in ?? ()
#1 0x000056356491a76e in ?? ()
#2 0x5245204853415243 in ?? ()
#3 0x4920212121524f52 in ?? ()
#4 0x3735353336353d50 in ?? ()
#5 0x202c323736643063 in ?? ()
#6 0x323766373d504252 in ?? ()
#7 0x3032316334643639 in ?? ()
#8 0x31313d676973202c in ?? ()
#9 0x6f635f676973202c in ?? ()
#10 0x202c3832313d6564 in ?? ()
#11 0x726464615f676973 in ?? ()
#12 0x4d494c52202c303d in ?? ()
#13 0x3d45524f435f5449 in ?? ()
#14 0x6574696d696c6e75 in ?? ()
#15 0x73656d6974202c64 in ?? ()
#16 0x3037313d706d6174 in ?? ()
#17 0x3936373331363530 in ?? ()
#18 0x74202c3833353934 in ?? ()
#19 0x2c393735383d6469 in ?? ()
#20 0x543d656d616e7420 in ?? ()
#21 0x5f304c5f32303031 in ?? ()
#22 0x63617274202c3047 in ?? ()
#23 0x3332313d64695f65 in ?? ()
#24 0x3830303832333138 in ?? ()
#25 0x303037312d333631 in ?? ()
#26 0x3232323935383535 in ?? ()
#27 0x302d302d38363033 in ?? ()
#28 0x5f6172747865202c in ?? ()
#29 0x6e28283d6f666e69 in ?? ()
#30 0x6c202c29296c6c75 in ?? ()
#31 0x36313178303d7462 in ?? ()
#32 0x7830203861653137 in ?? ()
#33 0x3837343735363131 in ?? ()
#34 0x6632376637783020 in ?? ()
#35 0x2066323661383732 in ?? ()
#36 0x3736613439347830 in ?? ()
#37 0x3834393478302032 in ?? ()
#38 0x3934783020353362 in ?? ()
#39 0x7830203037313734 in ?? ()
---Type <return> to continue, or q <return> to quit---
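
Most of the frame values in this backtrace look like ASCII text rather than code addresses. The sketch below (the helper name `decode_frames` is hypothetical) reads each 64-bit value as 8 little-endian bytes and prints the result, using frames #2 to #5 from the output above as input.

```python
def decode_frames(frame_values):
    """Interpret each 64-bit frame value as 8 little-endian bytes of text."""
    chunks = []
    for value in frame_values:
        raw = int(value, 16).to_bytes(8, "little")
        # Keep printable ASCII; show anything else as '.' so real addresses stand out.
        chunks.append("".join(chr(b) if 32 <= b < 127 else "." for b in raw))
    return "".join(chunks)

# Frames #2 to #5 copied from the gdb output above.
frames = [
    "0x5245204853415243",
    "0x4920212121524f52",
    "0x3735353336353d50",
    "0x202c323736643063",
]
print(decode_frames(frames))
```

These four frames decode to the text "CRASH ERROR!!! IP=563557c0d672, ", which suggests that gdb is walking over a buffer holding the crash log message rather than genuine return addresses. Why the unwinding ends up there is not established in this thread.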

Please also try addr2line -ipCfe bin/observer 0x11671ea8 0x11657478 0x7f7b8f5c262f 0xa1c9dbf 0x48a85eb 0x9781616 0x488c43f 0x4888594 0x4883943 0x92a13b3 0x116659be 0x7f7b8f5baea4 0x7f7b8f2e396c

Please hold on a moment while I check with a colleague. Was that stack trace captured on node 229?

This one is from node 227. I checked, and every node has core- files, several of them on each node.



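Check the exact version of the rpm package, then on the oceanbase-community-stable-el-7-x86_64 package download page (Alibaba Cloud open-source mirror site),
find the matching oceanbase-ce-debuginfo-4.2.1.0-xxx package. Downloading and extracting that rpm package gives you an observer.debug file. Copy observer.debug into the same directory as the observer binary, then run the earlier command again: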

addr2line -ipCfe bin/observer 0x11671ea8 0x11657478 0x7f7b8f5c262f 0xa1c9dbf 0x48a85eb 0x9781616 0x488c43f 0x4888594 0x4883943 0x92a13b3 0x116659be 0x7f7b8f5baea4 0x7f7b8f2e396c
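
Because each crash line carries its own `lbt=` address list, the addr2line invocation can be generated from the log instead of being copied by hand. This is a minimal sketch: the function names `build_addr2line_cmd` and `resolve` are hypothetical, and the default path `bin/observer` assumes the script is run from the observer installation directory, as in the command above.

```python
import re
import subprocess

def build_addr2line_cmd(crash_line, binary="bin/observer"):
    """Build the addr2line argument list from the lbt= field of a crash line."""
    match = re.search(r"\blbt=((?:0x[0-9a-fA-F]+\s*)+)", crash_line)
    if match is None:
        raise ValueError("no lbt= field found in crash line")
    return ["addr2line", "-ipCfe", binary, *match.group(1).split()]

def resolve(crash_line, binary="bin/observer"):
    """Run addr2line and return its text output (requires binutils)."""
    result = subprocess.run(build_addr2line_cmd(crash_line, binary),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Calling `resolve(line)` for each of the three crash lines would give one symbolized stack per node, provided observer.debug has been placed next to the binary as described.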

Version:
[screenshot]
Is this where it should go?

In the bin directory: observer.debug goes in the same directory as the observer binary under bin.

That is not much information to go on. Try running gdb ./bin/observer core-T1002xxx against this file, and type bt once you are at the interactive prompt.

#0 0x00007f72f278a4fb in ?? ()
#1 0x000056356491a76e in ?? ()
#2 0x5245204853415243 in ?? ()
#3 0x4920212121524f52 in ?? ()
#4 0x3735353336353d50 in ?? ()
#5 0x202c323736643063 in ?? ()
#6 0x323766373d504252 in ?? ()
#7 0x3032316334643639 in ?? ()
#8 0x31313d676973202c in ?? ()
#9 0x6f635f676973202c in ?? ()
#10 0x202c3832313d6564 in ?? ()
#11 0x726464615f676973 in ?? ()
#12 0x4d494c52202c303d in ?? ()
#13 0x3d45524f435f5449 in ?? ()
#14 0x6574696d696c6e75 in ?? ()
#15 0x73656d6974202c64 in ?? ()
#16 0x3037313d706d6174 in ?? ()
#17 0x3936373331363530 in ?? ()
#18 0x74202c3833353934 in ?? ()
#19 0x2c393735383d6469 in ?? ()
#20 0x543d656d616e7420 in ?? ()
#21 0x5f304c5f32303031 in ?? ()
#22 0x63617274202c3047 in ?? ()
#23 0x3332313d64695f65 in ?? ()
#24 0x3830303832333138 in ?? ()
#25 0x303037312d333631 in ?? ()
#26 0x3232323935383535 in ?? ()
#27 0x302d302d38363033 in ?? ()
#28 0x5f6172747865202c in ?? ()
#29 0x6e28283d6f666e69 in ?? ()
#30 0x6c202c29296c6c75 in ?? ()
#31 0x36313178303d7462 in ?? ()
#32 0x7830203861653137 in ?? ()
#33 0x3837343735363131 in ?? ()
#34 0x6632376637783020 in ?? ()
#35 0x2066323661383732 in ?? ()
#36 0x3736613439347830 in ?? ()
#37 0x3834393478302032 in ?? ()
#38 0x3934783020353362 in ?? ()
#39 0x7830203037313734 in ?? ()
---Type <return> to continue, or q <return> to quit---
#40 0x2039363664333934 in ?? ()
#41 0x6631316238347830 in ?? ()
#42 0x3739383478302066 in ?? ()
#43 0x3834783020396362 in ?? ()
#44 0x7830206537653139 in ?? ()
#45 0x2061393438383834 in ?? ()
#46 0x3439333838347830 in ?? ()
#47 0x3161323978302033 in ?? ()
#48 0x3131783020336233 in ?? ()
#49 0x3020656239353636 in ?? ()
#50 0x3732663237663778 in ?? ()
#51 0x7830203461653238 in ?? ()
#52 0x6134326632376637 in ?? ()
#53 0x0000202c63306262 in ?? ()
#54 0x0000000000000000 in ?? ()

This issue is now being followed up separately in a dedicated user group chat.