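After all cluster nodes failed, went offline, and were restarted, show database reports an error; it feels as if the cluster was never initialized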

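【Environment】Test environment
【OB or other component】OB
【Version】V3
【Problem description】Describe the problem clearly and precisely
【Steps to reproduce】Operations before and after the problem appeared
【Symptoms and impact】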

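Virtual machines

An OB cluster (1-1-1) deployed with OCP lost all three OBServer nodes the previous day because of a network or host problem. About a day later the observer hosts came back online and could be reached over ssh. I logged in to each host, confirmed that clock synchronization was fine, and started the observer process manually. After the observer process had started successfully on every host, I tried to log in to OB, and no password was required! It feels as if the cluster was never initialized, yet all the data files are still there.

Log directory: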

[root@OLMSH log]# ls |wc -l
601
[root@OLMSH log]# du -h
76G .
[root@OLMSH log]# ls -lt|more
total 79208684
-rw-r--r-- 1 admin admin 85174502 Jul 6 11:38 obesi-daemon.log
-rw-r--r-- 1 admin admin 106645551 Jul 6 11:38 observer.log
-rw-r--r-- 1 admin admin 1719734 Jul 6 11:38 observer.log.wf
-rw-r--r-- 1 admin admin 168834953 Jul 6 11:38 election.log
-rw-r--r-- 1 admin admin 81137146 Jul 6 11:38 rootservice.log
-rw-r--r-- 1 admin admin 268435791 Jul 6 11:36 observer.log.20230706113652099
-rw-r--r-- 1 admin admin 16346 Jul 6 11:36 observer.log.wf.20230706113652099
-rw-r--r-- 1 admin admin 268439580 Jul 6 11:35 observer.log.20230706113505875
-rw-r--r-- 1 admin admin 294 Jul 6 11:33 observer.log.wf.20230706113505875
-rw-r--r-- 1 admin admin 268443760 Jul 6 11:33 observer.log.20230706113336168
-rw-r--r-- 1 admin admin 294 Jul 6 11:32 observer.log.wf.20230706113336168
-rw-r--r-- 1 admin admin 268435481 Jul 6 11:32 observer.log.20230706113208721
-rw-r--r-- 1 admin admin 410317 Jul 6 11:31 observer.log.wf.20230706113208721
-rw-r--r-- 1 admin admin 268435683 Jul 6 11:30 observer.log.20230706113053011
-rw-r--r-- 1 admin admin 268436618 Jul 6 11:29 observer.log.20230706112928330
-rw-r--r-- 1 admin admin 294 Jul 6 11:29 observer.log.wf.20230706113053011
-rw-r--r-- 1 admin admin 2532525 Jul 6 11:28 observer.log.wf.20230706112928330

I cannot read the logs: opening any log file floods the screen and hangs the terminal, and tail keeps scrolling without ever stopping!
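One way to look at a log this large without flooding the terminal is to print only a bounded number of lines, each cut to a bounded width. A minimal sketch using the standard `tail` and `cut` tools; the function name `peek_log` and its defaults (50 lines, 200 characters) are illustrative assumptions, not OceanBase tooling:

```shell
# peek_log FILE [LINES] [WIDTH]
# Print the last LINES lines of FILE (default 50), each truncated to
# WIDTH characters (default 200). Without -f, tail exits after printing,
# so the output stays bounded even while the file keeps growing.
peek_log() {
    tail -n "${2:-50}" "$1" | cut -c "1-${3:-200}"
}

# Example (hypothetical invocation): peek_log observer.log 20 160
```

Because the output is bounded, it can also be piped into a pager or redirected to a small file for sharing.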

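After logging in to OB, show database fails as shown below. It looks like an uninitialized cluster, because the login password I had set earlier is now empty as well.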

[Image: show database error]

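In the end I had to kill the observer process. The log I looked at afterwards (excerpt below) shows internal error 4388 again, with memory allocation failing.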

[2023-07-06 11:38:38.808142] ERROR issue_dba_error (ob_log.cpp:2322) [29864][0][Y0-0000000000000000-0-0] [lt=42216] [dc=0][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4013, file="achunk_mgr.cpp", line_no=127, info="low alloc fail")
[2023-07-06 11:38:38.808202] ERROR issue_dba_error (ob_log.cpp:2322) [29864][0][Y0-0000000000000000-0-0] [lt=23] [dc=0][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4013, file="ob_malloc.h", line_no=54, info="allocate memory fail")
[2023-07-06 11:38:38.849800] ERROR issue_dba_error (ob_log.cpp:2322) [29884][0][Y0-0000000000000000-0-0] [lt=21] [dc=0][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4013, file="ob_log_direct_reader.cpp", line_no=69, info="ob_malloc fail")
[2023-07-06 11:38:38.857308] ERROR issue_dba_error (ob_log.cpp:2322) [29864][0][Y0-0000000000000000-0-0] [lt=10] [dc=0][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4013, file="achunk_mgr.cpp", line_no=127, info="low alloc fail")
[2023-07-06 11:38:38.857341] ERROR issue_dba_error (ob_log.cpp:2322) [29864][0][Y0-0000000000000000-0-0] [lt=18] [dc=0][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4013, file="ob_malloc.h", line_no=54, info="allocate memory fail")
[2023-07-06 11:38:38.874231] ERROR issue_dba_error (ob_log.cpp:2322) [29864][0][Y0-0000000000000000-0-0] [lt=7] [dc=0][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4013, file="achunk_mgr.cpp", line_no=127, info="low alloc fail")
[2023-07-06 11:38:38.874274] ERROR issue_dba_error (ob_log.cpp:2322) [29864][0][Y0-0000000000000000-0-0] [lt=18] [dc=0][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4013, file="ob_malloc.h", line_no=54, info="allocate memory fail")
[2023-07-06 11:38:38.892255] ERROR issue_dba_error (ob_log.cpp:2322) [29864][0][Y0-0000000000000000-0-0] [lt=7] [dc=0][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4013, file="achunk_mgr.cpp", line_no=127, info="low alloc fail")
[2023-07-06 11:38:39.388893] ERROR issue_dba_error (ob_log.cpp:2322) [29864][0][Y0-0000000000000000-0-0] [lt=58] [dc=0][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4013, file="ob_malloc.h", line_no=54, info="allocate memory fail")
[2023-07-06 11:38:39.772786] ERROR issue_dba_error (ob_log.cpp:2322) [29864][0][Y0-0000000000000000-0-0] [lt=146548] [dc=0][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4013, file="achunk_mgr.cpp", line_no=127, info="low alloc fail")
[2023-07-06 11:38:40.893676] ERROR issue_dba_error (ob_log.cpp:2322) [29864][0][Y0-0000000000000000-0-0] [lt=171693] [dc=0][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4013, file="ob_malloc.h", line_no=54, info="allocate memory fail")
[2023-07-06 11:38:41.256685] ERROR issue_dba_error (ob_log.cpp:2322) [29864][0][Y0-0000000000000000-0-0] [lt=53915] [dc=0][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4013, file="achunk_mgr.cpp", line_no=127, info="low alloc fail")
[2023-07-06 11:38:42.318609] ERROR issue_dba_error (ob_log.cpp:2322) [29884][0][Y0-0000000000000000-0-0] [lt=318110] [dc=0][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4013, file="achunk_mgr.cpp", line_no=127, info="low alloc fail")
[2023-07-06 11:38:42.506838] ERROR issue_dba_error (ob_log.cpp:2322) [29864][0][Y0-0000000000000000-0-0] [lt=31175] [dc=0][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4013, file="ob_malloc.h", line_no=54, info="allocate memory fail")
[2023-07-06 11:38:43.629141] ERROR issue_dba_error (ob_log.cpp:2322) [29864][0][Y0-0000000000000000-0-0] [lt=417806] [dc=0][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4013, file="achunk_mgr.cpp", line_no=127, info="low alloc fail")
[2023-07-06 11:38:44.228812] ERROR issue_dba_error (ob_log.cpp:2322) [29884][0][Y0-0000000000000000-0-0] [lt=50688] [dc=0][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4013, file="ob_malloc.h", line_no=54, info="allocate memory fail")

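It looks as though most of the memory was eaten up by log writing. The key question is why it keeps writing logs at all: the same error floods the screen endlessly!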
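To see how few distinct errors are behind the flood, the repeated entries can be collapsed into counts. A minimal sketch, assuming the log line layout shown in the excerpt above (an inner `errcode=..., file=..., line_no=..., info=...` group in parentheses); `summarize_errors` is a hypothetical helper name, built only from `grep`, `sort`, and `uniq`:

```shell
# summarize_errors FILE
# Extract the inner "errcode, file, line_no, info" group from each log
# line, then count how many times each distinct group occurs, with the
# most frequent group printed first.
summarize_errors() {
    grep -oE 'errcode=-?[0-9]+, file=[^,]+, line_no=[0-9]+, info=[^)]+' "$1" \
        | sort | uniq -c | sort -rn
}

# Example (hypothetical invocation): summarize_errors observer.log.wf
```

Running it against one of the smaller rotated files first keeps the scan cheap; each output line is a count followed by the error group it belongs to.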

The OB log just keeps flooding and I have no idea why! The logging is terrible.

Are other databases' logs like this too?

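show database error
Cause: the database has not finished initializing.
According to the log, error code 4013 means memory allocation failed.
Possible causes:
Physical memory is exhausted.
A single memory allocation is larger than 4G.
The memory usage of the CTX, Tenant, or Server has reached its limit.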
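The first cause can be checked roughly by comparing the host's physical memory with the observer's configured `memory_limit`. A minimal sketch, assuming a Linux host that exposes `/proc/meminfo`; `mem_headroom_kb` is a hypothetical helper, and the limit has to be supplied by hand in whole gigabytes:

```shell
# mem_headroom_kb LIMIT_GB [MEMINFO_PATH]
# Print MemTotal minus LIMIT_GB, in kB. A negative result means the
# configured limit is larger than the host's physical memory.
mem_headroom_kb() {
    limit_gb=$1
    meminfo=${2:-/proc/meminfo}
    total_kb=$(awk '/^MemTotal:/ {print $2}' "$meminfo")
    echo $(( total_kb - limit_gb * 1024 * 1024 ))
}

# Example (hypothetical invocation): mem_headroom_kb 15
```

This only covers the physical-memory case; it says nothing about per-tenant or per-context limits, which have to be checked inside the database.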

Do you have the complete ob log and observer log?

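I agree with what you say, but my question is this: why would a 3-node cluster need to be initialized after it is restarted following a failure? How is that initialization carried out? And how should the data from before and after the initialization be handled?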

There are 601 complete observer log files :joy: and several dozen ob log files as well, so I have no way to pull them out and paste them here :rofl:

Please paste the ob configuration file: strings /home/admin/oceanbase/etc/observer.config.bin

[root@abcSH etc]# strings observer.config.bin
-O)>
_enable_oracle_priv_check=True
stack_size=512k
major_compact_trigger=100
all_server_list=192.168.10.172:2882,192.168.10.173:2882,192.168.10.174:2882
__min_full_resource_pool_memory=1073741824
min_observer_version=3.2.4.1
workers_per_cpu_quota=4
cache_wash_threshold=4GB
syslog_level=WDIAG
obconfig_url=http://192.168.10.175:8080/services?Action=ObRootServiceInfo&User_ID=alibaba&UID=ocpmaster&ObRegion=xxxcluster
cluster_id=1688453679
cluster=xxxcluster
rootservice_list=192.168.10.174:2882:2881;192.168.10.172:2882:2881;192.168.10.173:2882:2881
_partition_balance_strategy=standard
enable_one_phase_commit=False
cpu_count=16
system_memory=3G
memory_limit=15G
net_thread_count=4
zone=zone1
devname=ens192
mysql_port=2881
rpc_port=2882
config_additional_dir=/data/log1/xxxcluster/etc2
data_dir=/home/admin/oceanbase/store/xxxcluster
[root@abcSH etc]#
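Since the errors in this thread are allocation failures, the memory-related settings are the ones worth isolating from the dump. A minimal sketch that filters `strings` output read from standard input; the function name `mem_params` and the choice of keys are assumptions made for illustration:

```shell
# mem_params
# Read "key=value" lines on standard input and keep only the
# memory-related keys, in their original order.
mem_params() {
    grep -E '^(memory_limit|system_memory|cache_wash_threshold|__min_full_resource_pool_memory)='
}

# Example (hypothetical invocation):
#   strings /home/admin/oceanbase/etc/observer.config.bin | mem_params
```

The key list is easy to extend if other parameters turn out to matter.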

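min_observer_version=3.2.4.1 suggests this is the Enterprise Edition of OB.
We recommend seeking help in one of the following ways:
1. If your company has already signed an OceanBase Enterprise Edition sales contract, please contact your account manager;
2. If your company has not yet signed an OceanBase Enterprise Edition sales contract, you can leave your contact details on the business consultation page of the OceanBase website, and an OceanBase Enterprise Edition business consultant will contact you within one business day.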
https://www.oceanbase.com/product/oceanbase