OBD cluster deployment fails

【 Environment 】Test environment
【 OB or other component 】
【 Version 】4.0.0-ce
【 Problem description 】

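I used OBD to deploy a cluster and it failed. I'm not sure whether the WARN messages are the cause. Is there a way to see the specific reason?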

[ob@localhost ~]$ obd cluster autodeploy obtest -c ./topology.yaml
install oceanbase-ce-4.0.0.0 for local ok
install obproxy-ce-4.0.0 for local ok
install obagent-1.2.0 for local ok
install prometheus-2.37.1 for local ok
install grafana-7.5.17 for local ok
Cluster param config check ok
Open ssh connection ok
Generate observer configuration ok
Generate obproxy configuration ok
Generate obagent configuration ok
Generate prometheus configuration ok
Generate grafana configuration ok
install oceanbase-ce-4.0.0.0 for local ok
install obproxy-ce-4.0.0 for local ok
install obagent-1.2.0 for local ok
install prometheus-2.37.1 for local ok
install grafana-7.5.17 for local ok
+--------------------------------------------------------------------------------------------+
|                                          Packages                                          |
+--------------+---------+------------------------+------------------------------------------+
| Repository   | Version | Release                | Md5                                      |
+--------------+---------+------------------------+------------------------------------------+
| oceanbase-ce | 4.0.0.0 | 103000022023011215.el7 | 1d56dc742f5f05a2d15797d291b51a94019e728d |
| obproxy-ce   | 4.0.0   | 5.el7                  | de53232a951184fad75b15884458d85e31d2f6c3 |
| obagent      | 1.2.0   | 4.el7                  | 0e8f5ee68c337ea28514c9f3f820ea546227fa7e |
| prometheus   | 2.37.1  | 10000102022110211.el7  | 58913c7606f05feb01bc1c6410346e5fc31cf263 |
| grafana      | 7.5.17  | 1                      | 1bf1f338d3a3445d8599dc6902e7aeed4de4e0d6 |
+--------------+---------+------------------------+------------------------------------------+
Repository integrity check ok
Parameter check ok
Open ssh connection ok
Cluster status check ok
Initializes observer work home ok
Initializes obproxy work home ok
Initializes obagent work home ok
Initializes prometheus work home ok
Initializes grafana work home ok
Remote oceanbase-ce-4.0.0.0-103000022023011215.el7-1d56dc742f5f05a2d15797d291b51a94019e728d repository install ok
Remote oceanbase-ce-4.0.0.0-103000022023011215.el7-1d56dc742f5f05a2d15797d291b51a94019e728d repository lib check !!
Remote obproxy-ce-4.0.0-5.el7-de53232a951184fad75b15884458d85e31d2f6c3 repository install ok
Remote obproxy-ce-4.0.0-5.el7-de53232a951184fad75b15884458d85e31d2f6c3 repository lib check ok
Remote obagent-1.2.0-4.el7-0e8f5ee68c337ea28514c9f3f820ea546227fa7e repository install ok
Remote obagent-1.2.0-4.el7-0e8f5ee68c337ea28514c9f3f820ea546227fa7e repository lib check ok
Remote prometheus-2.37.1-10000102022110211.el7-58913c7606f05feb01bc1c6410346e5fc31cf263 repository install ok
Remote prometheus-2.37.1-10000102022110211.el7-58913c7606f05feb01bc1c6410346e5fc31cf263 repository lib check ok
Remote grafana-7.5.17-1-1bf1f338d3a3445d8599dc6902e7aeed4de4e0d6 repository install ok
Remote grafana-7.5.17-1-1bf1f338d3a3445d8599dc6902e7aeed4de4e0d6 repository lib check ok
Try to get lib-repository
install oceanbase-ce-libs-4.0.0.0 for local ok
Remote oceanbase-ce-libs-4.0.0.0-103000022023011215.el7-ef48cff7633e3dbc39f5c0abdcd72348213e09a2 repository install ok
Remote oceanbase-ce-4.0.0.0-103000022023011215.el7-1d56dc742f5f05a2d15797d291b51a94019e728d repository lib check ok
obtest deployed
Get local repositories ok
Search plugins ok
Open ssh connection ok
Load cluster param plugin ok
Check before start observer ok
[WARN] OBD-1007: (10.3.72.222) The recommended number of max user processes is 12288 (Current value: 4096)
[WARN] (10.3.72.222) clog and data use the same disk (/home)
[WARN] OBD-1007: (10.3.72.150) The recommended number of max user processes is 12288 (Current value: 4096)
[WARN] (10.3.72.150) clog and data use the same disk (/home)
[WARN] OBD-1007: (10.3.72.155) The recommended number of max user processes is 12288 (Current value: 4096)
[WARN] (10.3.72.155) clog and data use the same disk (/home)

Check before start obproxy ok
Check before start obagent ok
Check before start prometheus ok
Check before start grafana ok
Start observer ok
observer program health check ok
Connect to observer ok
Initialize cluster x
[ERROR] Cluster init failed
See https://www.oceanbase.com/product/ob-deployer/error-codes .

The observer process had in fact already started.
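The OBD-1007 warning refers to the per-user process limit (`ulimit -u`). As a hedged sketch, assuming Python 3.9+ on Linux and that it is run as the same user that starts observer, the code below reads the soft `RLIMIT_NPROC` limit and compares it with the 12288 recommended in the warning. `nproc_ok` and `current_nproc_soft_limit` are hypothetical helper names, not part of OBD. Note that the later run in this thread reports 12000, which would still fall short of the recommendation.

```python
import resource

# Recommended value quoted in the OBD-1007 warning above.
RECOMMENDED_NPROC = 12288


def nproc_ok(soft_limit: int, recommended: int = RECOMMENDED_NPROC) -> bool:
    """Return True if a max-user-processes soft limit meets the recommendation."""
    # RLIM_INFINITY means "unlimited", which always satisfies the check.
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return soft_limit >= recommended


def current_nproc_soft_limit() -> int:
    """Soft RLIMIT_NPROC of the user running this script (what `ulimit -u` shows)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return soft


# Usage (run as the user that starts observer):
#   limit = current_nproc_soft_limit()
#   print(limit, nproc_ok(limit))
```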

【 Steps to reproduce 】

【 Symptoms and impact 】

【 Attachments 】

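Hi, please check topology.yaml: using the home_path configured there, open home_path/log/observer.log, look at the last few entries, and search for ERROR entries. The error message should show up there.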
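A minimal sketch of "look at the last few entries", equivalent to `tail -n 50`. It streams the file through a bounded `collections.deque`, so even a large observer.log is never held in memory all at once. The helper names and the path in the usage comment are hypothetical; substitute the `home_path` from your own topology.yaml.

```python
from collections import deque
from typing import Iterable


def last_lines(lines: Iterable[str], n: int = 50) -> list[str]:
    """Keep only the final n items of any line iterable (like `tail -n`)."""
    return list(deque(lines, maxlen=n))


def tail_file(path: str, n: int = 50) -> list[str]:
    """Return the last n lines of a text file, reading it once, line by line."""
    # errors="replace" keeps going if the log contains stray non-UTF-8 bytes.
    with open(path, encoding="utf-8", errors="replace") as f:
        return last_lines(f, n)


# Usage; the path below is hypothetical, use home_path from topology.yaml:
#   for line in tail_file("/home/ob/observer/log/observer.log", 20):
#       print(line, end="")
```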

I've compressed the config file and the logs together and uploaded them:
Desktop.7z (1.6 MB)

memory_limit must not be smaller than system_memory. memory_limit is the observer's total memory cap, and system_memory is a shared memory region that can generally be set to about 1/4 of memory_limit.
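The two rules in this reply can be expressed as a small check. This is a sketch only: `parse_size` assumes sizes are written as an integer with an optional K/M/G/T suffix (as in `32G`), and the function names are hypothetical, not OBD's own parameter validation.

```python
# Binary multipliers for the size suffixes this sketch accepts (assumption).
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}


def parse_size(value: str) -> int:
    """Convert strings like '32G' or '512M' to bytes; a bare number is bytes."""
    value = value.strip().upper()
    if value and value[-1] in UNITS:
        return int(value[:-1]) * UNITS[value[-1]]
    return int(value)


def check_memory(memory_limit: str, system_memory: str) -> list[str]:
    """Return the problems found; an empty list means the hard rule holds."""
    problems = []
    if parse_size(memory_limit) < parse_size(system_memory):
        problems.append("memory_limit is smaller than system_memory")
    return problems


def suggested_system_memory(memory_limit: str) -> int:
    """Rule of thumb from the reply: roughly 1/4 of memory_limit, in bytes."""
    return parse_size(memory_limit) // 4
```

With memory_limit 32G and system_memory 8G, `check_memory` returns an empty list and the 1/4 rule of thumb is met exactly.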

I raised memory_limit to 32G and system_memory to 8G, but it still fails. The machines have 64G of memory.

OBD keeps a log in ~/.obd/log/obd with more details.
The failure currently happens during the OB bootstrap step, so you also need to check whether the observer log contains errors: grep ERROR observer.log*
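The shell glob `observer.log*` also picks up rotated log files. The sketch below does the same search in Python, assuming the logs are plain text. `grep_logs` is a hypothetical helper; it expands `~`, so the same function can be pointed at `~/.obd/log/obd`.

```python
import glob
import os
from typing import Iterator


def grep_logs(pattern: str, needle: str) -> Iterator[tuple[str, int, str]]:
    """Yield (file, line number, line) for each line containing `needle`
    in every file matching the glob `pattern`, like `grep needle pattern`."""
    # expanduser lets patterns such as "~/.obd/log/obd" work as in a shell.
    for path in sorted(glob.glob(os.path.expanduser(pattern))):
        with open(path, encoding="utf-8", errors="replace") as f:
            for lineno, line in enumerate(f, start=1):
                if needle in line:
                    yield path, lineno, line.rstrip("\n")


# Usage, run from home_path/log (the working directory is an assumption):
#   for path, lineno, line in grep_logs("observer.log*", "ERROR"):
#       print(f"{path}:{lineno}: {line}")
```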

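I checked the OBD log: the failure is caused by a timeout while connecting to the observer and executing SQL. observer.log contains no ERROR-level entries, but it keeps printing WARN entries, with the same content as the logs I uploaded earlier.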

I logged in to the observer manually and found that the system database is missing; the OBD SQL timeout is probably related to this.

Welcome to the OceanBase.  Commands end with ; or \g.
Your OceanBase connection id is 3221225474
Server version: OceanBase_CE 4.0.0.0 (r103000022023011215-05bbad0279302d7274e1b5ab79323a2c915c1981) (Built Jan 12 2023 15:28:27)

Copyright (c) 2000, 2018, OceanBase and/or its affiliates. All rights reserved.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

obclient [(none)]> 
obclient [(none)]> 
obclient [(none)]> show databases;
ERROR 1146 (42S02): Table 'oceanbase.__all_database' doesn't exist
obclient [(none)]> 
obclient [(none)]> use oceanbase;
ERROR 1049 (42000): Unknown database
obclient [(none)]> 

Yes, bootstrap most likely did not succeed; none of the system tables can be queried.

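Don't deploy with the autodeploy command: it overwrites the memory parameters, which can leave the observer unable to allocate the resources it needs. Try the deploy command instead.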

Deploying with deploy fails too. I've tried many different yaml templates, and every time it's the same problem :rofl:

Please check the ERROR entries in observer.log.

They're all like this:

Use grep ' ERROR ' observer.log* (with the spaces around ERROR) to find ERROR-level entries; the ones above are all WARN.
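Why the spaces matter: a plain `grep ERROR` also matches WARN lines that merely contain the string ERROR somewhere in their text. The sketch below contrasts the two matches, assuming (as the `' ERROR '` grep does) that the log level appears as its own whitespace-delimited token. The helper names and the sample line in the comment are hypothetical, not real observer output.

```python
import re

# ERROR as a whole whitespace-delimited token, mirroring grep ' ERROR '
# (this version also accepts the token at the very start or end of a line).
LEVEL_ERROR = re.compile(r"(?<!\S)ERROR(?!\S)")


def is_error_level(line: str) -> bool:
    """Token match: what `grep ' ERROR '` is meant to find."""
    return LEVEL_ERROR.search(line) is not None


def naive_match(line: str) -> bool:
    """Substring match: what a bare `grep ERROR` finds, WARN lines included."""
    return "ERROR" in line


# Illustrative (made-up) line: "... WARN  [SERVER] ret=ERROR_CODE" matches
# naive_match() but not is_error_level(), because ERROR is glued to "=".
```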

I've checked; there are no ERROR-level entries.

Hello, could you send the OBD log and the complete observer log? The observer log in the archive above doesn't look complete.

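The files are a bit large, so I've uploaded them to a cloud drive. Please take a look.
Link: Baidu Netdisk (extraction code required)
Extraction code: 5hp6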

Additional details of the steps I ran:

[root@localhost ob]# obd cluster deploy obtest -c ./topology.yaml
install oceanbase-ce-4.0.0.0 for local ok
install obproxy-ce-4.0.0 for local ok
+--------------------------------------------------------------------------------------------+
|                                          Packages                                          |
+--------------+---------+------------------------+------------------------------------------+
| Repository   | Version | Release                | Md5                                      |
+--------------+---------+------------------------+------------------------------------------+
| oceanbase-ce | 4.0.0.0 | 103000022023011215.el7 | 1d56dc742f5f05a2d15797d291b51a94019e728d |
| obproxy-ce   | 4.0.0   | 5.el7                  | de53232a951184fad75b15884458d85e31d2f6c3 |
+--------------+---------+------------------------+------------------------------------------+
Repository integrity check ok
Parameter check ok
Open ssh connection ok
Cluster status check ok
Initializes observer work home ok
Initializes obproxy work home ok
Remote oceanbase-ce-4.0.0.0-103000022023011215.el7-1d56dc742f5f05a2d15797d291b51a94019e728d repository install ok
Remote oceanbase-ce-4.0.0.0-103000022023011215.el7-1d56dc742f5f05a2d15797d291b51a94019e728d repository lib check !!
Remote obproxy-ce-4.0.0-5.el7-de53232a951184fad75b15884458d85e31d2f6c3 repository install ok
Remote obproxy-ce-4.0.0-5.el7-de53232a951184fad75b15884458d85e31d2f6c3 repository lib check ok
Try to get lib-repository
install oceanbase-ce-libs-4.0.0.0 for local ok
Remote oceanbase-ce-libs-4.0.0.0-103000022023011215.el7-ef48cff7633e3dbc39f5c0abdcd72348213e09a2 repository install ok
Remote oceanbase-ce-4.0.0.0-103000022023011215.el7-1d56dc742f5f05a2d15797d291b51a94019e728d repository lib check ok
obtest deployed
[root@localhost ob]# obd cluster start obtest 
Get local repositories ok
Search plugins ok
Open ssh connection ok
Load cluster param plugin ok
Check before start observer ok
[WARN] OBD-1007: (10.3.72.222) The recommended number of max user processes is 12288 (Current value: 12000)
[WARN] (10.3.72.222) clog and data use the same disk (/home)
[WARN] (10.3.72.150) clog and data use the same disk (/home)
[WARN] (10.3.72.155) clog and data use the same disk (/home)

Check before start obproxy ok
Start observer ok
observer program health check ok
Connect to observer ok
Initialize cluster x
[ERROR] Cluster init failed
See https://www.oceanbase.com/product/ob-deployer/error-codes .
[root@localhost ob]# 

Received, please wait.

We found some ERROR entries: the bootstrap failure was caused by error 4122. We'll keep analyzing it.

Please check whether port 2882 is open and reachable on each machine, and whether the firewall is turned off.
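A quick way to test this from each node is a plain TCP connect to port 2882 on every other observer. This is a sketch, not an OBD feature: `port_open` is a hypothetical helper, and the host list is taken from the WARN lines above. A `False` result means the connection was refused, dropped, or timed out; on its own it does not tell a firewall apart from a service that is not listening.

```python
import socket

# Observer hosts taken from the warnings in this thread.
HOSTS = ["10.3.72.222", "10.3.72.150", "10.3.72.155"]


def port_open(host: str, port: int = 2882, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out (socket.timeout is an OSError)
        return False


# Usage: run on every node while observer is running on the others.
#   for host in HOSTS:
#       print(host, "reachable" if port_open(host) else "NOT reachable")
```

Assuming the hosts use firewalld (the EL7 default), its state can be checked separately with `systemctl status firewalld`.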

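Case solved: it really was the firewall. One of the machines still had its firewall enabled :rofl: (I had deployed another distributed database on these machines before and assumed the firewall was fine. Careless of me...)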

Thanks @阿绿 @chris-sun @其灵 @donghy @秃蛙