OBD web (GUI) deployment: OceanBase Database and OCP Express fail to deploy

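[Environment] openEuler 24.03 LTS
[OB or other component] obd web (GUI) deployment
[Version] OceanBase Deploy: 2.10.1
OceanBase Database 4.3.3.1
[Problem description] I have three virtual machines on the same subnet, with IPs ending in 130, 131, and 135, and I want to install observer on each of them.
The OBD control node is installed on 130. In the configuration, 130 and 135 are in the same zone with 130 as the RootServer; the proxy, OCP Express, and configServer are also on 130.
131 is in a zone by itself.
SSH login and Java verification were fine, and the pre-check reported no problems.
In the end, however, OceanBase Database and OCP Express failed to deploy.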

The DEBUG output is as follows.
OceanBase DataBase:

[2024-11-07 16:05:11.929] [DEBUG] - plugin oceanbase-ce-py_script_connect-4.2.2.0 result: True
[2024-11-07 16:05:11.930] [INFO] Initialize oceanbase-ce
[2024-11-07 16:05:11.930] [DEBUG] - Call oceanbase-ce-py_script_bootstrap-4.2.2.0 for oceanbase-ce-4.3.3.1-101000012024102216.el7-5fb8c3d4292f2da6157541bb80a2f1d500dcecdb
[2024-11-07 16:05:11.931] [DEBUG] - import bootstrap
[2024-11-07 16:05:11.938] [DEBUG] - add bootstrap ref count to 1
[2024-11-07 16:05:11.939] [DEBUG] – bootstrap for components: dict_keys([‘oceanbase-ce’, ‘obproxy-ce’, ‘obagent’, ‘ocp-express’, ‘ob-configserver’])
[2024-11-07 16:05:11.939] [DEBUG] – execute sql: set session ob_query_timeout=1000000000
[2024-11-07 16:05:11.939] [DEBUG] – execute sql: set session ob_query_timeout=1000000000. args: None
[2024-11-07 16:05:11.943] [DEBUG] – execute sql: alter system bootstrap REGION “sys_region” ZONE “zone1” SERVER “192.168.18.130:2882”,REGION “sys_region” ZONE “zone2” SERVER “192.168.18.131:2882”. args: None
[2024-11-07 16:06:21.338] [DEBUG] – OBD-5000: alter system bootstrap REGION “sys_region” ZONE “zone1” SERVER “192.168.18.130:2882”,REGION “sys_region” ZONE “zone2” SERVER “192.168.18.131:2882” execute failed
[2024-11-07 16:06:21.340] [DEBUG] – execute sql: alter system add server “192.168.18.135:2882” zone “zone1”. args: None
[2024-11-07 16:06:32.004] [DEBUG] – execute sql: alter user “root” IDENTIFIED BY %s. args: [‘’]
[2024-11-07 16:06:34.175] [DEBUG] – execute sql: select * from oceanbase.__all_server. args: None
[2024-11-07 16:06:59.608] [DEBUG] – root@192.168.18.130 execute: ls /root/JX_OBD_DEV/oceanbase/.meta
[2024-11-07 16:06:59.794] [DEBUG] – exited code 2, error output:
[2024-11-07 16:06:59.796] [DEBUG] ls: 无法访问 ‘/root/JX_OBD_DEV/oceanbase/.meta’: No such file or directory
[2024-11-07 16:06:59.796] [DEBUG]
[2024-11-07 16:06:59.796] [DEBUG] –
[2024-11-07 16:06:59.797] [DEBUG] – ls: 无法访问 ‘/root/JX_OBD_DEV/oceanbase/.meta’: No such file or directory
[2024-11-07 16:06:59.797] [DEBUG]
[2024-11-07 16:06:59.797] [DEBUG] – root@192.168.18.130 execute: cat /root/JX_OBD_DEV/oceanbase/run/obshell.pid
[2024-11-07 16:07:00.022] [DEBUG] – exited code 1, error output:
[2024-11-07 16:07:00.023] [DEBUG] cat: /root/JX_OBD_DEV/oceanbase/run/obshell.pid: No such file or directory
[2024-11-07 16:07:00.023] [DEBUG]
[2024-11-07 16:07:00.024] [DEBUG] – root@192.168.18.130 execute: strings /root/JX_OBD_DEV/oceanbase/etc/observer.conf.bin
[2024-11-07 16:07:00.266] [DEBUG] – exited code 1, error output:
[2024-11-07 16:07:00.267] [DEBUG] strings: /root/JX_OBD_DEV/oceanbase/etc/observer.conf.bin:无此文件
[2024-11-07 16:07:00.267] [DEBUG]
[2024-11-07 16:07:00.267] [DEBUG] –
[2024-11-07 16:07:00.268] [DEBUG] – strings: /root/JX_OBD_DEV/oceanbase/etc/observer.conf.bin:无此文件
[2024-11-07 16:07:00.268] [DEBUG]
[2024-11-07 16:07:00.268] [DEBUG] – root@192.168.18.131 execute: ls /root/JX_OBD_DEV/oceanbase/.meta
[2024-11-07 16:07:01.008] [DEBUG] – exited code 2, error output:
[2024-11-07 16:07:01.009] [DEBUG] ls: 无法访问 ‘/root/JX_OBD_DEV/oceanbase/.meta’: No such file or directory
[2024-11-07 16:07:01.009] [DEBUG]
[2024-11-07 16:07:01.011] [DEBUG] –
[2024-11-07 16:07:01.011] [DEBUG] – ls: 无法访问 ‘/root/JX_OBD_DEV/oceanbase/.meta’: No such file or directory
[2024-11-07 16:07:01.012] [DEBUG]
[2024-11-07 16:07:01.012] [DEBUG] – root@192.168.18.131 execute: cat /root/JX_OBD_DEV/oceanbase/run/obshell.pid
[2024-11-07 16:07:01.581] [DEBUG] – exited code 1, error output:
[2024-11-07 16:07:01.582] [DEBUG] cat: /root/JX_OBD_DEV/oceanbase/run/obshell.pid: No such file or directory
[2024-11-07 16:07:01.583] [DEBUG]
[2024-11-07 16:07:01.583] [DEBUG] – root@192.168.18.131 execute: strings /root/JX_OBD_DEV/oceanbase/etc/observer.conf.bin
[2024-11-07 16:07:02.292] [DEBUG] – exited code 127, error output:
[2024-11-07 16:07:02.293] [DEBUG] bash: 行 1: strings: 未找到命令
[2024-11-07 16:07:02.294] [DEBUG]
[2024-11-07 16:07:02.294] [DEBUG] –
[2024-11-07 16:07:02.294] [DEBUG] – bash: 行 1: strings: 未找到命令
[2024-11-07 16:07:02.294] [DEBUG]
[2024-11-07 16:07:02.295] [DEBUG] – root@192.168.18.135 execute: ls /root/JX_OBD_DEV/oceanbase/.meta
[2024-11-07 16:07:02.504] [DEBUG] – exited code 2, error output:
[2024-11-07 16:07:02.505] [DEBUG] ls: 无法访问 ‘/root/JX_OBD_DEV/oceanbase/.meta’: No such file or directory
[2024-11-07 16:07:02.505] [DEBUG]
[2024-11-07 16:07:02.506] [DEBUG] –
[2024-11-07 16:07:02.506] [DEBUG] – ls: 无法访问 ‘/root/JX_OBD_DEV/oceanbase/.meta’: No such file or directory
[2024-11-07 16:07:02.506] [DEBUG]
[2024-11-07 16:07:02.507] [DEBUG] – root@192.168.18.135 execute: cat /root/JX_OBD_DEV/oceanbase/run/obshell.pid
[2024-11-07 16:07:02.746] [DEBUG] – exited code 1, error output:
[2024-11-07 16:07:02.747] [DEBUG] cat: /root/JX_OBD_DEV/oceanbase/run/obshell.pid: No such file or directory
[2024-11-07 16:07:02.747] [DEBUG]
[2024-11-07 16:07:02.747] [DEBUG] – root@192.168.18.135 execute: strings /root/JX_OBD_DEV/oceanbase/etc/observer.conf.bin
[2024-11-07 16:07:02.968] [DEBUG] – exited code 127, error output:
[2024-11-07 16:07:02.968] [DEBUG] bash: 行 1: strings: 未找到命令
[2024-11-07 16:07:02.969] [DEBUG]
[2024-11-07 16:07:02.969] [DEBUG] –
[2024-11-07 16:07:02.969] [DEBUG] – bash: 行 1: strings: 未找到命令
[2024-11-07 16:07:02.969] [DEBUG]
[2024-11-07 16:07:02.969] [DEBUG] – root@192.168.18.130 execute: cat /root/JX_OBD_DEV/oceanbase/run/obshell.pid
[2024-11-07 16:07:03.149] [DEBUG] – exited code 1, error output:
[2024-11-07 16:07:03.150] [DEBUG] cat: /root/JX_OBD_DEV/oceanbase/run/obshell.pid: No such file or directory
[2024-11-07 16:07:03.151] [DEBUG]
[2024-11-07 16:07:03.151] [DEBUG] – root@192.168.18.130 export OB_ROOT_PASSWORD=’'
’’
[2024-11-07 16:07:03.152] [DEBUG] – start obshell: cd /root/JX_OBD_DEV/oceanbase; /root/JX_OBD_DEV/oceanbase/bin/obshell admin start --ip 192.168.18.130 --port 2886
[2024-11-07 16:07:03.152] [DEBUG] – root@192.168.18.130 execute: cd /root/JX_OBD_DEV/oceanbase; /root/JX_OBD_DEV/oceanbase/bin/obshell admin start --ip 192.168.18.130 --port 2886
[2024-11-07 16:07:15.433] [DEBUG] – exited code 30, error output:
[2024-11-07 16:07:15.434] [DEBUG] [FAILED] take over or rebuild failed: Error 4638 (HY000): The RootServer is not the master
[2024-11-07 16:07:15.435] [DEBUG] obshell server exited with code 22, please check obshell.log for more details
[2024-11-07 16:07:15.435] [DEBUG]
[2024-11-07 16:07:15.436] [ERROR] 192.168.18.130 obshell failed
[2024-11-07 16:07:15.436] [DEBUG] - sub bootstrap ref count to 0
[2024-11-07 16:07:15.436] [DEBUG] - export bootstrap
[2024-11-07 16:07:15.436] [DEBUG] - plugin oceanbase-ce-py_script_bootstrap-4.2.2.0 result: None
[2024-11-07 16:07:15.437] [INFO] [ERROR] 192.168.18.130 obshell failed
[2024-11-07 16:07:15.437] [INFO]
[2024-11-07 16:07:15.437] [ERROR] Cluster init failed

OCP Express:

[2024-11-07 16:07:49.664] [DEBUG] – get disk info for path /, total: 1232232148992 avail: 1131545735168
[2024-11-07 16:07:49.665] [DEBUG] - plugin ocp-express-py_script_start_check-4.2.2 result: True
[2024-11-07 16:07:49.665] [DEBUG] - Call ocp-express-py_script_start-4.2.2 for ocp-express-4.2.2-100000022024011120.el7-09ffcf156d1df9318a78af52656f499d2315e3f7
[2024-11-07 16:07:49.666] [DEBUG] - import start
[2024-11-07 16:07:49.681] [DEBUG] - add start ref count to 1
[2024-11-07 16:07:49.714] [INFO] Start ocp-express

[2024-11-07 16:07:49.714] [DEBUG] – root@192.168.18.130 execute: cat /root/JX_OBD_DEV/ocpexpress/run/ocp-express.pid
[2024-11-07 16:07:49.892] [DEBUG] – exited code 1, error output:
[2024-11-07 16:07:49.893] [DEBUG] cat: /root/JX_OBD_DEV/ocpexpress/run/ocp-express.pid: No such file or directory
[2024-11-07 16:07:49.893] [DEBUG]
[2024-11-07 16:07:49.894] [DEBUG] – root@192.168.18.130 execute: ls /root/JX_OBD_DEV/ocpexpress/.bootstrapped
[2024-11-07 16:07:50.123] [DEBUG] – exited code 2, error output:
[2024-11-07 16:07:50.124] [DEBUG] ls: 无法访问 ‘/root/JX_OBD_DEV/ocpexpress/.bootstrapped’: No such file or directory
[2024-11-07 16:07:50.124] [DEBUG]
[2024-11-07 16:07:50.125] [DEBUG] – connect 192.168.18.130 -P2883 -umeta@ocp_meta -p******
[2024-11-07 16:07:51.476] [DEBUG] – connect 192.168.18.130 -P2883 -umeta@ocp -p******
... (these connect lines repeat many times) ...
[2024-11-07 16:17:41.767] [DEBUG] – connect 192.168.18.130 -P2883 -umeta@ocp -p******
[2024-11-07 16:17:43.766] [DEBUG] – connect 192.168.18.130 -P2883 -umeta@ocp_meta -p******
[2024-11-07 16:17:45.769] [DEBUG] – connect 192.168.18.130 -P2883 -umeta@ocp -p******
[2024-11-07 16:17:47.768] [DEBUG] – connect 192.168.18.130 -P2883 -umeta@ocp_meta -p******
[2024-11-07 16:17:49.768] [DEBUG] – connect 192.168.18.130 -P2883 -umeta@ocp -p******
[2024-11-07 16:17:51.766] [DEBUG] – connect 192.168.18.130 -P2883 -umeta@ocp_meta -p******
[2024-11-07 16:17:53.770] [DEBUG] – connect 192.168.18.130 -P2883 -umeta@ocp -p******
[2024-11-07 16:17:55.766] [ERROR] 192.168.18.130: failed to connect meta db
[2024-11-07 16:17:55.767] [INFO] [ERROR] 192.168.18.130: failed to connect meta db
[2024-11-07 16:17:55.767] [INFO]
[2024-11-07 16:17:55.767] [DEBUG] - sub start ref count to 0
[2024-11-07 16:17:55.768] [DEBUG] - export start
[2024-11-07 16:17:55.768] [DEBUG] - plugin ocp-express-py_script_start-4.2.2 result: False
[2024-11-07 16:17:55.768] [ERROR] ocp-express start failed

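This part of the log shows it trying to connect to the OceanBase database. Port 2883 could never be reached, which is presumably why this error is reported.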

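1. Check whether the ocp tenant can be connected to. You can try connecting manually on ports 2881 and 2883 while this error is being reported.
2. Check whether resources are sufficient, i.e. whether CPU, memory, I/O, etc. are under pressure.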
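
A quick way to do the port check from step 1 is sketched below. It only tests whether anything accepts TCP connections on the given ports; `port_open` and `check_ports` are hypothetical helper names, and the host and ports are the ones from the logs in this thread.

```shell
#!/usr/bin/env bash
# Sketch: report whether TCP ports accept connections, using bash's /dev/tcp.
# port_open / check_ports are illustrative helpers, not part of obd.

port_open() {
  local host="$1" port="$2"
  # Child bash with a 3-second timeout so a filtered port cannot hang the check.
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

check_ports() {
  local host="$1"; shift
  local port
  for port in "$@"; do
    if port_open "$host" "$port"; then
      echo "${host}:${port} open"
    else
      echo "${host}:${port} closed"
    fi
  done
}

# Example, run while the "failed to connect meta db" retries are printing:
#   check_ports 192.168.18.130 2881 2883
# If 2883 is open, a manual login with the same account string the log shows
# (assuming obclient is installed) tells you whether the tenant itself exists:
#   obclient -h192.168.18.130 -P2883 -umeta@ocp_meta -p
```

An open port with a failing login points at the tenant or its resources rather than at the network.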

Please provide the yaml file and the obshell log.
The observer's log directory contains a log_obshell directory; the obshell logs are in there.
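
For pulling the relevant lines out before uploading, something like the sketch below may help. It assumes the observer log directory is `<home_path>/log` and that the file is named `obshell.log` (the name of the attachment posted later in this thread); `obshell_log_errors` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: print the last error lines from the obshell log.
# Assumed layout: <home_path>/log/log_obshell/obshell.log

obshell_log_errors() {
  local home_path="$1" lines="${2:-20}"
  local log_file="${home_path}/log/log_obshell/obshell.log"
  if [ ! -f "$log_file" ]; then
    echo "not found: ${log_file}" >&2
    return 1
  fi
  # Case-insensitive match, keep only the most recent matches.
  grep -i 'error' "$log_file" | tail -n "$lines"
}

# Example, using the home path that appears in the OBD debug log above:
#   obshell_log_errors /root/JX_OBD_DEV/oceanbase
```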

May I ask where the yaml file is? I have only just started trying to deploy, so I'm not familiar with the layout.
client.log (2.2 KB)
daemon.log (5.0 KB)
obshell.log (20.2 KB)
obshell.out.log (26.7 KB)

It is under ~/.obd/cluster/xxxx/
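
If you are not sure which directory is yours, a small loop over that folder lists the candidates. This is only a sketch: it assumes each deployment directory holds a file named `config.yaml`, and `list_obd_configs` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: list the deployment config files under the OBD cluster directory.
# Assumption: each deployment keeps its config as <base>/<deploy name>/config.yaml.

list_obd_configs() {
  local base="${1:-$HOME/.obd/cluster}"
  local dir
  for dir in "$base"/*/; do
    # The glob stays literal when nothing matches; the -f test filters that out.
    if [ -f "${dir}config.yaml" ]; then
      echo "${dir}config.yaml"
    fi
  done
  return 0
}

# Example:
#   list_obd_configs            # uses ~/.obd/cluster
#   list_obd_configs /some/dir  # or any other base directory
```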

yaml.rar (1.3 KB)

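The error shows that an unknown database, ocs, exists in OB, and that is what caused obshell to fail.
At present ocp-express is a lightweight, single-cluster OCP. I suggest deploying OCP on one of the nodes and then using OCP to deploy and operate the cluster.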

That is what I did: OCP is deployed on a single node, 130. That was the deployment plan from the start.

memory_limit: 6G
system_memory: 2G

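memory_limit > ocp_meta + sys + system_memory
The error is most likely because too little memory was allocated. I don't see a memory setting for the ocp_meta tenant in the config file; it can be set during the web GUI deployment.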
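
The inequality above can be checked with simple arithmetic before redeploying. The sketch below works in whole GB; `mem_budget_ok` is a hypothetical helper, and the sys-tenant and ocp_meta sizes in the example are assumed values, since neither appears in the posted config.

```shell
#!/usr/bin/env bash
# Sketch: check memory_limit > ocp_meta + sys + system_memory (whole GB).
# Usage: mem_budget_ok MEMORY_LIMIT SYSTEM_MEMORY SYS_TENANT OCP_META

mem_budget_ok() {
  local memory_limit="$1" system_memory="$2" sys_tenant="$3" ocp_meta="$4"
  local needed=$(( system_memory + sys_tenant + ocp_meta ))
  if [ "$memory_limit" -gt "$needed" ]; then
    echo "ok: memory_limit=${memory_limit}G > required ${needed}G"
  else
    echo "insufficient: memory_limit=${memory_limit}G <= required ${needed}G"
    return 1
  fi
}

# Example with memory_limit=6 and system_memory=2 from the posted config,
# plus ASSUMED sizes of 2G each for the sys tenant and ocp_meta:
#   mem_budget_ok 6 2 2 2
```

With those assumed sizes, the 6G limit leaves no headroom at all, which is consistent with the tenant never becoming reachable.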

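The 130 VM has 6 cores and 24 GB of memory, which should be enough.
131 and 135 have 4 cores and 8 GB of memory.
During installation I saw two warnings. One said that 131 and 135 were short on memory: 5 GB free, 6 GB required.
The other was "clog and data same disk(/)".
Since both were only warnings, I didn't pay attention to them. If the deployment had succeeded, I planned to expand the memory later.
Is insufficient memory the cause? If so, I'll adjust it.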

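One more question: I did see in the obd web installer that the memory settings you mentioned can be configured. Until now I had not customized anything and just went with the defaults.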

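If I expand the VMs' memory, do I still need to configure these values myself?
Also, right now OceanBase Database and OCP Express have failed,
while OBProxy, OBAgent, and obconfigserver succeeded.
If I reinstall, is there no way to keep the successful components and install only the failed ones on top of them?
Do I have to either change the cluster name, or delete all of this cluster's data and reinstall under the same cluster name?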

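  1. Since the installation failed, just reinstall. Trying it several times will teach you a lot.
  2. As I recall, obproxy, obagent, and obconfigserver can be left out when memory is tight and installed later.
  3. Judging from the configuration, the oceanbase resources are too low to create the tenants OCP needs, which is why the repeated connections to the ocp tenant never succeed.
  4. obd provides commands for removing the existing cluster before reinstalling; run obd --help to see them.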

#######################################
obd cluster list                    # list cluster names
obd cluster --help
obd cluster destroy cluster_name    # destroy the cluster

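Thanks in advance. Yesterday I expanded the memory: apart from the OBD control node, the other two machines are now at 12 GB. After reinstalling, a new problem appeared, and OceanBase Database and OCP Express still failed.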

@辞霜 This is after expanding the memory and setting memory_limit: 8GB, system_memory: 3GB, ocp_meta_tenant_memory_size: 4G.
Now a new problem has appeared:

[2024-11-07 18:55:31.266] [DEBUG] – root@192.168.18.130 execute: ls /root/jx_obd_dev/oceanbase/store/clog/tenant_1/
[2024-11-07 18:55:31.456] [DEBUG] – exited code 2, error output:
[2024-11-07 18:55:31.457] [DEBUG] ls: 无法访问 ‘/root/jx_obd_dev/oceanbase/store/clog/tenant_1/’: No such file or directory
[2024-11-07 18:55:31.457] [DEBUG]
[2024-11-07 18:55:31.458] [DEBUG] – root@192.168.18.130 execute: cat /root/jx_obd_dev/oceanbase/run/observer.pid
[2024-11-07 18:55:31.681] [DEBUG] – exited code 1, error output:
[2024-11-07 18:55:31.681] [DEBUG] cat: /root/jx_obd_dev/oceanbase/run/observer.pid: No such file or directory
[2024-11-07 18:55:31.682] [DEBUG]
[2024-11-07 18:55:31.682] [DEBUG] – 192.168.18.130 start command construction
[2024-11-07 18:55:31.682] [DEBUG] – update large_query_threshold to 600s because of scenario
[2024-11-07 18:55:31.682] [DEBUG] – update enable_record_trace_log to False because of scenario
[2024-11-07 18:55:31.683] [DEBUG] – update enable_syslog_recycle to 1 because of scenario
[2024-11-07 18:55:31.683] [DEBUG] – update max_syslog_file_count to 300 because of scenario
[2024-11-07 18:55:31.683] [DEBUG] – root@192.168.18.131 execute: ls /root/jx_obd_dev/oceanbase/store/clog/tenant_1/
[2024-11-07 18:55:31.842] [DEBUG] – exited code 2, error output:
[2024-11-07 18:55:31.843] [DEBUG] ls: 无法访问 ‘/root/jx_obd_dev/oceanbase/store/clog/tenant_1/’: No such file or directory
[2024-11-07 18:55:31.843] [DEBUG]
[2024-11-07 18:55:31.843] [DEBUG] – root@192.168.18.131 execute: cat /root/jx_obd_dev/oceanbase/run/observer.pid
[2024-11-07 18:55:32.034] [DEBUG] – exited code 1, error output:
[2024-11-07 18:55:32.035] [DEBUG] cat: /root/jx_obd_dev/oceanbase/run/observer.pid: No such file or directory

Below are the yaml, the log_obshell logs, and the DEBUG logs from the installation:
log_obshell.rar (249.8 KB)
OceanBase DataBase失败日志.txt (66.4 KB)
OCP Express失败日志.txt (41.8 KB)
yaml.rar (1.2 KB)
Please help take a look!

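If it is a resource problem, here is the configuration I used when deploying last night:
OCP: one server, 192.168.10.100 [8c 16G]
observer: three servers [201, 202, 203] [4c 16G]
system_memory=2g
memory_limit=0M
log1 disk: larger than 4x physical memory, about 30~50G [I allocated 20G; the installer warned but still passed]
data/1: allocated 110G
ocp_meta tenant [2c2g]
ocp_monitor tenant [2c2g]
ocp sys tenant [3c2g]
(Mine are virtual machines with limited resources; for reference only.)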

The obd log shows a tenant-creation error, so it is still a memory-setting problem:
system_memory=2g
memory_limit=as large as you can make it
ocp_meta_tenant_memory_size=2G

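Thanks in advance. So you are installing through OCP, right? I am using the obd deployment wizard. Thank you for the ideas; I'll give it a try as well!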

Yes. I first installed obd from the all-in-one package, then used obd to deploy OCP, and then used OCP to deploy the observer cluster.


Hello, has the problem been solved and the deployment succeeded?

I switched to deploying through OCP and it succeeded. I didn't try the obd route any further.
