Bug in the command "obd cluster start dh_ob -c oceanbase-ce -s IP"

【Environment】Test environment
【OB or other components】OBServer and obd
【Version】Community Edition 4.2.2
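【Problem description】I have deployed one zone with 3 OBServer nodes, and the cluster is healthy. I wanted to deliberately simulate running on only 2 nodes to check whether the cluster keeps working. Stopping one node with obd cluster stop dh_ob -c oceanbase-ce -s 10.1.4.180 works fine.
However, starting that node's OBServer again with obd fails:
obd cluster start dh_ob -c oceanbase-ce -s 10.1.4.180 (fails)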

The error log is shown below.
Running the following command instead works, so I suspect this is an obd bug:
obd cluster restart dh_ob -c oceanbase-ce -s 10.1.4.180

【Steps to reproduce】Operations performed before and after the issue occurred
【Attachments and logs】

[2024-12-19 18:06:18.184] [WARNING] OBD-1007: (10.1.4.180) The recommended number of max user processes is 655350 (Current value: 131072)
[2024-12-19 18:06:18.184] [WARNING] OBD-1007: (10.1.4.180) The recommended number of stack size is unlimited (Current value: 8192)
[2024-12-19 18:06:18.184] [DEBUG] -- sudoroot@10.1.4.180 execute: sysctl -a 
[2024-12-19 18:06:18.260] [DEBUG] -- exited code 0
[2024-12-19 18:06:18.264] [DEBUG] -- sudoroot@10.1.4.180 execute: cat /proc/meminfo 
[2024-12-19 18:06:18.322] [DEBUG] -- exited code 0
[2024-12-19 18:06:18.323] [DEBUG] -- sudoroot@10.1.4.180 execute: df --block-size=1024  
[2024-12-19 18:06:18.383] [DEBUG] -- exited code 0
[2024-12-19 18:06:18.384] [DEBUG] -- get disk info for path /dev, total: 33472049152 avail: 33472049152
[2024-12-19 18:06:18.385] [DEBUG] -- get disk info for path /dev/shm, total: 33489457152 avail: 33489457152
[2024-12-19 18:06:18.385] [DEBUG] -- get disk info for path /run, total: 33489457152 avail: 30224388096
[2024-12-19 18:06:18.385] [DEBUG] -- get disk info for path /sys/fs/cgroup, total: 33489457152 avail: 33489457152
[2024-12-19 18:06:18.385] [DEBUG] -- get disk info for path /, total: 46670925824 avail: 40094486528
[2024-12-19 18:06:18.385] [DEBUG] -- get disk info for path /tmp, total: 33489457152 avail: 33489424384
[2024-12-19 18:06:18.385] [DEBUG] -- get disk info for path /boot, total: 1020702720 avail: 808050688
[2024-12-19 18:06:18.385] [DEBUG] -- get disk info for path /boot/efi, total: 627900416 avail: 619917312
[2024-12-19 18:06:18.385] [DEBUG] -- get disk info for path /opt, total: 482943700992 avail: 7224631296
[2024-12-19 18:06:18.385] [DEBUG] -- get disk info for path /run/user/1003, total: 6697889792 avail: 6697889792
[2024-12-19 18:06:18.385] [DEBUG] -- get disk info for path /run/user/1000, total: 6697889792 avail: 6697889792
[2024-12-19 18:06:18.385] [DEBUG] -- disk: {'/dev': {'total': 33472049152, 'avail': 33472049152, 'need': 0}, '/dev/shm': {'total': 33489457152, 'avail': 33489457152, 'need': 0}, '/run': {'total': 33489457152, 'avail': 30224388096, 'need': 0}, '/sys/fs/cgroup': {'total': 33489457152, 'avail': 33489457152, 'need': 0}, '/': {'total': 46670925824, 'avail': 40094486528, 'need': 0}, '/tmp': {'total': 33489457152, 'avail': 33489424384, 'need': 0}, '/boot': {'total': 1020702720, 'avail': 808050688, 'need': 0}, '/boot/efi': {'total': 627900416, 'avail': 619917312, 'need': 0}, '/opt': {'total': 482943700992, 'avail': 7224631296, 'need': 0}, '/run/user/1003': {'total': 6697889792, 'avail': 6697889792, 'need': 0}, '/run/user/1000': {'total': 6697889792, 'avail': 6697889792, 'need': 0}}
[2024-12-19 18:06:18.388] [ERROR] oceanbase-ce-py_script_start_check-4.2.2.0 RuntimeError: <10.1.4.180>
[2024-12-19 18:06:18.388] [ERROR] Traceback (most recent call last):
[2024-12-19 18:06:18.388] [ERROR]   File "core.py", line 2020, in start_cluster
[2024-12-19 18:06:18.388] [ERROR]   File "core.py", line 2098, in _start_cluster
[2024-12-19 18:06:18.388] [ERROR]   File "core.py", line 188, in call_plugin
[2024-12-19 18:06:18.388] [ERROR]   File "_plugin.py", line 347, in __call__
[2024-12-19 18:06:18.388] [ERROR]   File "_plugin.py", line 305, in _new_func
[2024-12-19 18:06:18.388] [ERROR]   File "/home/sudoroot/.obd/plugins/oceanbase-ce/4.2.2.0/start_check.py", line 744, in start_check
[2024-12-19 18:06:18.388] [ERROR]     log_disk_size = servers_log_disk_size[server]
[2024-12-19 18:06:18.388] [ERROR] KeyError: <10.1.4.180>
[2024-12-19 18:06:18.388] [ERROR] 
[2024-12-19 18:06:18.388] [DEBUG] - plugin oceanbase-ce-py_script_start_check-4.2.2.0 restore servers: [<10.1.4.178>, <10.1.4.179>, <10.1.4.180>]
[2024-12-19 18:06:18.388] [DEBUG] - sub start_check ref count to 0
[2024-12-19 18:06:18.388] [DEBUG] - export start_check
[2024-12-19 18:06:18.388] [DEBUG] - oceanbase-ce starting check failed.
[2024-12-19 18:06:18.391] [INFO] See https://www.oceanbase.com/product/ob-deployer/error-codes .
[2024-12-19 18:06:18.391] [INFO] Trace ID: e1e13d28-bdf0-11ef-963a-566f351b0096
[2024-12-19 18:06:18.392] [INFO] If you want to view detailed obd logs, please run: obd display-trace e1e13d28-bdf0-11ef-963a-566f351b0096
[2024-12-19 18:06:18.424] [INFO] [WARN] OBD-1007: (10.1.4.180) The recommended number of max user processes is 655350 (Current value: 131072)
[2024-12-19 18:06:18.425] [INFO] [WARN] OBD-1007: (10.1.4.180) The recommended number of stack size is unlimited (Current value: 8192)
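The traceback ends in a KeyError at "log_disk_size = servers_log_disk_size[server]" in start_check.py: the per-server log disk size dictionary has no entry for 10.1.4.180, the only server targeted by -s. The Python sketch below reproduces that failure pattern in isolation and contrasts it with a defensive lookup. It is not the obd plugin's code: the function names and the sizes are hypothetical, only the variable name servers_log_disk_size comes from the traceback, and why the entry is missing in obd 2.6.1 is not established here.

```python
from typing import Dict, List


def lookup_naive(servers: List[str], servers_log_disk_size: Dict[str, int]) -> Dict[str, int]:
    # Hypothetical helper mirroring the failing line in start_check.py:
    # direct indexing raises KeyError for any server without an entry.
    return {server: servers_log_disk_size[server] for server in servers}


def lookup_defensive(servers: List[str], servers_log_disk_size: Dict[str, int]) -> Dict[str, int]:
    # Hypothetical alternative: collect servers with no entry and report them
    # instead of crashing with a bare KeyError.
    found: Dict[str, int] = {}
    missing: List[str] = []
    for server in servers:
        if server in servers_log_disk_size:
            found[server] = servers_log_disk_size[server]
        else:
            missing.append(server)
    if missing:
        print("no log disk size computed for: " + ", ".join(missing))
    return found


if __name__ == "__main__":
    # Hypothetical sizes (arbitrary units); only the IPs come from the log above.
    sizes = {"10.1.4.178": 100, "10.1.4.179": 100}
    target = ["10.1.4.180"]  # the single server passed via -s
    try:
        lookup_naive(target, sizes)
    except KeyError as exc:
        print("KeyError:", exc)  # same kind of failure as in the traceback
    print(lookup_defensive(target, sizes))
```

Skipping the server is only one option shown for contrast; an actual fix would more plausibly make sure the size is computed for every server being started, which is consistent with starting the whole cluster (all servers) not hitting this path.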

Which version of obd are you using?

[sudoroot@localhost ~]$ obd --version
OceanBase Deploy: 2.6.1
REVISION: 6aad22bedf20b041b23ff58c203d94dc165c717a
BUILD_BRANCH: HEAD
BUILD_TIME: Feb 05 2024 17:12:05OURCE
Copyright (C) 2021 OceanBase
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
[sudoroot@localhost ~]$ 

It comes from the all-in-one package:
oceanbase-all-in-one-4.2.2.0-100010012024022719.el8.x86_64.tar.gz

Could you please upload the complete obd.log?

oceanbase-ce.txt (26.0 KB)

OK, I'll report this issue. Thanks for the feedback.

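Also, after starting with obd, ps on the OBServer node does not show the full command line.
Normally a running observer process shows a long list of startup parameters, but now there is only -p 2881.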

Last login: Thu Dec 19 18:20:27 2024 from 10.1.172.251
[sudoroot@localhost ~]$ ps -ef|grep ob
sudoroot  489314       1 69 10月08 ?      50-21:30:28 /opt/cluster/dh_ob/oceanbase/bin/observer -p 2881
sudoroot  489991       1  0 10月08 ?      00:16:42 /opt/cluster/dh_ob/oceanbase/bin/obshell daemon --ip 10.1.4.179 --port 2886
sudoroot  490021  489991  0 10月08 ?      01:55:51 /opt/cluster/dh_ob/oceanbase/bin/obshell server --ip 10.1.4.179 --port 2886
sudoroot  491222       1  0 10月08 ?      00:21:27 /opt/cluster/dh_ob/obagent/bin/ob_agentd -c /opt/cluster/dh_ob/obagent/conf/agentd.yaml
sudoroot  491228  491222  0 10月08 ?      00:02:06 /opt/cluster/dh_ob/obagent/bin/ob_mgragent
sudoroot  491229  491222  0 10月08 ?      03:36:19 /opt/cluster/dh_ob/obagent/bin/ob_monagent
sudoroot  765680  765603  0 10:36 pts/0    00:00:00 grep --color=auto ob

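I originally wanted to copy the command line from ps on another host and run it locally on the OBServer that failed to start, but unfortunately that doesn't work: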
/opt/cluster/dh_ob/oceanbase/bin/observer -p 2881

obd cluster start dh_ob -c oceanbase-ce
Just run this command as is: it starts the nodes that are not running and skips the ones that are already up.

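I just tried your package on a CentOS 7 system and could not reproduce the issue you describe, so I suspect something in your environment. You can try updating the plugins: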

rm -f ~/.obd/version
obd cluster list