Error when deploying an OceanBase cluster from the command line

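[Environment] Test environment
[OB or other component]
[Version] oceanbase-all-in-one-4.2.1.8-108000022024072217.el8.x86_64.tar.gz
[Problem description] A RuntimeError is raised.
[Steps to reproduce] Following the official documentation, I ran obd cluster deploy obcluster -c /home/admin/default-components.yaml and the deployment completed. Running obd cluster start obcluster to start the cluster then failed.
[Attachments and logs]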

  1. Output after running the command:
    Get local repositories ok
    Search plugins ok
    Load cluster param plugin ok
    Open ssh connection ok
    Check before start observer x
    [WARN] OBD-1012: (192.168.128.36) clog and data use the same disk (/)
    [WARN] OBD-1012: (192.168.128.37) clog and data use the same disk (/)
    [WARN] OBD-1012: (192.168.128.38) clog and data use the same disk (/)
    [ERROR] oceanbase-ce-py_script_start_check-4.2.1.4 RuntimeError: module '_errno' has no attribute 'EC_OCP_EXPRESS_META_DB_NOT_ENOUGH_LOG_DISK'
    See https://www.oceanbase.com/product/ob-deployer/error-codes .
    Trace ID: 84ada3c6-7b1a-11ef-9aed-4ac097000062
    If you want to view detailed obd logs, please run: obd display-trace 84ada3c6-7b1a-11ef-9aed-4ac097000062

  2. Following the hint above, I viewed the corresponding log. The relevant part is:
    [2024-09-25 17:33:12.555] [WARNING] OBD-1012: (192.168.128.37) clog and data use the same disk (/)
    [2024-09-25 17:33:12.555] [DEBUG] – root@192.168.128.38 execute: cat /proc/sys/fs/aio-max-nr /proc/sys/fs/aio-nr
    [2024-09-25 17:33:12.607] [DEBUG] – exited code 0
    [2024-09-25 17:33:12.607] [DEBUG] – root@192.168.128.38 execute: ulimit -a
    [2024-09-25 17:33:12.697] [DEBUG] – exited code 0
    [2024-09-25 17:33:12.697] [DEBUG] – root@192.168.128.38 execute: sysctl -a
    [2024-09-25 17:33:12.806] [DEBUG] – exited code 0
    [2024-09-25 17:33:12.809] [DEBUG] – root@192.168.128.38 execute: cat /proc/meminfo
    [2024-09-25 17:33:12.898] [DEBUG] – exited code 0
    [2024-09-25 17:33:12.898] [DEBUG] – root@192.168.128.38 execute: df --block-size=1024
    [2024-09-25 17:33:12.991] [DEBUG] – exited code 0
    [2024-09-25 17:33:12.991] [DEBUG] – get disk info for path /dev, total: 12492087296 avail: 12492087296
    [2024-09-25 17:33:12.992] [DEBUG] – get disk info for path /dev/shm, total: 12598837248 avail: 12598751232
    [2024-09-25 17:33:12.992] [DEBUG] – get disk info for path /run, total: 12598837248 avail: 12538691584
    [2024-09-25 17:33:12.992] [DEBUG] – get disk info for path /sys/fs/cgroup, total: 12598837248 avail: 12598837248
    [2024-09-25 17:33:12.992] [DEBUG] – get disk info for path /sys/firmware/efi/efivars, total: 57344 avail: 29696
    [2024-09-25 17:33:12.992] [DEBUG] – get disk info for path /, total: 108360892416 avail: 75995529216
    [2024-09-25 17:33:12.992] [DEBUG] – get disk info for path /boot, total: 531267584 avail: 260554752
    [2024-09-25 17:33:12.992] [DEBUG] – get disk info for path /boot/efi, total: 66959360 avail: 60913664
    [2024-09-25 17:33:12.992] [DEBUG] – get disk info for path /var/lib/ceph/osd/ceph-2, total: 12598837248 avail: 12598767616
    [2024-09-25 17:33:12.992] [DEBUG] – get disk info for path /var/lib/ceph/osd/ceph-5, total: 12598837248 avail: 12598767616
    [2024-09-25 17:33:12.992] [DEBUG] – get disk info for path /mnt/cloud_storage, total: 3127126261760 avail: 3127126261760
    [2024-09-25 17:33:12.992] [DEBUG] – get disk info for path /run/user/0, total: 2519764992 avail: 2519760896
    [2024-09-25 17:33:12.992] [DEBUG] – root@192.168.128.38 execute: df --block-size=1024 /home/admin/obcluster/observer/store/clog
    [2024-09-25 17:33:13.082] [DEBUG] – exited code 0
    [2024-09-25 17:33:13.083] [DEBUG] – get disk info for path /, total: 108360892416 avail: 75995529216
    [2024-09-25 17:33:13.083] [DEBUG] – root@192.168.128.38 execute: df --block-size=1024 /home/admin/obcluster/observer/store
    [2024-09-25 17:33:13.173] [DEBUG] – exited code 0
    [2024-09-25 17:33:13.173] [DEBUG] – get disk info for path /, total: 108360892416 avail: 75995529216
    [2024-09-25 17:33:13.173] [DEBUG] – disk: {’/dev’: {‘total’: 12492087296, ‘avail’: 12492087296, ‘need’: 0}, ‘/dev/shm’: {‘total’: 12598837248, ‘avail’: 12598751232, ‘need’: 0}, ‘/run’: {‘total’: 12598837248, ‘avail’: 12538691584, ‘need’: 0}, ‘/sys/fs/cgroup’: {‘total’: 12598837248, ‘avail’: 12598837248, ‘need’: 0}, ‘/sys/firmware/efi/efivars’: {‘total’: 57344, ‘avail’: 29696, ‘need’: 0}, ‘/’: {‘total’: 108360892416, ‘avail’: 75995529216, ‘need’: 0}, ‘/boot’: {‘total’: 531267584, ‘avail’: 260554752, ‘need’: 0}, ‘/boot/efi’: {‘total’: 66959360, ‘avail’: 60913664, ‘need’: 0}, ‘/var/lib/ceph/osd/ceph-2’: {‘total’: 12598837248, ‘avail’: 12598767616, ‘need’: 0}, ‘/var/lib/ceph/osd/ceph-5’: {‘total’: 12598837248, ‘avail’: 12598767616, ‘need’: 0}, ‘/mnt/cloud_storage’: {‘total’: 3127126261760, ‘avail’: 3127126261760, ‘need’: 0}, ‘/run/user/0’: {‘total’: 2519764992, ‘avail’: 2519760896, ‘need’: 0}}
    [2024-09-25 17:33:13.174] [WARNING] OBD-1012: (192.168.128.38) clog and data use the same disk (/)
    [2024-09-25 17:33:13.175] [ERROR] oceanbase-ce-py_script_start_check-4.2.1.4 RuntimeError: module '_errno' has no attribute 'EC_OCP_EXPRESS_META_DB_NOT_ENOUGH_LOG_DISK'
    [2024-09-25 17:33:13.175] [ERROR] Traceback (most recent call last):
    [2024-09-25 17:33:13.175] [ERROR] File "core.py", line 2065, in start_cluster
    [2024-09-25 17:33:13.176] [ERROR] File "core.py", line 2144, in _start_cluster
    [2024-09-25 17:33:13.176] [ERROR] File "core.py", line 197, in call_plugin
    [2024-09-25 17:33:13.176] [ERROR] File "_plugin.py", line 348, in call
    [2024-09-25 17:33:13.176] [ERROR] File "_plugin.py", line 305, in _new_func
    [2024-09-25 17:33:13.176] [ERROR] File "/root/.obd/plugins/oceanbase-ce/4.2.1.4/start_check.py", line 731, in start_check
    [2024-09-25 17:33:13.176] [ERROR] error('ocp meta db', err.EC_OCP_EXPRESS_META_DB_NOT_ENOUGH_LOG_DISK.format(), [suggest])
    [2024-09-25 17:33:13.176] [ERROR] AttributeError: module '_errno' has no attribute 'EC_OCP_EXPRESS_META_DB_NOT_ENOUGH_LOG_DISK'
    [2024-09-25 17:33:13.176] [ERROR]
    [2024-09-25 17:33:13.176] [DEBUG] - sub start_check ref count to 0
    [2024-09-25 17:33:13.176] [DEBUG] - export start_check
    [2024-09-25 17:33:13.176] [DEBUG] - plugin oceanbase-ce-py_script_start_check-4.2.1.4 result: False
    [2024-09-25 17:33:13.176] [DEBUG] - oceanbase-ce starting check failed.
    [2024-09-25 17:33:13.176] [DEBUG] - Call obproxy-ce-py_script_start_check-4.2.3 for obproxy-ce-4.2.3.0-3.el8-2526073e3c652177b15093be611af94a469e0e21
    [2024-09-25 17:33:13.176] [DEBUG] - import start_check
    [2024-09-25 17:33:13.177] [DEBUG] - add start_check ref count to 1
    [2024-09-25 17:33:13.178] [DEBUG] – root@192.168.128.36 execute: cat /home/admin/obcluster/obproxy/run/obproxy-192.168.128.36-2883.pid
    [2024-09-25 17:33:13.231] [DEBUG] – exited code 1, error output:
    [2024-09-25 17:33:13.231] [DEBUG] cat: /home/admin/obcluster/obproxy/run/obproxy-192.168.128.36-2883.pid: No such file or directory
    [2024-09-25 17:33:13.231] [DEBUG]
    [2024-09-25 17:33:13.231] [DEBUG] – 192.168.128.36 port check
    [2024-09-25 17:33:13.231] [DEBUG] – root@192.168.128.36 execute: bash -c ‘cat /proc/net/{tcp*,udp*}’ | awk -F’ ’ ‘{if($4==“0A”) print $2,$4,$10}’ | grep ‘:0B43’ | awk -F’ ’ ‘{print $3}’ | uniq
    [2024-09-25 17:33:13.326] [DEBUG] – exited code 0
    [2024-09-25 17:33:13.326] [DEBUG] – root@192.168.128.36 execute: bash -c ‘cat /proc/net/{tcp*,udp*}’ | awk -F’ ’ ‘{if($4==“0A”) print $2,$4,$10}’ | grep ‘:0B44’ | awk -F’ ’ ‘{print $3}’ | uniq
    [2024-09-25 17:33:13.427] [DEBUG] – exited code 0
    [2024-09-25 17:33:13.427] [DEBUG] - sub start_check ref count to 0
    [2024-09-25 17:33:13.427] [DEBUG] - export start_check
    [2024-09-25 17:33:13.427] [DEBUG] - plugin obproxy-ce-py_script_start_check-4.2.3 result: True
    [2024-09-25 17:33:13.427] [DEBUG] - Call obagent-py_script_start_check-1.3.0 for obagent-4.2.2-100000042024011120.el8-bf152b880953c2043ddaf80d6180cf22bb8c8ac2
    [2024-09-25 17:33:13.427] [DEBUG] - import start_check
    [2024-09-25 17:33:13.429] [DEBUG] - add start_check ref count to 1
    [2024-09-25 17:33:13.431] [DEBUG] – root@192.168.128.36 execute: cat /home/admin/obcluster/obagent/run/ob_agentd.pid
    [2024-09-25 17:33:13.492] [DEBUG] – exited code 1, error output:
    [2024-09-25 17:33:13.493] [DEBUG] cat: /home/admin/obcluster/obagent/run/ob_agentd.pid: No such file or directory
    [2024-09-25 17:33:13.493] [DEBUG]
    [2024-09-25 17:33:13.493] [DEBUG] – server1(192.168.128.36) port check
    [2024-09-25 17:33:13.493] [DEBUG] – root@192.168.128.36 execute: bash -c ‘cat /proc/net/{tcp*,udp*}’ | awk -F’ ’ ‘{if($4==“0A”) print $2,$4,$10}’ | grep ‘:1F99’ | awk -F’ ’ ‘{print $3}’ | uniq
    [2024-09-25 17:33:13.589] [DEBUG] – exited code 0
    [2024-09-25 17:33:13.590] [DEBUG] – root@192.168.128.36 execute: bash -c ‘cat /proc/net/{tcp*,udp*}’ | awk -F’ ’ ‘{if($4==“0A”) print $2,$4,$10}’ | grep ‘:1F98’ | awk -F’ ’ ‘{print $3}’ | uniq
    [2024-09-25 17:33:13.685] [DEBUG] – exited code 0
    [2024-09-25 17:33:13.685] [DEBUG] – root@192.168.128.37 execute: cat /home/admin/obcluster/obagent/run/ob_agentd.pid
    [2024-09-25 17:33:13.738] [DEBUG] – exited code 1, error output:
    [2024-09-25 17:33:13.738] [DEBUG] cat: /home/admin/obcluster/obagent/run/ob_agentd.pid: No such file or directory
    [2024-09-25 17:33:13.738] [DEBUG]
    [2024-09-25 17:33:13.738] [DEBUG] – server2(192.168.128.37) port check
    [2024-09-25 17:33:13.738] [DEBUG] – root@192.168.128.37 execute: bash -c ‘cat /proc/net/{tcp*,udp*}’ | awk -F’ ’ ‘{if($4==“0A”) print $2,$4,$10}’ | grep ‘:1F99’ | awk -F’ ’ ‘{print $3}’ | uniq
    [2024-09-25 17:33:13.835] [DEBUG] – exited code 0
    [2024-09-25 17:33:13.835] [DEBUG] – root@192.168.128.37 execute: bash -c ‘cat /proc/net/{tcp*,udp*}’ | awk -F’ ’ ‘{if($4==“0A”) print $2,$4,$10}’ | grep ‘:1F98’ | awk -F’ ’ ‘{print $3}’ | uniq
    [2024-09-25 17:33:13.932] [DEBUG] – exited code 0
    [2024-09-25 17:33:13.932] [DEBUG] – root@192.168.128.38 execute: cat /home/admin/obcluster/obagent/run/ob_agentd.pid
    [2024-09-25 17:33:13.982] [DEBUG] – exited code 1, error output:
    [2024-09-25 17:33:13.982] [DEBUG] cat: /home/admin/obcluster/obagent/run/ob_agentd.pid: No such file or directory
    [2024-09-25 17:33:13.983] [DEBUG]
    [2024-09-25 17:33:13.983] [DEBUG] – server3(192.168.128.38) port check
    [2024-09-25 17:33:13.983] [DEBUG] – root@192.168.128.38 execute: bash -c ‘cat /proc/net/{tcp*,udp*}’ | awk -F’ ’ ‘{if($4==“0A”) print $2,$4,$10}’ | grep ‘:1F99’ | awk -F’ ’ ‘{print $3}’ | uniq
    [2024-09-25 17:33:14.078] [DEBUG] – exited code 0
    [2024-09-25 17:33:14.079] [DEBUG] – root@192.168.128.38 execute: bash -c ‘cat /proc/net/{tcp*,udp*}’ | awk -F’ ’ ‘{if($4==“0A”) print $2,$4,$10}’ | grep ‘:1F98’ | awk -F’ ’ ‘{print $3}’ | uniq
    [2024-09-25 17:33:14.176] [DEBUG] – exited code 0
    [2024-09-25 17:33:14.176] [DEBUG] - sub start_check ref count to 0
    [2024-09-25 17:33:14.176] [DEBUG] - export start_check
    [2024-09-25 17:33:14.176] [DEBUG] - plugin obagent-py_script_start_check-1.3.0 result: True
    [2024-09-25 17:33:14.177] [DEBUG] - Call ocp-express-py_script_start_check-4.2.2 for ocp-express-4.2.2-100000022024011120.el8-e5c152ebdd65839ed5f5521ff6c73e6a29cb9e75
    [2024-09-25 17:33:14.177] [DEBUG] - import start_check
    [2024-09-25 17:33:14.179] [DEBUG] - add start_check ref count to 1
    [2024-09-25 17:33:14.182] [DEBUG] – root@192.168.128.36 execute: cat /home/admin/obcluster/ocp-express/run/ocp-express.pid
    [2024-09-25 17:33:14.232] [DEBUG] – exited code 1, error output:
    [2024-09-25 17:33:14.233] [DEBUG] cat: /home/admin/obcluster/ocp-express/run/ocp-express.pid: No such file or directory
    [2024-09-25 17:33:14.233] [DEBUG]
    [2024-09-25 17:33:14.233] [DEBUG] – root@192.168.128.36 execute: bash -c ‘cat /proc/net/{udp*,tcp*}’ | awk -F’ ’ ‘{if($4==“0A”) print $2,$4,$10}’ | grep ‘:1FF4’ | awk -F’ ’ ‘{print $3}’ | uniq
    [2024-09-25 17:33:14.328] [DEBUG] – exited code 0
    [2024-09-25 17:33:14.329] [DEBUG] – root@192.168.128.36 append ‘/home/admin/obcluster/ocp-express/jre/bin:’ to PATH
    [2024-09-25 17:33:14.329] [DEBUG] – root@192.168.128.36 execute: java -version
    [2024-09-25 17:33:14.452] [DEBUG] – exited code 0
    [2024-09-25 17:33:14.453] [DEBUG] – root@192.168.128.36 execute: cat /proc/meminfo
    [2024-09-25 17:33:14.546] [DEBUG] – exited code 0
    [2024-09-25 17:33:14.547] [DEBUG] – root@192.168.128.36 execute: df --block-size=1024
    [2024-09-25 17:33:14.638] [DEBUG] – exited code 0
    [2024-09-25 17:33:14.638] [DEBUG] – get disk info for path /dev, total: 12492087296 avail: 12492087296
    [2024-09-25 17:33:14.638] [DEBUG] – get disk info for path /dev/shm, total: 12598837248 avail: 12598751232
    [2024-09-25 17:33:14.638] [DEBUG] – get disk info for path /run, total: 12598837248 avail: 12589006848
    [2024-09-25 17:33:14.638] [DEBUG] – get disk info for path /sys/fs/cgroup, total: 12598837248 avail: 12598837248
    [2024-09-25 17:33:14.638] [DEBUG] – get disk info for path /sys/firmware/efi/efivars, total: 57344 avail: 16384
    [2024-09-25 17:33:14.638] [DEBUG] – get disk info for path /, total: 108360892416 avail: 77661020160
    [2024-09-25 17:33:14.638] [DEBUG] – get disk info for path /boot, total: 531267584 avail: 260554752
    [2024-09-25 17:33:14.638] [DEBUG] – get disk info for path /boot/efi, total: 66959360 avail: 60913664
    [2024-09-25 17:33:14.638] [DEBUG] – get disk info for path /var/lib/ceph/osd/ceph-4, total: 12598837248 avail: 12598808576
    [2024-09-25 17:33:14.638] [DEBUG] – get disk info for path /var/lib/ceph/osd/ceph-0, total: 12598837248 avail: 12598808576
    [2024-09-25 17:33:14.638] [DEBUG] – get disk info for path /mnt/cloud_storage, total: 3127126261760 avail: 3127126261760
    [2024-09-25 17:33:14.638] [DEBUG] – get disk info for path /run/user/0, total: 2519764992 avail: 2519760896
    [2024-09-25 17:33:14.639] [DEBUG] – root@192.168.128.36 execute: df --block-size=1024 /home/admin/obcluster/ocp-express/log
    [2024-09-25 17:33:14.730] [DEBUG] – exited code 0
    [2024-09-25 17:33:14.730] [DEBUG] – get disk info for path /, total: 108360892416 avail: 77661020160
    [2024-09-25 17:33:14.730] [DEBUG] - sub start_check ref count to 0
    [2024-09-25 17:33:14.730] [DEBUG] - export start_check
    [2024-09-25 17:33:14.730] [DEBUG] - plugin ocp-express-py_script_start_check-4.2.2 result: True
    [2024-09-25 17:33:14.732] [INFO] See https://www.oceanbase.com/product/ob-deployer/error-codes .
    [2024-09-25 17:33:14.733] [INFO] Trace ID: 2c9bc634-7b21-11ef-ab49-4ac097000062
    [2024-09-25 17:33:14.733] [INFO] If you want to view detailed obd logs, please run: obd display-trace 2c9bc634-7b21-11ef-ab49-4ac097000062
    [2024-09-25 17:33:14.834] [INFO] [WARN] OBD-1012: (192.168.128.36) clog and data use the same disk (/)
    [2024-09-25 17:33:14.834] [INFO] [WARN] OBD-1012: (192.168.128.37) clog and data use the same disk (/)
    [2024-09-25 17:33:14.834] [INFO] [WARN] OBD-1012: (192.168.128.38) clog and data use the same disk (/)
    [2024-09-25 17:33:14.834] [INFO] [ERROR] oceanbase-ce-py_script_start_check-4.2.1.4 RuntimeError: module '_errno' has no attribute 'EC_OCP_EXPRESS_META_DB_NOT_ENOUGH_LOG_DISK'
    [2024-09-25 17:33:14.834] [INFO] See https://www.oceanbase.com/product/ob-deployer/error-codes .
    [2024-09-25 17:33:14.834] [INFO] Trace ID: 2c9bc634-7b21-11ef-ab49-4ac097000062
    [2024-09-25 17:33:14.834] [INFO] If you want to view detailed obd logs, please run: obd display-trace 2c9bc634-7b21-11ef-ab49-4ac097000062
    [2024-09-25 17:33:14.834] [INFO]
    [2024-09-25 17:33:14.834] [DEBUG] - unlock /root/.obd/lock/global
    [2024-09-25 17:33:14.834] [DEBUG] - unlock /root/.obd/lock/deploy_obcluster
    [2024-09-25 17:33:14.835] [DEBUG] - unlock /root/.obd/lock/mirror_and_repo

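  3. Everything related to py appears to have failed. The servers ship with Python 3.12.

  4. Before this, deploying with the OBD web UI worked and the cluster ran normally. Only the command-line deployment now hits the error above, and I could not find any similar report.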
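A trace this long is easier to read once it is reduced to the [ERROR] lines and the per-plugin results. The following is a minimal Python sketch of that filtering, assuming the trace text has been saved or pasted into a string. The function names are hypothetical helpers, not part of obd, and the sample lines are copied from the log above.

```python
import re

# Matches the per-plugin summary lines obd writes at DEBUG level, e.g.
# "... - plugin obproxy-ce-py_script_start_check-4.2.3 result: True"
PLUGIN_RESULT = re.compile(r"plugin (\S+) result: (True|False)")


def failed_plugins(log_text):
    """Return the names of plugins whose recorded result is False."""
    failed = []
    for line in log_text.splitlines():
        match = PLUGIN_RESULT.search(line)
        if match and match.group(2) == "False":
            failed.append(match.group(1))
    return failed


def error_lines(log_text):
    """Return every line tagged [ERROR], stripped of surrounding whitespace."""
    return [line.strip() for line in log_text.splitlines() if "[ERROR]" in line]


# Sample lines taken from the trace in this post.
sample = """\
[2024-09-25 17:33:13.176] [ERROR] AttributeError: module '_errno' has no attribute 'EC_OCP_EXPRESS_META_DB_NOT_ENOUGH_LOG_DISK'
[2024-09-25 17:33:13.176] [DEBUG] - plugin oceanbase-ce-py_script_start_check-4.2.1.4 result: False
[2024-09-25 17:33:13.427] [DEBUG] - plugin obproxy-ce-py_script_start_check-4.2.3 result: True
[2024-09-25 17:33:14.176] [DEBUG] - plugin obagent-py_script_start_check-1.3.0 result: True
"""

print(failed_plugins(sample))
for line in error_lines(sample):
    print(line)
```

On this trace the filter leaves only the oceanbase-ce start check as failed. The obproxy-ce, obagent and ocp-express checks all report True, so the "No such file or directory" messages for the pid files are not the cause.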

Please share the /home/admin/default-components.yaml file. It looks like the clog (log disk) space for the ocp-express meta DB is insufficient.

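Yes. The original example configured 15G and I had changed it to 5G. After changing it back to 15G, destroying the cluster, and redeploying, it succeeded. Thank you for the guidance.
I was too narrow in my thinking: I fixated on RuntimeError and "has no attribute" and assumed a Python problem, overlooking the literal meaning of the error code EC_OCP_EXPRESS_META_DB_NOT_ENOUGH_LOG_DISK!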
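The traceback shows why the message was misleading: start_check.py had already decided to report a problem, and the lookup of the error constant failed, so the AttributeError replaced the intended message. Below is a minimal Python sketch of that masking effect and of recovering the meaning from the constant's name. It is not obd's code: errno_stub, EC_EXAMPLE_ONLY and describe are hypothetical stand-ins for illustration.

```python
import types

# Hypothetical stand-in for an error-code module that lacks the constant
# the check expects. Its single entry is invented for this example.
errno_stub = types.SimpleNamespace(
    EC_EXAMPLE_ONLY="example message",
)

MISSING = "EC_OCP_EXPRESS_META_DB_NOT_ENOUGH_LOG_DISK"


def describe(err_module, code_name):
    """Return the message for code_name, or a readable fallback built from its name."""
    message = getattr(err_module, code_name, None)
    if message is not None:
        return message
    # Drop the "EC_" prefix and turn the rest of the identifier into words.
    name = code_name[3:] if code_name.startswith("EC_") else code_name
    return "unknown error code (" + name.replace("_", " ").lower() + ")"


# Direct attribute access raises AttributeError, which hides the intended report.
try:
    getattr(errno_stub, MISSING)
    masked = None
except AttributeError as exc:
    masked = str(exc)

print(masked)
print(describe(errno_stub, MISSING))
```

Read this way, the constant's name already points at the log disk size configured for the ocp-express meta DB, which matches the fix reported in this thread.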