OCP deployment fails with OBD-2002: Failed to start 10.32.2.67 observer

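Deploying OCP with the ocp-all-in-one-4.2.2-20240329111923.el7.x86_64.tar.gz package failed, as shown in the screenshot:


The backend error output is as follows:

Does anyone know what might be causing this?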


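First check your resources: df -h && free -h
If resources are sufficient:
1. In the metadb configuration, open the cluster configuration (more configuration) and try customizing the resource parameters (memory_limit=20G, log_disk_size=60G, datafile_size=20G).
2. Provide the obd log (~/.obd/log/obd), or run obd cluster start name (the cluster name) on the command line and then provide the OBD log.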
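
The resource check suggested above can be scripted. This is only a minimal sketch: it assumes the three values from the reply (memory_limit=20G, log_disk_size=60G, datafile_size=20G) and assumes the data file and the log disk live on the same filesystem. `parse_size`, `read_mem_available`, and `check_resources` are hypothetical helpers, not part of OBD.

```python
import shutil

UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text):
    """Convert a size string such as '20G' into bytes."""
    text = text.strip().upper()
    if text and text[-1] in UNITS:
        return int(float(text[:-1]) * UNITS[text[-1]])
    return int(text)


def read_mem_available(path="/proc/meminfo"):
    """Return MemAvailable in bytes (Linux only)."""
    with open(path) as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1]) * 1024  # the file reports kB
    raise RuntimeError("MemAvailable not found in " + path)


def check_resources(mem_available, disk_free, memory_limit="20G",
                    log_disk_size="60G", datafile_size="20G"):
    """Return a list of shortfalls; an empty list means the values fit."""
    problems = []
    if mem_available < parse_size(memory_limit):
        problems.append("available memory is below memory_limit=" + memory_limit)
    # Assumption: datafile and log disk share one filesystem.
    disk_needed = parse_size(log_disk_size) + parse_size(datafile_size)
    if disk_free < disk_needed:
        problems.append("free disk is below log_disk_size + datafile_size")
    return problems
```

On the deployment host it could be called as `check_resources(read_mem_available(), shutil.disk_usage("/").free)`, replacing `"/"` with the mount point that will hold the observer data.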


Hi, here is my resource usage:

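  1. I customized the resources in the metadb configuration under cluster configuration (more configuration), but it still reports the same error.
  2. Below is the output after running obd cluster start name (cluster name):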
    [root@0021 ~]# obd display-trace 4114a6b4-1cdd-11ef-9fbf-fa163ec842db
    [2024-05-28 18:30:07.700] [DEBUG] - cmd: ['lahg_ocp']
    [2024-05-28 18:30:07.700] [DEBUG] - opts: {'servers': None, 'components': None, 'force_delete': None, 'strict_check': None, 'without_parameter': None}
    [2024-05-28 18:30:07.700] [DEBUG] - mkdir /root/.obd/lock/
    [2024-05-28 18:30:07.700] [DEBUG] - unknown lock mode
    [2024-05-28 18:30:07.700] [DEBUG] - try to get share lock /root/.obd/lock/global
    [2024-05-28 18:30:07.700] [DEBUG] - share lock /root/.obd/lock/global, count 1
    [2024-05-28 18:30:07.701] [DEBUG] - Get Deploy by name
    [2024-05-28 18:30:07.701] [DEBUG] - mkdir /root/.obd/cluster/
    [2024-05-28 18:30:07.701] [DEBUG] - mkdir /root/.obd/config_parser/
    [2024-05-28 18:30:07.701] [DEBUG] - try to get exclusive lock /root/.obd/lock/deploy_lahg_ocp
    [2024-05-28 18:30:07.702] [ERROR] Another app is currently holding the obd lock.
    [2024-05-28 18:30:07.702] [ERROR] Traceback (most recent call last):
    [2024-05-28 18:30:07.702] [ERROR] File "_lock.py", line 64, in _ex_lock
    [2024-05-28 18:30:07.703] [ERROR] File "tool.py", line 499, in exclusive_lock_obj
    [2024-05-28 18:30:07.703] [ERROR] BlockingIOError: [Errno 11] Resource temporarily unavailable
    [2024-05-28 18:30:07.703] [ERROR]
    [2024-05-28 18:30:07.703] [ERROR] During handling of the above exception, another exception occurred:
    [2024-05-28 18:30:07.703] [ERROR]
    [2024-05-28 18:30:07.703] [ERROR] Traceback (most recent call last):
    [2024-05-28 18:30:07.703] [ERROR] File "_lock.py", line 85, in ex_lock
    [2024-05-28 18:30:07.703] [ERROR] File "_lock.py", line 66, in _ex_lock
    [2024-05-28 18:30:07.703] [ERROR] _errno.LockError: [Errno 11] Resource temporarily unavailable
    [2024-05-28 18:30:07.703] [ERROR]
    [2024-05-28 18:30:07.703] [ERROR] During handling of the above exception, another exception occurred:
    [2024-05-28 18:30:07.703] [ERROR]
    [2024-05-28 18:30:07.703] [ERROR] Traceback (most recent call last):
    [2024-05-28 18:30:07.703] [ERROR] File "obd.py", line 244, in do_command
    [2024-05-28 18:30:07.703] [ERROR] File "obd.py", line 896, in _do_command
    [2024-05-28 18:30:07.703] [ERROR] File "core.py", line 2033, in start_cluster
    [2024-05-28 18:30:07.703] [ERROR] File "_deploy.py", line 1831, in get_deploy_config
    [2024-05-28 18:30:07.703] [ERROR] File "_deploy.py", line 1818, in _lock
    [2024-05-28 18:30:07.703] [ERROR] File "_lock.py", line 283, in deploy_ex_lock
    [2024-05-28 18:30:07.703] [ERROR] File "_lock.py", line 262, in _ex_lock
    [2024-05-28 18:30:07.703] [ERROR] File "_lock.py", line 254, in _lock
    [2024-05-28 18:30:07.703] [ERROR] File "_lock.py", line 185, in lock
    [2024-05-28 18:30:07.703] [ERROR] File "_lock.py", line 90, in ex_lock
    [2024-05-28 18:30:07.703] [ERROR] _errno.LockError: [Errno 11] Resource temporarily unavailable
    [2024-05-28 18:30:07.703] [ERROR]
    [2024-05-28 18:30:07.703] [INFO] Trace ID: 4114a6b4-1cdd-11ef-9fbf-fa163ec842db
    [2024-05-28 18:30:07.703] [INFO] If you want to view detailed obd logs, please run: obd display-trace 4114a6b4-1cdd-11ef-9fbf-fa163ec842db
    [2024-05-28 18:30:07.703] [DEBUG] - share lock /root/.obd/lock/global release, count 0
    [2024-05-28 18:30:07.704] [DEBUG] - unlock /root/.obd/lock/global
    [2024-05-28 18:30:07.704] [DEBUG] - unlock /root/.obd/lock/deploy_lahg_ocp

Please provide the obd log (~/.obd/log/obd).

Also check the output of obd mirror list && obd mirror list local


Here is the information you asked for, please take a look :sob:
obd log:
obd_log.txt (1.1 MB)
Command output:

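Check how much of each resource you actually have.
Exit obd web, then run obd cluster start name (the name of the deployed cluster; you can see it by pressing Tab) and check the result.
After that, check obd mirror list && obd mirrorlist local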
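
The trace posted earlier ends with "Another app is currently holding the obd lock" and `BlockingIOError: [Errno 11]`, which is why exiting obd web before retrying matters. The sketch below reproduces that kind of failure with a plain advisory file lock. It assumes OBD's deploy lock behaves like a non-blocking `flock`, which the traceback suggests but the thread does not confirm; `try_exclusive_lock` and `demo` are hypothetical names, and the lock file is a temporary stand-in, not the real /root/.obd/lock/deploy_lahg_ocp.

```python
import fcntl
import os
import tempfile


def try_exclusive_lock(path):
    """Try a non-blocking exclusive lock; return the open file, or None if held."""
    f = open(path, "a")
    try:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:  # Errno 11: another holder already has the lock
        f.close()
        return None
    return f


def demo():
    lock_path = os.path.join(tempfile.mkdtemp(), "deploy_demo")
    first = try_exclusive_lock(lock_path)   # stands in for the running obd web
    second = try_exclusive_lock(lock_path)  # stands in for `obd cluster start`
    first.close()                           # exiting the first holder releases the lock
    third = try_exclusive_lock(lock_path)   # the retry now succeeds
    result = (first is not None, second is None, third is not None)
    third.close()
    return result
```

The second attempt fails only while the first holder is alive; once it closes the file, the same call succeeds, which matches the advice to quit obd web and rerun the start command.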

Error code 2002:


Adjust the resources.

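  1. This is the output after running obd cluster start name. You mentioned that using Tab lets me see the resources, but I didn't quite understand what you meant.
  2. This is the output after running obd mirror list && obd mirrorlist local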

1. Use the actual cluster name, which you can get from the obd cluster list command, not the literal word name.

  1. Sorry, that command got run together; it should be obd mirror list local
  1. Here is the output of the first command:
  2. Here is the output of the second command:

Run obd cluster edit-config lahg_ocp to look at the configuration file,
or open the config.yaml file under ~/.obd/cluster/lahg_ocp/ and take a look.
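
To pull just the resource-related settings out of that config.yaml, a small scan like the following may help. It is a sketch that uses only the standard library and matches lines by key name rather than parsing YAML properly; `find_resource_settings` and `scan_cluster_config` are hypothetical helpers, and the key list is limited to the three parameters mentioned in this thread.

```python
import re
from pathlib import Path

# Only the parameters discussed in this thread; extend as needed.
PATTERN = re.compile(r"^\s*(memory_limit|log_disk_size|datafile_size)\s*:\s*(\S+)")


def find_resource_settings(text):
    """Return (line_number, key, value) for each resource key found in the text."""
    found = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = PATTERN.match(line)
        if match:  # commented-out lines start with '#', so they do not match
            found.append((number, match.group(1), match.group(2)))
    return found


def scan_cluster_config(cluster_name):
    """Scan ~/.obd/cluster/<cluster_name>/config.yaml for resource settings."""
    path = Path.home() / ".obd" / "cluster" / cluster_name / "config.yaml"
    return find_resource_settings(path.read_text())
```

For the cluster in this thread the call would be `scan_cluster_config("lahg_ocp")`, run as the same user that owns ~/.obd.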

Here is the configuration file content:

Set memory_limit to 26G and try again. If it still fails, provide the complete obd log.