[Environment] Test environment
[OB or other component] OCP cluster
[Version] v4.4.0
[Problem description] obd reports an error when starting the OCP cluster
[admin@obd01 ocp_1]$ obd cluster list;
+----------------------------------------------------------+
|                       Cluster List                       |
+-------+--------------------------------+-----------------+
| Name  | Configuration Path             | Status (Cached) |
+-------+--------------------------------+-----------------+
| ocp_1 | /home/admin/.obd/cluster/ocp_1 | stopped         |
+-------+--------------------------------+-----------------+
Trace ID: 97b87e04-f509-11f0-bb95-000c29a1a8c4
If you want to view detailed obd logs, please run: obd display-trace 97b87e04-f509-11f0-bb95-000c29a1a8c4
[admin@obd01 ~]$ obd cluster start ocp_1
Get local repositories ok
Load cluster param plugin ok
Open ssh connection ok
Check before start ocp-server-ce ok
cluster scenario: None
Start observer ok
observer program health check ok
Connect to observer 192.168.56.149:2881 ok
obshell start ok
obshell program health check ok
start obproxy ok
obproxy program health check ok
Connect to obproxy ok
Start ocp-server-ce ok
[ERROR] failed to start 192.168.56.149 ocp-server-ce
[ERROR] failed to start 192.168.56.150 ocp-server-ce
Trace ID: e2d83f9c-f508-11f0-bb6a-000c29a1a8c4
If you want to view detailed obd logs, please run: obd display-trace e2d83f9c-f508-11f0-bb6a-000c29a1a8c4
[admin@obd01 ~]$
[admin@obd01 ~]$
[admin@obd01 ~]$ obd display-trace e2d83f9c-f508-11f0-bb6a-000c29a1a8c4
…
[2026-01-19 15:32:46.988] [DEBUG] -- exited code 0
[2026-01-19 15:32:46.988] [DEBUG] -- write 85106 to admin@192.168.56.150:22: /home/admin/ocp/run/ocp-server.pid
[2026-01-19 15:32:47.226] [DEBUG] - current remote_transporter RemoteTransporter.CLIENT
[2026-01-19 15:32:47.226] [DEBUG] -- admin@192.168.56.150 execute: mkdir -p /home/admin/ocp/run && rm -fr /home/admin/ocp/run/ocp-server.pid
[2026-01-19 15:32:47.404] [DEBUG] -- exited code 0
[2026-01-19 15:32:47.404] [DEBUG] -- send /tmp/tmpbrym6mdc to /home/admin/ocp/run/ocp-server.pid
[2026-01-19 15:32:47.477] [DEBUG] - admin@192.168.56.150 execute: chmod 600 /home/admin/ocp/run/ocp-server.pid
[2026-01-19 15:32:47.835] [DEBUG] - exited code 0
[2026-01-19 15:32:47.912] [DEBUG] - sub start ref count to 0
[2026-01-19 15:32:47.912] [DEBUG] - export start
[2026-01-19 15:32:47.912] [DEBUG] - plugin ocp-server-ce-py_script_start-4.4.0 result: True
[2026-01-19 15:32:47.912] [DEBUG] - Searching health_check plugin for components …
[2026-01-19 15:32:47.912] [DEBUG] - Searching health_check plugin for ocp-server-ce-4.4.0-20251114143405.el7-f673d693677a2c640f925ad2127a604aaebf00bf
[2026-01-19 15:32:47.913] [DEBUG] - Found for ocp-server-ce-py_script_health_check-4.2.1 for ocp-server-ce-4.4.0
[2026-01-19 15:32:47.913] [DEBUG] - Call plugin ocp-server-ce-py_script_health_check-4.2.1 for ocp-server-ce-4.4.0-20251114143405.el7-f673d693677a2c640f925ad2127a604aaebf00bf
[2026-01-19 15:32:47.913] [DEBUG] - import health_check
[2026-01-19 15:32:47.913] [DEBUG] - add health_check ref count to 1
[2026-01-19 15:32:47.914] [INFO] ocp-server-ce program health check
[2026-01-19 15:32:47.918] [DEBUG] -- 192.168.56.149 program health check
[2026-01-19 15:32:47.918] [DEBUG] -- admin@192.168.56.149 execute: ls /proc/31429
[2026-01-19 15:32:47.971] [DEBUG] -- exited code 2, error output:
[2026-01-19 15:32:47.971] [DEBUG] ls: cannot access /proc/31429: No such file or directory
[2026-01-19 15:32:47.971] [DEBUG]
[2026-01-19 15:32:47.971] [DEBUG] -- 192.168.56.150 program health check
[2026-01-19 15:32:47.971] [DEBUG] -- admin@192.168.56.150 execute: ls /proc/85106
[2026-01-19 15:32:48.123] [DEBUG] -- exited code 0
[2026-01-19 15:32:48.123] [DEBUG] -- admin@192.168.56.150 execute: bash -c 'cat /proc/net/{tcp*,udp*}' | awk -F' ' '{print $2,$10}' | grep '00000000:1F90' | awk -F' ' '{print $2}' | uniq
[2026-01-19 15:32:48.451] [DEBUG] -- exited code 0
[2026-01-19 15:32:48.452] [DEBUG] -- failed to start 192.168.56.150 ocp-server-ce, remaining retries: 119
[2026-01-19 15:33:03.452] [DEBUG] -- 192.168.56.150 program health check
[2026-01-19 15:33:03.452] [DEBUG] -- admin@192.168.56.150 execute: ls /proc/85106
[2026-01-19 15:33:03.581] [DEBUG] -- exited code 0
[2026-01-19 15:33:03.581] [DEBUG] -- admin@192.168.56.150 execute: bash -c 'cat /proc/net/{tcp*,udp*}' | awk -F' ' '{print $2,$10}' | grep '00000000:1F90' | awk -F' ' '{print $2}' | uniq
[2026-01-19 15:33:03.674] [DEBUG] -- exited code 0
[2026-01-19 15:33:03.675] [DEBUG] -- failed to start 192.168.56.150 ocp-server-ce, remaining retries: 118
[2026-01-19 15:33:18.675] [DEBUG] -- 192.168.56.150 program health check
[2026-01-19 15:33:18.675] [DEBUG] -- admin@192.168.56.150 execute: ls /proc/85106
[2026-01-19 15:33:18.832] [DEBUG] -- exited code 0
[2026-01-19 15:33:18.833] [DEBUG] -- admin@192.168.56.150 execute: bash -c 'cat /proc/net/{tcp*,udp*}' | awk -F' ' '{print $2,$10}' | grep '00000000:1F90' | awk -F' ' '{print $2}' | uniq
[2026-01-19 15:33:19.026] [DEBUG] -- exited code 0
[2026-01-19 15:33:19.026] [DEBUG] -- failed to start 192.168.56.150 ocp-server-ce, remaining retries: 117
[2026-01-19 15:33:34.027] [DEBUG] -- 192.168.56.150 program health check
[2026-01-19 15:33:34.027] [DEBUG] -- admin@192.168.56.150 execute: ls /proc/85106
[2026-01-19 15:33:34.047] [DEBUG] -- exited code 2, error output:
[2026-01-19 15:33:34.047] [DEBUG] ls: cannot access /proc/85106: No such file or directory
[2026-01-19 15:33:34.047] [DEBUG]
[2026-01-19 15:33:34.086] [ERROR] [ERROR] failed to start 192.168.56.149 ocp-server-ce
[2026-01-19 15:33:34.086] [ERROR] [ERROR] failed to start 192.168.56.150 ocp-server-ce
[2026-01-19 15:33:34.086] [DEBUG] - sub health_check ref count to 0
[2026-01-19 15:33:34.086] [DEBUG] - export health_check
[2026-01-19 15:33:34.086] [DEBUG] - plugin ocp-server-ce-py_script_health_check-4.2.1 result: False
[2026-01-19 15:33:34.089] [DEBUG] - share lock /home/admin/.obd/lock/mirror_and_repo release, count 5
[2026-01-19 15:33:34.090] [DEBUG] - share lock /home/admin/.obd/lock/mirror_and_repo release, count 4
[2026-01-19 15:33:34.090] [DEBUG] - share lock /home/admin/.obd/lock/mirror_and_repo release, count 3
[2026-01-19 15:33:34.090] [DEBUG] - share lock /home/admin/.obd/lock/mirror_and_repo release, count 2
[2026-01-19 15:33:34.090] [DEBUG] - share lock /home/admin/.obd/lock/mirror_and_repo release, count 1
[2026-01-19 15:33:34.090] [DEBUG] - share lock /home/admin/.obd/lock/mirror_and_repo release, count 0
[2026-01-19 15:33:34.090] [DEBUG] - unlock /home/admin/.obd/lock/mirror_and_repo
[2026-01-19 15:33:34.090] [DEBUG] - exclusive lock /home/admin/.obd/lock/deploy_ocp_1 release, count 0
[2026-01-19 15:33:34.090] [DEBUG] - unlock /home/admin/.obd/lock/deploy_ocp_1
[2026-01-19 15:33:34.090] [DEBUG] - share lock /home/admin/.obd/lock/global release, count 0
[2026-01-19 15:33:34.090] [DEBUG] - unlock /home/admin/.obd/lock/global
[2026-01-19 15:33:34.090] [INFO] Trace ID: e2d83f9c-f508-11f0-bb6a-000c29a1a8c4
[2026-01-19 15:33:34.090] [INFO] If you want to view detailed obd logs, please run: obd display-trace e2d83f9c-f508-11f0-bb6a-000c29a1a8c4
[admin@obd01 ~]$
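For readers following the trace: each health-check round appears to do two things on a node. It first runs ls /proc/<pid> to confirm the process exists, then scans /proc/net/{tcp*,udp*} for a local address matching 00000000:1F90 (hex 1F90 is port 8080). Below is a minimal Python sketch of that second step, assuming the standard /proc/net/tcp column layout. The function name and the sample rows (including the inode numbers) are invented for illustration and are not part of obd.

```python
from typing import List


def listening_inodes(proc_net_text: str, port: int) -> List[str]:
    """Approximate the awk/grep/uniq pipeline from the trace.

    Column 2 is the local address in hex (ADDR:PORT) and column 10 is the
    socket inode. A row matches when its local address contains
    00000000:<PORT>, which is how the grep in the trace behaves.
    """
    target = f"00000000:{port:04X}"
    inodes: List[str] = []
    for line in proc_net_text.splitlines():
        fields = line.split()
        # Skip short or malformed rows; the header row never matches.
        if len(fields) < 10 or target not in fields[1]:
            continue
        inode = fields[9]
        # Like uniq, drop only adjacent duplicates.
        if not inodes or inodes[-1] != inode:
            inodes.append(inode)
    return inodes


# Invented sample in /proc/net/tcp layout (0B41 = 2881, 1F90 = 8080).
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 00000000:0B41 00000000:0000 0A 00000000:00000000 00:00000000 00000000  1000        0 40001 1 0000000000000000 100 0 0 10 0
   1: 00000000:1F90 00000000:0000 0A 00000000:00000000 00:00000000 00000000  1000        0 40002 1 0000000000000000 100 0 0 10 0
"""

print(int("1F90", 16))                 # port number behind the hex value
print(listening_inodes(SAMPLE, 8080))  # inodes of sockets bound to port 8080
```

On 192.168.56.150 the process check passed three times while the retry counter kept dropping, which is consistent with nothing listening on 8080 yet; on the fourth round the process itself was gone. This is a reading of the log, not a confirmed description of the plugin's internals.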
辞霜
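January 19, 2026, 16:17
#5
Please provide the ocp-server log so we can take a look. That said, judging from the YAML file, the memory allocated to OB is on the low side; 12G or more is recommended.
The 8G currently allocated is enough to run. This environment started and stopped normally a few days ago; the startup problem only appeared today.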
ocp-server.log (1.3 MB)
辞霜
January 19, 2026, 16:37
#7
The log reports an out-of-memory error: No memory or reach tenant memory limit. The meta and monitor tenants have run out of memory.
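To locate this error in a log as large as the attached one, a plain scan for the message text is usually enough. Below is a minimal Python sketch, assuming only that the message appears verbatim on a line. The function name and the sample lines are hypothetical, and the real ocp-server.log line format may differ.

```python
from typing import Iterable, List, Tuple

MARKER = "No memory or reach tenant memory limit"


def find_memory_errors(
    lines: Iterable[str], marker: str = MARKER
) -> List[Tuple[int, str]]:
    """Return (line_number, text) for every line that contains the marker."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if marker in line
    ]


# Invented sample lines; not copied from the attached log.
SAMPLE_LOG = [
    "INFO startup begin\n",
    "ERROR query failed: No memory or reach tenant memory limit\n",
    "INFO retrying\n",
]

for number, text in find_memory_errors(SAMPLE_LOG):
    print(number, text)
```

For a real file, an open handle can be passed in place of the sample list, since the function only iterates over lines.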
辞霜
January 19, 2026, 18:08
#9
Adjust the parameter configuration with obd edit-config.