obd cluster start demo fails

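Zxxxx · December 27, 2023, 16:11

[Environment] Test environment
[OB or other component] ob all-in-one
[Version] 4.2.1-ce
[Problem description] After I changed the root login password via SQL, the status of the single-node demo cluster changed to stopped. I then used obd cluster edit-config demo to add the root_password parameter to the oceanbase-ce component, but obd cluster start demo failed: the observer and obproxy services started normally, while the obagent service failed to start.
[Error message]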
root@enbo:~# obd cluster start demo
Get local repositories ok
Search plugins ok
Open ssh connection ok
Load cluster param plugin ok
Check before start observer ok
Check before start obproxy ok
Check before start obagent ok
Check before start prometheus ok
Check before start grafana ok
Start observer ok
observer program health check ok
Connect to observer 127.0.0.1:2881 ok
Start obproxy ok
obproxy program health check ok
Connect to obproxy ok
Initialize obproxy-ce ok
Start obagent ok
obagent program health check x
[WARN] failed to start 127.0.0.1 obagent
[ERROR] obagent start failed
Wait for observer init ok
+---------------------------------------------+
|                  observer                   |
+-----------+---------+------+-------+--------+
| ip        | version | port | zone  | status |
+-----------+---------+------+-------+--------+
| 127.0.0.1 | 4.2.1.2 | 2881 | zone1 | ACTIVE |
+-----------+---------+------+-------+--------+
obclient -h127.0.0.1 -P2881 -uroot -p'enbo123456' -Doceanbase -A

+---------------------------------------------+
|                   obproxy                   |
+-----------+------+-----------------+--------+
| ip        | port | prometheus_port | status |
+-----------+------+-----------------+--------+
| 127.0.0.1 | 2883 | 2884            | active |
+-----------+------+-----------------+--------+
obclient -h127.0.0.1 -P2883 -uroot -p'enbo123456' -Doceanbase -A
See https://www.oceanbase.com/product/ob-deployer/error-codes .
Trace ID: ebee2030-a48b-11ee-92dc-080027c1bc21
If you want to view detailed obd logs, please run: obd display-trace ebee2030-a48b-11ee-92dc-080027c1bc21
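
For long obd output it can help to pull out the failing steps and the trace ID automatically. The following is a minimal Python sketch, assuming the console format shown in this post (a passing step ends with "ok", a failing step ends with "x", and the trace ID follows "Trace ID:"). The function name summarize_obd_output is hypothetical and is not part of obd.

```python
import re


def summarize_obd_output(text):
    """Return (failed_steps, trace_id) parsed from obd console output."""
    failed_steps = []
    trace_id = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Assumption: a failing step is printed as "<step name> x".
        if line.endswith(" x"):
            failed_steps.append(line[:-2].strip())
        # The trace ID is the argument later passed to `obd display-trace`.
        match = re.match(r"Trace ID:\s*([0-9a-f-]+)", line)
        if match:
            trace_id = match.group(1)
    return failed_steps, trace_id


sample = """Start obagent ok
obagent program health check x
[WARN] failed to start 127.0.0.1 obagent
[ERROR] obagent start failed
Trace ID: ebee2030-a48b-11ee-92dc-080027c1bc21"""

failed_steps, trace_id = summarize_obd_output(sample)
print("failed steps:", failed_steps)
print("trace id:", trace_id)
```

The trace ID returned here is the value to pass to obd display-trace.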

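王利博 · December 27, 2023, 16:31

Did you change the root password via SQL first and then add the root_password parameter to the configuration file?
If so, try the other approach: add root_password directly in the configuration file and let that change the root password.

When you change the root password from the SQL command line, the configuration file does not pick up that change (the password recorded there is still empty).
After a shutdown, the empty password is still what gets read, and a change made in the configuration file only takes effect after a successful restart.
So please try the other approach.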

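Zxxxx · December 27, 2023, 16:35

I changed the password via SQL first and then found that demo had stopped. I tried to start it with obd cluster start, but the observer service could not be connected to, so I edited the configuration file and added a root_password parameter. On the next start the observer service was fine, but the obagent service would not start. Does the obagent configuration file also need to be updated to match?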

王利博 · December 27, 2023, 16:40

Please check the logs: observer.log and the output of obd display-trace ebee2030-a48b-11ee-92dc-080027c1bc21

Zxxxx · December 27, 2023, 16:42

Where is observer.log?

王利博 · December 27, 2023, 16:46

Look in the log directory under the path you specified.
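
If the deployment path is not known offhand, the log can be located by searching for the file name. This is a minimal Python sketch; the function name find_log_files is hypothetical, and the directory passed to it is whatever deployment directory applies on your machine (the thread does not state it).

```python
from pathlib import Path


def find_log_files(root, name="observer.log"):
    """Return every regular file called `name` beneath `root`, sorted by path."""
    base = Path(root).expanduser()
    return sorted(path for path in base.rglob(name) if path.is_file())


if __name__ == "__main__":
    # "~" is only a starting point for the search, not a known install path.
    for log_path in find_log_files("~"):
        print(log_path)
```

Passing a narrower directory than the home directory keeps the search fast.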

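shunwah · December 27, 2023, 16:47

Based on the output you provided, obagent ran into a problem during startup and reported "[ERROR] obagent start failed". This means obagent could not start normally, which can have several causes, such as a configuration error, a permission problem, or a missing dependency.

To resolve this, you can try the following steps:

Check the obagent logs: the obagent logs usually contain more detailed error information, which helps locate the cause. You can also run the obd display-trace command to view the detailed obd logs.
Check the obagent configuration: make sure the parameters in the obagent configuration files (usually located at /etc/oceanbase/obagent/conf) are set correctly, especially those related to networking, permissions, and certificates.
Check system resources: make sure the system has enough resources (memory, disk space, and so on) to run obagent.
Check dependencies: make sure the dependencies obagent needs are installed correctly and their versions are compatible.
Restart obagent: try restarting obagent and see whether it starts successfully.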

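Zxxxx · December 27, 2023, 16:55

The log contains only a single error line saying that the start failed.

Zxxxx · December 27, 2023, 17:27

I also tried starting obagent on its own with a command. That did not work either, but no error message was shown.

Zxxxx · December 27, 2023, 17:27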

root@enbo:~/obagent/log# /root/obagent/bin/ob_agentctl start
{"successful":false,"message":null,"error":"Module=agent, kind=INTERNAL, code=agentd_exited_quickly; "}root@enbo:~/obagent/log#
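
The reply from ob_agentctl is a JSON object followed directly by the shell prompt, so the error code is easy to miss. Below is a minimal Python sketch that extracts it, assuming the tool prints plain ASCII quotes and that the error field is a comma-separated list of key=value pairs, as in the line above. The helper name parse_agentctl_result is hypothetical.

```python
import json


def parse_agentctl_result(raw):
    """Return (successful, error_fields) from an ob_agentctl reply."""
    # raw_decode parses the leading JSON object and ignores whatever
    # follows it (here, the shell prompt printed on the same line).
    result, _end = json.JSONDecoder().raw_decode(raw.lstrip())
    error_fields = {}
    error_text = (result.get("error") or "").strip().rstrip(";")
    for part in error_text.split(","):
        if "=" in part:
            key, value = part.split("=", 1)
            error_fields[key.strip()] = value.strip()
    return result["successful"], error_fields


raw_reply = (
    '{"successful":false,"message":null,'
    '"error":"Module=agent, kind=INTERNAL, code=agentd_exited_quickly; "}'
    "root@enbo:~/obagent/log#"
)
successful, error_fields = parse_agentctl_result(raw_reply)
print(successful, error_fields.get("code"))
```

Here the code field is agentd_exited_quickly, which only says the agent daemon exited soon after launch. The reason would have to come from the obagent logs.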

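Zxxxx · December 27, 2023, 17:35

I ran obd cluster stop demo to stop the cluster services and then ran obd cluster start demo.
This time obagent started successfully, but obd reported that it could not connect to prometheus. The messages are as follows: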

[2023-12-27 17:29:58.177] [ERROR] requests.exceptions.ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=9090): Max retries exceeded with url: /-/healthy (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fb91acf2d00>: Failed to establish a new connection: [Errno 111] Connection refused'))
[2023-12-27 17:29:58.177] [ERROR]
[2023-12-27 17:29:58.177] [DEBUG] -- request prometheus failed: HTTPConnectionPool(host='127.0.0.1', port=9090): Max retries exceeded with url: /-/healthy (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fb91acf2d00>: Failed to establish a new connection: [Errno 111] Connection refused'))
[2023-12-27 17:29:58.177] [ERROR] OBD-1006: Failed to connect to prometheus
[2023-12-27 17:29:58.310] [INFO] [ERROR] OBD-1006: Failed to connect to prometheus
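
The log shows obd requesting /-/healthy on 127.0.0.1:9090 and getting "Connection refused", which means nothing was listening on that port at that moment. The same probe can be repeated by hand. This is a minimal Python sketch using only the standard library; the function name prometheus_healthy is hypothetical, and the host, port, and path are taken from the log above.

```python
import http.client
import urllib.error
import urllib.request


def prometheus_healthy(host="127.0.0.1", port=9090, timeout=2.0):
    """Return True if GET /-/healthy answers with HTTP 200, else False."""
    url = f"http://{host}:{port}/-/healthy"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeouts, and HTTP error statuses all end up here.
        return False


if __name__ == "__main__":
    print("prometheus healthy:", prometheus_healthy())
```

A False result right after a start attempt, followed by True a little later, would suggest that prometheus simply was not up yet when obd checked it.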

王利博 · December 27, 2023, 18:16

Try specifying the component: obd cluster start/restart demo -c prometheus

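Zxxxx · December 28, 2023, 09:10

Solved. I had to run obd cluster start demo repeatedly. I wonder whether some cached data from the previous run lingers between the components and needs to be cleared.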
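
Since the fix was simply to run the start command again, the retries can be scripted. This is a minimal Python sketch, assuming that obd exits with a non-zero status when a start fails (the thread does not confirm this). The function name start_with_retries and its parameters are hypothetical.

```python
import subprocess
import time


def start_with_retries(command, attempts=3, delay_seconds=10.0, run=subprocess.run):
    """Run `command` until it exits with status 0.

    Returns the 1-based attempt number that succeeded, or None if every
    attempt failed. `run` is injectable so the loop can be tested without obd.
    """
    for attempt in range(1, attempts + 1):
        completed = run(command)
        if completed.returncode == 0:
            return attempt
        if attempt < attempts:
            # Give the components time to settle before the next try.
            time.sleep(delay_seconds)
    return None


# Example call (requires obd on PATH, so it is left commented out):
# start_with_retries(["obd", "cluster", "start", "demo"])
```

The delay between attempts is a guess. The thread does not establish whether the later successes came from waiting or from leftover state being cleaned up.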