obd cluster restart is stuck at "Connect to observer". What can I do?

[Environment] Production
[OB or other component] OB
[Version] 4.1.0.1-100120230616095544
[Problem description]
Cannot connect to OB.
Testing with obd, it stays stuck at "Connect to observer"; a restart also hangs at
"Connect to observer".

Is there any other way to handle this? It is customer production data, so I have to be careful.

[Steps to reproduce]
[root@app bin]# obd cluster list
+------------------------------------------------------------+
|                        Cluster List                        |
+-----------+------------------------------+-----------------+
| Name      | Configuration Path           | Status (Cached) |
+-----------+------------------------------+-----------------+
| obcluster | /root/.obd/cluster/obcluster | running         |
+-----------+------------------------------+-----------------+
Trace ID: 5c736b60-842c-11ee-ac1b-52549690f0f0
If you want to view detailed obd logs, please run: obd display-trace 5c736b60-842c-11ee-ac1b-52549690f0f0
[root@app bin]# obd cluster restart obcluster
Get local repositories and plugins ok
Load cluster param plugin ok
Open ssh connection ok
Cluster status check ok
Connect to observer x

[Symptoms and impact]

[Attachments]

Check the obd log under ~/.obd/log/obd:
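A quick way to pull the most recent errors out of that log (assuming obd's default log path under the current user's home):

```shell
# Show the last 20 ERROR lines from the obd log
grep -n 'ERROR' ~/.obd/log/obd 2>/dev/null | tail -n 20
```

Running this while re-executing the failing command in another session narrows down where it stalls.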

[2023-11-16 11:01:02.719] [60660aa2-842c-11ee-9401-52549690f0f0] [INFO] Load cluster param plugin

[2023-11-16 11:01:02.720] [60660aa2-842c-11ee-9401-52549690f0f0] [DEBUG] - Searching param plugin for components ...

[2023-11-16 11:01:02.720] [60660aa2-842c-11ee-9401-52549690f0f0] [DEBUG] - Search param plugin for oceanbase-ce

[2023-11-16 11:01:02.721] [60660aa2-842c-11ee-9401-52549690f0f0] [DEBUG] - Found for oceanbase-ce-param-4.0.0.0 for oceanbase-ce-4.1.0.1

[2023-11-16 11:01:02.721] [60660aa2-842c-11ee-9401-52549690f0f0] [DEBUG] - Applying oceanbase-ce-param-4.0.0.0 for oceanbase-ce-4.1.0.1-102000042023061314.el7-d03fafa6fa8ceb0636e4db05b5b5f6c3ac2256a3

[2023-11-16 11:01:03.474] [60660aa2-842c-11ee-9401-52549690f0f0] [INFO] Open ssh connection

[2023-11-16 11:01:03.607] [60660aa2-842c-11ee-9401-52549690f0f0] [INFO] Cluster status check

[2023-11-16 11:01:03.608] [60660aa2-842c-11ee-9401-52549690f0f0] [DEBUG] - Searching status plugin for components ...

[2023-11-16 11:01:03.608] [60660aa2-842c-11ee-9401-52549690f0f0] [DEBUG] - Searching status plugin for oceanbase-ce-4.1.0.1-102000042023061314.el7-d03fafa6fa8ceb0636e4db05b5b5f6c3ac2256a3

[2023-11-16 11:01:03.609] [60660aa2-842c-11ee-9401-52549690f0f0] [DEBUG] - Found for oceanbase-ce-py_script_status-3.1.0 for oceanbase-ce-4.1.0.1

[2023-11-16 11:01:03.609] [60660aa2-842c-11ee-9401-52549690f0f0] [DEBUG] - Call oceanbase-ce-py_script_status-3.1.0 for oceanbase-ce-4.1.0.1-102000042023061314.el7-d03fafa6fa8ceb0636e4db05b5b5f6c3ac2256a3

[2023-11-16 11:01:03.609] [60660aa2-842c-11ee-9401-52549690f0f0] [DEBUG] - import status

[2023-11-16 11:01:03.610] [60660aa2-842c-11ee-9401-52549690f0f0] [DEBUG] - add status ref count to 1

[2023-11-16 11:01:03.610] [60660aa2-842c-11ee-9401-52549690f0f0] [DEBUG] -- root@192.168.100.31 execute: cat /opt/observer/run/observer.pid

[2023-11-16 11:01:03.771] [60660aa2-842c-11ee-9401-52549690f0f0] [DEBUG] -- exited code 0

[2023-11-16 11:01:03.772] [60660aa2-842c-11ee-9401-52549690f0f0] [DEBUG] -- root@192.168.100.31 execute: ls /proc/18508

[2023-11-16 11:01:03.832] [60660aa2-842c-11ee-9401-52549690f0f0] [DEBUG] -- exited code 0

[2023-11-16 11:01:03.832] [60660aa2-842c-11ee-9401-52549690f0f0] [DEBUG] - sub status ref count to 0

[2023-11-16 11:01:03.832] [60660aa2-842c-11ee-9401-52549690f0f0] [DEBUG] - export status

[2023-11-16 11:01:03.869] [60660aa2-842c-11ee-9401-52549690f0f0] [DEBUG] - Call oceanbase-ce-py_script_restart-4.0.0.0 for oceanbase-ce-4.1.0.1-102000042023061314.el7-d03fafa6fa8ceb0636e4db05b5b5f6c3ac2256a3

[2023-11-16 11:01:03.870] [60660aa2-842c-11ee-9401-52549690f0f0] [DEBUG] - import restart

[2023-11-16 11:01:03.874] [60660aa2-842c-11ee-9401-52549690f0f0] [DEBUG] - add restart ref count to 1

[2023-11-16 11:01:03.874] [60660aa2-842c-11ee-9401-52549690f0f0] [INFO] Connect to observer

[2023-11-16 11:01:03.875] [60660aa2-842c-11ee-9401-52549690f0f0] [DEBUG] -- Call oceanbase-ce-py_script_connect-3.1.0 for oceanbase-ce-4.1.0.1-102000042023061314.el7-d03fafa6fa8ceb0636e4db05b5b5f6c3ac2256a3

[2023-11-16 11:01:03.875] [60660aa2-842c-11ee-9401-52549690f0f0] [DEBUG] --- import connect

[2023-11-16 11:01:03.906] [60660aa2-842c-11ee-9401-52549690f0f0] [DEBUG] --- add connect ref count to 1

[2023-11-16 11:01:03.906] [60660aa2-842c-11ee-9401-52549690f0f0] [DEBUG] ---- connect 192.168.100.31 -P2881 -uroot -pX07Rewf49NLrC1hkuhEl

[2023-11-16 11:09:44.663] [60660aa2-842c-11ee-9401-52549690f0f0] [ERROR] Keyboard Interrupt

[2023-11-16 11:09:44.664] [60660aa2-842c-11ee-9401-52549690f0f0] [ERROR] Traceback (most recent call last):

[2023-11-16 11:09:44.664] [60660aa2-842c-11ee-9401-52549690f0f0] [ERROR] File "_plugin.py", line 285, in _new_func

[2023-11-16 11:09:44.664] [60660aa2-842c-11ee-9401-52549690f0f0] [ERROR] File "/root/.obd/plugins/oceanbase-ce/3.1.0/connect.py", line 142, in connect

[2023-11-16 11:09:44.664] [60660aa2-842c-11ee-9401-52549690f0f0] [ERROR] time.sleep(3)

[2023-11-16 11:09:44.664] [60660aa2-842c-11ee-9401-52549690f0f0] [ERROR] KeyboardInterrupt

[2023-11-16 11:09:44.664] [60660aa2-842c-11ee-9401-52549690f0f0] [ERROR]

[2023-11-16 11:09:44.664] [60660aa2-842c-11ee-9401-52549690f0f0] [ERROR] During handling of the above exception, another exception occurred:

[2023-11-16 11:09:44.664] [60660aa2-842c-11ee-9401-52549690f0f0] [ERROR]

[2023-11-16 11:09:44.664] [60660aa2-842c-11ee-9401-52549690f0f0] [ERROR] Traceback (most recent call last):

[2023-11-16 11:09:44.664] [60660aa2-842c-11ee-9401-52549690f0f0] [ERROR] File "obd.py", line 236, in do_command

[2023-11-16 11:09:44.664] [60660aa2-842c-11ee-9401-52549690f0f0] [ERROR] File "obd.py", line 862, in _do_command

[2023-11-16 11:09:44.664] [60660aa2-842c-11ee-9401-52549690f0f0] [ERROR] File "core.py", line 2225, in restart_cluster

[2023-11-16 11:09:44.664] [60660aa2-842c-11ee-9401-52549690f0f0] [ERROR] File "core.py", line 184, in call_plugin

[2023-11-16 11:09:44.664] [60660aa2-842c-11ee-9401-52549690f0f0] [ERROR] File "_plugin.py", line 323, in call

[2023-11-16 11:09:44.664] [60660aa2-842c-11ee-9401-52549690f0f0] [ERROR] File "_plugin.py", line 285, in _new_func

[2023-11-16 11:09:44.664] [60660aa2-842c-11ee-9401-52549690f0f0] [ERROR] File "/root/.obd/plugins/oceanbase-ce/4.0.0.0/restart.py", line 309, in restart

[2023-11-16 11:09:44.664] [60660aa2-842c-11ee-9401-52549690f0f0] [ERROR] if call():

[2023-11-16 11:09:44.664] [60660aa2-842c-11ee-9401-52549690f0f0] [ERROR] File "/root/.obd/plugins/oceanbase-ce/4.0.0.0/restart.py", line 268, in restart

[2023-11-16 11:09:44.664] [60660aa2-842c-11ee-9401-52549690f0f0] [ERROR] if self.connect():

[2023-11-16 11:09:44.664] [60660aa2-842c-11ee-9401-52549690f0f0] [ERROR] File "/root/.obd/plugins/oceanbase-ce/4.0.0.0/restart.py", line 90, in connect

[2023-11-16 11:09:44.665] [60660aa2-842c-11ee-9401-52549690f0f0] [ERROR] ret = self.call_plugin(self.connect_plugin)

[2023-11-16 11:09:44.665] [60660aa2-842c-11ee-9401-52549690f0f0] [ERROR] File "/root/.obd/plugins/oceanbase-ce/4.0.0.0/restart.py", line 79, in call_plugin

[2023-11-16 11:09:44.665] [60660aa2-842c-11ee-9401-52549690f0f0] [ERROR] return plugin(**args)

[2023-11-16 11:09:44.665] [60660aa2-842c-11ee-9401-52549690f0f0] [ERROR] File "_plugin.py", line 323, in call

[2023-11-16 11:09:44.665] [60660aa2-842c-11ee-9401-52549690f0f0] [ERROR] File "_plugin.py", line 285, in _new_func

[2023-11-16 11:09:44.665] [60660aa2-842c-11ee-9401-52549690f0f0] [ERROR] KeyboardInterrupt

[2023-11-16 11:09:44.665] [60660aa2-842c-11ee-9401-52549690f0f0] [ERROR]

[2023-11-16 11:09:44.665] [60660aa2-842c-11ee-9401-52549690f0f0] [INFO] Trace ID: 60660aa2-842c-11ee-9401-52549690f0f0

[2023-11-16 11:09:44.665] [60660aa2-842c-11ee-9401-52549690f0f0] [INFO] If you want to view detailed obd logs, please run: obd display-trace 60660aa2-842c-11ee-9401-52549690f0f0

[2023-11-16 11:09:44.666] [60660aa2-842c-11ee-9401-52549690f0f0] [DEBUG] - share lock /root/.obd/lock/mirror_and_repo release, count 0

[2023-11-16 11:09:44.666] [60660aa2-842c-11ee-9401-52549690f0f0] [DEBUG] - unlock /root/.obd/lock/mirror_and_repo

[2023-11-16 11:09:44.666] [60660aa2-842c-11ee-9401-52549690f0f0] [DEBUG] - exclusive lock /root/.obd/lock/deploy_obcluster release, count 0

[2023-11-16 11:09:44.666] [60660aa2-842c-11ee-9401-52549690f0f0] [DEBUG] - unlock /root/.obd/lock/deploy_obcluster

[2023-11-16 11:09:44.666] [60660aa2-842c-11ee-9401-52549690f0f0] [DEBUG] - share lock /root/.obd/lock/global release, count 0

[2023-11-16 11:09:44.666] [60660aa2-842c-11ee-9401-52549690f0f0] [DEBUG] - unlock /root/.obd/lock/global

[2023-11-16 11:09:44.675] [60660aa2-842c-11ee-9401-52549690f0f0] [INFO] [ERROR] Keyboard Interrupt

[2023-11-16 11:09:44.675] [60660aa2-842c-11ee-9401-52549690f0f0] [INFO] Trace ID: 60660aa2-842c-11ee-9401-52549690f0f0

[2023-11-16 11:09:44.675] [60660aa2-842c-11ee-9401-52549690f0f0] [INFO] If you want to view detailed obd logs, please run: obd display-trace 60660aa2-842c-11ee-9401-52549690f0f0

[2023-11-16 11:09:44.675] [60660aa2-842c-11ee-9401-52549690f0f0] [INFO]

I see that this does not match my password. Could that be the cause?

[2023-11-16 11:01:03.906] [DEBUG] --- add connect ref count to 1

[2023-11-16 11:01:03.906] [DEBUG] ---- connect 192.168.100.31 -P2881 -uroot -pX07Rewf49NLrC1hkuhEl

A wrong password would normally produce an error right away; this password is read from the configuration file. Compare it against the root_password parameter shown by:

obd cluster edit-config <deploy_name>

It may indeed be a wrong password. Have you modified obd's YAML configuration file? Try changing it to the correct root password.
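Before editing the config, it can also help to rule out a network-level problem and test the credentials directly, bypassing obd. A sketch (the host and port are the values from the log above; a MySQL-compatible client is assumed to be installed):

```shell
# 1) Check that the observer SQL port is reachable at all (bash /dev/tcp trick)
if timeout 3 bash -c 'cat < /dev/null > /dev/tcp/192.168.100.31/2881'; then
  echo "port 2881 reachable"
else
  echo "port 2881 unreachable or timed out"
fi

# 2) If the port is reachable, test the password from the YAML directly.
#    A wrong password fails fast (ERROR 1045); a hang points elsewhere.
# mysql -h192.168.100.31 -P2881 -uroot -p
```

If the port itself times out, the problem is the server or the observer process, not the password.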

The root password below had been changed by me, so it no longer matched. But saving the config also hangs. Is it a server problem?

[root@app .obd]# obd cluster edit-config obcluster

Search param plugin and load ok

[root@app opt]# obd cluster edit-config obcluster

[ERROR] Another app is currently holding the obd lock.

Trace ID: c4c52bec-842f-11ee-b08d-52549690f0f0

If you want to view detailed obd logs, please run: obd display-trace c4c52bec-842f-11ee-b08d-52549690f0f0

[root@app opt]# obd display-trace c4c52bec-842f-11ee-b08d-52549690f0f0

[2023-11-16 11:25:19.465] [DEBUG] - cmd: ['obcluster']

[2023-11-16 11:25:19.465] [DEBUG] - opts: {}

[2023-11-16 11:25:19.466] [DEBUG] - mkdir /root/.obd/lock/

[2023-11-16 11:25:19.466] [DEBUG] - unknown lock mode

[2023-11-16 11:25:19.466] [DEBUG] - try to get share lock /root/.obd/lock/global

[2023-11-16 11:25:19.466] [DEBUG] - share lock /root/.obd/lock/global, count 1

[2023-11-16 11:25:19.466] [DEBUG] - Get Deploy by name

[2023-11-16 11:25:19.466] [DEBUG] - mkdir /root/.obd/cluster/

[2023-11-16 11:25:19.467] [DEBUG] - mkdir /root/.obd/config_parser/

[2023-11-16 11:25:19.467] [DEBUG] - try to get exclusive lock /root/.obd/lock/deploy_obcluster

[2023-11-16 11:25:19.468] [ERROR] Another app is currently holding the obd lock.

[2023-11-16 11:25:19.468] [ERROR] Traceback (most recent call last):

[2023-11-16 11:25:19.468] [ERROR] File "_lock.py", line 64, in _ex_lock

[2023-11-16 11:25:19.468] [ERROR] File "tool.py", line 493, in exclusive_lock_obj

[2023-11-16 11:25:19.468] [ERROR] BlockingIOError: [Errno 11] Resource temporarily unavailable

[2023-11-16 11:25:19.468] [ERROR]

[2023-11-16 11:25:19.468] [ERROR] During handling of the above exception, another exception occurred:

[2023-11-16 11:25:19.468] [ERROR]

[2023-11-16 11:25:19.468] [ERROR] Traceback (most recent call last):

[2023-11-16 11:25:19.468] [ERROR] File "_lock.py", line 85, in ex_lock

[2023-11-16 11:25:19.468] [ERROR] File "_lock.py", line 66, in _ex_lock

[2023-11-16 11:25:19.469] [ERROR] _errno.LockError: [Errno 11] Resource temporarily unavailable

[2023-11-16 11:25:19.469] [ERROR]

[2023-11-16 11:25:19.469] [ERROR] During handling of the above exception, another exception occurred:

[2023-11-16 11:25:19.469] [ERROR]

[2023-11-16 11:25:19.469] [ERROR] Traceback (most recent call last):

[2023-11-16 11:25:19.469] [ERROR] File "obd.py", line 236, in do_command

[2023-11-16 11:25:19.469] [ERROR] File "obd.py", line 921, in _do_command

[2023-11-16 11:25:19.469] [ERROR] File "core.py", line 499, in edit_deploy_config

[2023-11-16 11:25:19.469] [ERROR] File "_deploy.py", line 1497, in get_deploy_config

[2023-11-16 11:25:19.469] [ERROR] File "_deploy.py", line 1484, in _lock

[2023-11-16 11:25:19.469] [ERROR] File "_lock.py", line 283, in deploy_ex_lock

[2023-11-16 11:25:19.469] [ERROR] File "_lock.py", line 262, in _ex_lock

[2023-11-16 11:25:19.469] [ERROR] File "_lock.py", line 254, in _lock

[2023-11-16 11:25:19.469] [ERROR] File "_lock.py", line 185, in lock

[2023-11-16 11:25:19.469] [ERROR] File "_lock.py", line 90, in ex_lock

[2023-11-16 11:25:19.469] [ERROR] _errno.LockError: [Errno 11] Resource temporarily unavailable

[2023-11-16 11:25:19.469] [ERROR]

[2023-11-16 11:25:19.469] [INFO] Trace ID: c4c52bec-842f-11ee-b08d-52549690f0f0

[2023-11-16 11:25:19.469] [INFO] If you want to view detailed obd logs, please run: obd display-trace c4c52bec-842f-11ee-b08d-52549690f0f0

[2023-11-16 11:25:19.470] [DEBUG] - share lock /root/.obd/lock/global release, count 0

[2023-11-16 11:25:19.470] [DEBUG] - unlock /root/.obd/lock/global

[2023-11-16 11:25:19.470] [DEBUG] - unlock /root/.obd/lock/deploy_obcluster

Another app is currently holding the obd lock.

Another obd command is holding the lock; obd only allows one command to run at a time.
Run ps -ef | grep obd and stop the other obd commands that are still running.
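For reference, a sketch of that check-and-kill sequence (the bracketed grep pattern keeps the grep command itself out of the listing):

```shell
# List obd processes without matching this grep command itself
ps -ef | grep '[o]bd' || echo "no other obd processes running"

# Then terminate the stale one by its PID (second column), e.g.:
# kill -9 <PID>
```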

The password has now been corrected. But when I try to restart, it still hangs:

[root@app .obd]# obd cluster edit-config obcluster
Search param plugin and load ok
Search param plugin and load ok
Parameter check ok
Save deploy "obcluster" configuration
Use obd cluster reload obcluster to make changes take effect.
Trace ID: 3e2b3cba-8430-11ee-8821-52549690f0f0
If you want to view detailed obd logs, please run: obd display-trace 3e2b3cba-8430-11ee-8821-52549690f0f0
[root@app .obd]# ob cluster list
-bash: ob: command not found
[root@app .obd]# obd cluster list
+------------------------------------------------------------+
|                        Cluster List                        |
+-----------+------------------------------+-----------------+
| Name      | Configuration Path           | Status (Cached) |
+-----------+------------------------------+-----------------+
| obcluster | /root/.obd/cluster/obcluster | running         |
+-----------+------------------------------+-----------------+
Trace ID: 50c4dda4-8430-11ee-ac20-52549690f0f0
If you want to view detailed obd logs, please run: obd display-trace 50c4dda4-8430-11ee-ac20-52549690f0f0
[root@app .obd]# obd cluster restart obcluster
Get local repositories and plugins ok
Load cluster param plugin ok
Open ssh connection ok
Cluster status check ok
Connect to observer |
Connect to observer \

Got it.

reload also hangs at Connect to observer...

[root@app .obd]# obd cluster reload obcluster

Get local repositories and plugins ok

Load cluster param plugin ok

Open ssh connection ok

Cluster status check ok

Connect to observer -

[2023-11-16 11:33:27.406] [e6d53230-8430-11ee-9fa9-52549690f0f0] [DEBUG] -- connect 192.168.100.31 -P2881 -uroot -pX07Rewf49NLrC1hkuhEl

[2023-11-16 11:36:53.181] [6241e42c-8431-11ee-ba6a-52549690f0f0] [DEBUG] - cmd: ['obcluster']

[2023-11-16 11:36:53.182] [6241e42c-8431-11ee-ba6a-52549690f0f0] [DEBUG] - opts: {}

[2023-11-16 11:36:53.182] [6241e42c-8431-11ee-ba6a-52549690f0f0] [DEBUG] - mkdir /root/.obd/lock/

[2023-11-16 11:36:53.182] [6241e42c-8431-11ee-ba6a-52549690f0f0] [DEBUG] - unknown lock mode

[2023-11-16 11:36:53.182] [6241e42c-8431-11ee-ba6a-52549690f0f0] [DEBUG] - try to get share lock /root/.obd/lock/global

[2023-11-16 11:36:53.183] [6241e42c-8431-11ee-ba6a-52549690f0f0] [DEBUG] - share lock /root/.obd/lock/global, count 1

[2023-11-16 11:36:53.183] [6241e42c-8431-11ee-ba6a-52549690f0f0] [DEBUG] - Get Deploy by name

[2023-11-16 11:36:53.183] [6241e42c-8431-11ee-ba6a-52549690f0f0] [DEBUG] - mkdir /root/.obd/cluster/

[2023-11-16 11:36:53.183] [6241e42c-8431-11ee-ba6a-52549690f0f0] [DEBUG] - mkdir /root/.obd/config_parser/

[2023-11-16 11:36:53.184] [6241e42c-8431-11ee-ba6a-52549690f0f0] [DEBUG] - try to get exclusive lock /root/.obd/lock/deploy_obcluster

[2023-11-16 11:36:53.185] [6241e42c-8431-11ee-ba6a-52549690f0f0] [ERROR] Another app is currently holding the obd lock.

[2023-11-16 11:36:53.185] [6241e42c-8431-11ee-ba6a-52549690f0f0] [ERROR] Traceback (most recent call last):

[2023-11-16 11:36:53.185] [6241e42c-8431-11ee-ba6a-52549690f0f0] [ERROR] File "_lock.py", line 64, in _ex_lock

[2023-11-16 11:36:53.185] [6241e42c-8431-11ee-ba6a-52549690f0f0] [ERROR] File "tool.py", line 493, in exclusive_lock_obj

[2023-11-16 11:36:53.185] [6241e42c-8431-11ee-ba6a-52549690f0f0] [ERROR] BlockingIOError: [Errno 11] Resource temporarily unavailable

[2023-11-16 11:36:53.185] [6241e42c-8431-11ee-ba6a-52549690f0f0] [ERROR]

[2023-11-16 11:36:53.185] [6241e42c-8431-11ee-ba6a-52549690f0f0] [ERROR] During handling of the above exception, another exception occurred:

[2023-11-16 11:36:53.185] [6241e42c-8431-11ee-ba6a-52549690f0f0] [ERROR]

[2023-11-16 11:36:53.185] [6241e42c-8431-11ee-ba6a-52549690f0f0] [ERROR] Traceback (most recent call last):

[2023-11-16 11:36:53.185] [6241e42c-8431-11ee-ba6a-52549690f0f0] [ERROR] File "_lock.py", line 85, in ex_lock

[2023-11-16 11:36:53.185] [6241e42c-8431-11ee-ba6a-52549690f0f0] [ERROR] File "_lock.py", line 66, in _ex_lock

[2023-11-16 11:36:53.185] [6241e42c-8431-11ee-ba6a-52549690f0f0] [ERROR] _errno.LockError: [Errno 11] Resource temporarily unavailable

[2023-11-16 11:36:53.185] [6241e42c-8431-11ee-ba6a-52549690f0f0] [ERROR]

[2023-11-16 11:36:53.185] [6241e42c-8431-11ee-ba6a-52549690f0f0] [ERROR] During handling of the above exception, another exception occurred:

1. Is the root_password in the YAML the same as the actual root password?
2. If you can still log in to the database, set root's password to empty and set root_password in the YAML to '':
   ALTER USER root IDENTIFIED BY '';
3. If that still does not work, please share the logs.

The current error is: Another app is currently holding the obd lock.
First run ps -ef | grep obd to find the other obd commands still executing, kill those obd processes with kill -9, and then retry the steps above.

The root cause has been confirmed:
[2023-11-16 13:23:08.719888] WDIAG [COMMON] wait (ob_io_define.cpp:859) [14132][][T500][Y0-0000000000000000-0-0] [lt=21][errcode=-4224] IO error, (ret=-4224

The disk has I/O problems; in testing, even extracting a file on this machine hangs. On the user side, contact the server vendor or data-center operations to repair or replace the disk.
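A quick way to confirm a disk fault like this from the shell, independent of OceanBase (a sketch; run it on the affected filesystem, for example under the observer install directory):

```shell
# Check the kernel log for I/O errors on the underlying device
dmesg 2>/dev/null | grep -iE 'i/o error|blk_update_request' | tail -n 20

# Time a small synchronous write on the affected filesystem;
# on a healthy disk this finishes quickly, on a failing one it stalls
dd if=/dev/zero of=./obd_io_probe bs=1M count=64 conv=fsync
rm -f ./obd_io_probe
```

If the dd write hangs or dmesg shows I/O errors, the hardware diagnosis above is confirmed.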