OCP restart error

oceanbase: 4.2.1.8
OceanBase Deploy (OBD): 3.2.2
ocp: 4.3.5

Deployed oceanbase + obproxy + ocp with obd web.
Now restarting the ocp server fails, as follows:

$ obd cluster list                                      
+----------------------------------------------------------+
|                       Cluster List                       |
+-------+--------------------------------+-----------------+
| Name  | Configuration Path             | Status (Cached) |
+-------+--------------------------------+-----------------+
| myocp | /home/admin/.obd/cluster/myocp | running         |
+-------+--------------------------------+-----------------+
$ obd cluster display myocp                             
Get local repositories and plugins ok
Open ssh connection ok
Connect to ocp-server-ce ok
+----------------------------------------------------------------+
|                         ocp-server-ce                          |
+----------------------------+----------+---------------+--------+
| url                        | username | password      | status |
+----------------------------+----------+---------------+--------+
| http://xxx.xxx             | admin    | xxxxxxxxxxxxx | active |
+----------------------------+----------+---------------+--------+

Wait for observer init ok
+------------------------------------------------+
|                  oceanbase-ce                  |
+--------------+---------+------+-------+--------+
| ip           | version | port | zone  | status |
+--------------+---------+------+-------+--------+
| 10.26.xxxxx  | 4.2.1.8 | 2881 | zone1 | ACTIVE |
| 10.26.xxxxxxx| 4.2.1.8 | 2881 | zone2 | ACTIVE |
| 10.26.xxxxxx | 4.2.1.8 | 2881 | zone3 | ACTIVE |
+--------------+---------+------+-------+--------+

Connect to obproxy ok
+------------------------------------------------------------------+
|                            obproxy-ce                            |
+--------------+------+-----------------+-----------------+--------+
| ip           | port | prometheus_port | rpc_listen_port | status |
+--------------+------+-----------------+-----------------+--------+
| 10.16.xxxxxx | 2883 | 2884            | 2885            | active |
+--------------+------+-----------------+-----------------+--------+
$ obd cluster restart myocp --servers ocp               
Get local repositories and plugins ok
Load cluster param plugin ok
Open ssh connection ok
Cluster status check ok
[ERROR] oceanbase-ce-py_script_ocp_tenant_check-4.0.0.0 RuntimeError: list index out of range
See https://www.oceanbase.com/product/ob-deployer/error-codes .
Trace ID: 1f7956d0-6255-11f0-a62c-86dfe73a3abb
If you want to view detailed obd logs, please run: obd display-trace 1f7956d0-6255-11f0-a62c-86dfe73a3abb
$ obd display-trace 1f7956d0-6255-11f0-a62c-86dfe73a3abb
..........................
[2025-07-16 22:57:03.346] [DEBUG] - Found for oceanbase-ce-py_script_ocp_tenant_check-4.0.0.0 for oceanbase-ce-4.2.1.8
[2025-07-16 22:57:03.346] [DEBUG] - Call plugin oceanbase-ce-py_script_ocp_tenant_check-4.0.0.0 for oceanbase-ce-4.2.1.8-108000022024072217.el7-499b676f2ede5a16e0c07b2b15991d1160d972e8
[2025-07-16 22:57:03.347] [DEBUG] - import ocp_tenant_check
[2025-07-16 22:57:03.349] [DEBUG] - add ocp_tenant_check ref count to 1
[2025-07-16 22:57:03.350] [ERROR] oceanbase-ce-py_script_ocp_tenant_check-4.0.0.0 RuntimeError: list index out of range
[2025-07-16 22:57:03.350] [ERROR] Traceback (most recent call last):
[2025-07-16 22:57:03.351] [ERROR]   File "core.py", line 2756, in restart_cluster
[2025-07-16 22:57:03.351] [ERROR]   File "core.py", line 2864, in _restart_cluster
[2025-07-16 22:57:03.351] [ERROR]   File "core.py", line 246, in run_workflow
[2025-07-16 22:57:03.351] [ERROR]   File "core.py", line 288, in run_plugin_template
[2025-07-16 22:57:03.351] [ERROR]   File "core.py", line 336, in call_plugin
[2025-07-16 22:57:03.351] [ERROR]   File "_plugin.py", line 348, in __call__
[2025-07-16 22:57:03.351] [ERROR]   File "_plugin.py", line 304, in _new_func
[2025-07-16 22:57:03.351] [ERROR]   File "/home/admin/.obd/plugins/oceanbase-ce/4.0.0.0/ocp_tenant_check.py", line 36, in ocp_tenant_check
[2025-07-16 22:57:03.351] [ERROR]     server = cluster_config.servers[0]
[2025-07-16 22:57:03.351] [ERROR] IndexError: list index out of range
[2025-07-16 22:57:03.351] [ERROR] 
[2025-07-16 22:57:03.351] [DEBUG] - sub ocp_tenant_check ref count to 0
[2025-07-16 22:57:03.352] [DEBUG] - export ocp_tenant_check
[2025-07-16 22:57:03.352] [DEBUG] - plugin oceanbase-ce-py_script_ocp_tenant_check-4.0.0.0 result: False
[2025-07-16 22:57:03.357] [INFO] See https://www.oceanbase.com/product/ob-deployer/err
...........................
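
Reading the traceback: the ocp_tenant_check plugin runs against the oceanbase-ce component and immediately indexes cluster_config.servers[0]. A plausible explanation, though this is an assumption from the traceback rather than confirmed obd internals, is that --servers ocp filters every component's server list by name, no oceanbase-ce server matches "ocp", and indexing the now-empty list raises the IndexError. A minimal sketch of that failure mode (the class below is illustrative, not obd's real API):

```python
# Minimal sketch of the suspected failure mode; ClusterConfig here is
# illustrative, not obd's actual class.
class ClusterConfig:
    def __init__(self, servers):
        self.servers = servers  # servers left after --servers filtering

def ocp_tenant_check(cluster_config):
    # Line 36 of the real plugin indexes the list the same way:
    server = cluster_config.servers[0]  # IndexError when the list is empty
    return server

all_servers = ["zone1-observer", "zone2-observer", "zone3-observer"]
requested = {"ocp"}  # from: obd cluster restart myocp --servers ocp

# No oceanbase-ce server is named "ocp", so the filtered list is empty:
filtered = [s for s in all_servers if s in requested]
ocp_tenant_check(ClusterConfig(filtered))  # IndexError: list index out of range
```

If that reading is right, it would also explain why the plain obd cluster restart myocp later gets past this check.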

Is that command correct? From the error, it looks like the corresponding cluster was not found.

Check whether all the tenants in the OB cluster still exist: select * from OCEANBASE.dba_ob_tenants; then try restarting the whole cluster:
obd cluster restart myocp

metadb:

+-----------+-------------+-------------+----------------------------+-------------+-------------------+--------------+
| TENANT_ID | TENANT_NAME | TENANT_TYPE | CREATE_TIME                | TENANT_ROLE | SWITCHOVER_STATUS | LOG_MODE     |
+-----------+-------------+-------------+----------------------------+-------------+-------------------+--------------+
|         1 | sys         | SYS         | 2025-07-11 12:15:01.208935 | PRIMARY     | NORMAL            | NOARCHIVELOG |
|      1001 | META$1002   | META        | 2025-07-11 12:15:46.262843 | PRIMARY     | NORMAL            | NOARCHIVELOG |
|      1002 | ocp_meta    | USER        | 2025-07-11 12:15:46.266316 | PRIMARY     | NORMAL            | NOARCHIVELOG |
|      1003 | META$1004   | META        | 2025-07-11 12:16:18.726656 | PRIMARY     | NORMAL            | NOARCHIVELOG |
|      1004 | ocp_monitor | USER        | 2025-07-11 12:16:18.727710 | PRIMARY     | NORMAL            | NOARCHIVELOG |
+-----------+-------------+-------------+----------------------------+-------------+-------------------+--------------+

Business cluster:

+-----------+--------------+-------------+----------------------------+-------------+-------------------+--------------+
| TENANT_ID | TENANT_NAME  | TENANT_TYPE | CREATE_TIME                | TENANT_ROLE | SWITCHOVER_STATUS | LOG_MODE     |
+-----------+--------------+-------------+----------------------------+-------------+-------------------+--------------+
|         1 | sys          | SYS         | 2025-07-11 16:45:32.475818 | PRIMARY     | NORMAL            | NOARCHIVELOG |
|      1003 | META$1004    | META        | 2025-07-15 14:30:50.778645 | PRIMARY     | NORMAL            | NOARCHIVELOG |
|      1004 | tenant1      | USER        | 2025-07-15 14:30:50.780755 | STANDBY     | NORMAL            | ARCHIVELOG   |
|      1007 | META$1008    | META        | 2025-07-16 16:49:21.709524 | PRIMARY     | NORMAL            | NOARCHIVELOG |
|      1008 | tenant1_back | USER        | 2025-07-16 16:49:21.711633 | PRIMARY     | NORMAL            | ARCHIVELOG   |
+-----------+--------------+-------------+----------------------------+-------------+-------------------+--------------+

They all exist.

Restart the whole cluster with obd cluster restart myocp, then check the obd log to see where it gets stuck.

Stop ocp-server-ce ok
Start ocp-server-ce ok
[ERROR] failed to start xxxx ocp-server-ce
See https://www.oceanbase.com/product/ob-deployer/error-codes .
Trace ID: a0661cdc-62d3-11f0-87b8-86dfe73a3abb
If you want to view detailed obd logs, please run: obd display-trace a0661cdc-62d3-11f0-87b8-86dfe73a3abb

Well, that's worse.

Run obd display-trace a0661cdc-62d3-11f0-87b8-86dfe73a3abb and share a detailed log.

[2025-07-17 14:03:17.883] [DEBUG] - sub display ref count to 0
[2025-07-17 14:03:17.883] [DEBUG] - export display
[2025-07-17 14:03:17.884] [DEBUG] - plugin obproxy-ce-py_script_display-4.3.0 result: True
[2025-07-17 14:03:17.884] [DEBUG] - Searching stop_pre plugin for components ...
[2025-07-17 14:03:17.884] [DEBUG] - Searching stop_pre plugin for ocp-server-ce-4.3.5-20250319105844.el7-5c670871a262a5c95649ca8e2ad4b237e2a8aa43
[2025-07-17 14:03:17.884] [DEBUG] - Found for ocp-server-ce-py_script_stop_pre-4.2.1 for ocp-server-ce-4.3.5
[2025-07-17 14:03:17.884] [DEBUG] - Call plugin ocp-server-ce-py_script_stop_pre-4.2.1 for ocp-server-ce-4.3.5-20250319105844.el7-5c670871a262a5c95649ca8e2ad4b237e2a8aa43
[2025-07-17 14:03:17.885] [DEBUG] - import stop_pre
[2025-07-17 14:03:17.885] [DEBUG] - add stop_pre ref count to 1
[2025-07-17 14:03:17.886] [DEBUG] - sub stop_pre ref count to 0
[2025-07-17 14:03:17.886] [DEBUG] - export stop_pre
[2025-07-17 14:03:17.886] [DEBUG] - plugin ocp-server-ce-py_script_stop_pre-4.2.1 result: True
[2025-07-17 14:03:17.887] [DEBUG] - Searching stop plugin for components ...
[2025-07-17 14:03:17.887] [DEBUG] - Searching stop plugin for general-4.3.5--None
[2025-07-17 14:03:17.887] [DEBUG] - Found for general-py_script_stop-0.1 for general-4.3.5
[2025-07-17 14:03:17.887] [DEBUG] - Call plugin general-py_script_stop-0.1 for ocp-server-ce-4.3.5-20250319105844.el7-5c670871a262a5c95649ca8e2ad4b237e2a8aa43
[2025-07-17 14:03:17.888] [INFO] Stop ocp-server-ce 
[2025-07-17 14:03:17.889] [DEBUG] -- admin@xx.xxx.xxx.xx execute: cat /data/ocp/ocp/run/ocp-server.pid 
[2025-07-17 14:03:17.893] [DEBUG] -- exited code 0
[2025-07-17 14:03:17.894] [DEBUG] -- admin@xx.xxx.xxx.xx execute: ls /proc/37413 
[2025-07-17 14:03:17.939] [DEBUG] -- exited code 0
[2025-07-17 14:03:17.939] [DEBUG] -- admin@xx.xxx.xxx.xx execute: ls /proc/37413/fd 
[2025-07-17 14:03:17.987] [DEBUG] -- exited code 0
[2025-07-17 14:03:17.987] [DEBUG] -- xx.xxx.xxx.xx ocp-server-ce[pid: 37413] stopping...
[2025-07-17 14:03:17.988] [DEBUG] -- admin@xx.xxx.xxx.xx execute: kill -9 37413 
[2025-07-17 14:03:18.050] [DEBUG] -- exited code 0
[2025-07-17 14:03:19.052] [DEBUG] -- xx.xxx.xxx.xx check whether the port is released
[2025-07-17 14:03:19.052] [DEBUG] -- admin@xx.xxx.xxx.xx execute: bash -c 'cat /proc/net/{tcp*,udp*}' | awk -F' ' '{if($4=="0A") print $2,$4,$10}' | grep ':4705' | awk -F' ' '{print $3}' | uniq 
[2025-07-17 14:03:19.064] [DEBUG] -- exited code 0
[2025-07-17 14:03:19.065] [DEBUG] -- admin@xx.xxx.xxx.xx execute: rm -rf /data/ocp/ocp/run/ocp-server.pid 
[2025-07-17 14:03:19.111] [DEBUG] -- exited code 0
[2025-07-17 14:03:19.111] [DEBUG] -- xx.xxx.xxx.xx ocp-server-ce is stopped
[2025-07-17 14:03:19.193] [DEBUG] - plugin general-py_script_stop-0.1 result: True
[2025-07-17 14:03:19.193] [DEBUG] - Searching parameter_pre plugin for components ...
[2025-07-17 14:03:19.193] [DEBUG] - Searching parameter_pre plugin for ocp-server-ce-4.3.5-20250319105844.el7-5c670871a262a5c95649ca8e2ad4b237e2a8aa43
[2025-07-17 14:03:19.193] [DEBUG] - Found for ocp-server-ce-py_script_parameter_pre-4.2.1 for ocp-server-ce-4.3.5
[2025-07-17 14:03:19.194] [DEBUG] - Call plugin ocp-server-ce-py_script_parameter_pre-4.2.1 for ocp-server-ce-4.3.5-20250319105844.el7-5c670871a262a5c95649ca8e2ad4b237e2a8aa43
[2025-07-17 14:03:19.206] [DEBUG] - plugin ocp-server-ce-py_script_parameter_pre-4.2.1 result: True
[2025-07-17 14:03:19.207] [DEBUG] - Searching ocp_const plugin for components ...
[2025-07-17 14:03:19.207] [DEBUG] - Searching ocp_const plugin for ocp-server-ce-4.3.5-20250319105844.el7-5c670871a262a5c95649ca8e2ad4b237e2a8aa43
[2025-07-17 14:03:19.208] [DEBUG] - Found for ocp-server-ce-py_script_ocp_const-4.2.1 for ocp-server-ce-4.3.5
[2025-07-17 14:03:19.208] [DEBUG] - Call plugin ocp-server-ce-py_script_ocp_const-4.2.1 for ocp-server-ce-4.3.5-20250319105844.el7-5c670871a262a5c95649ca8e2ad4b237e2a8aa43
[2025-07-17 14:03:19.208] [DEBUG] - import ocp_const
[2025-07-17 14:03:19.209] [DEBUG] - add ocp_const ref count to 1
[2025-07-17 14:03:19.209] [DEBUG] - sub ocp_const ref count to 0
[2025-07-17 14:03:19.209] [DEBUG] - export ocp_const
[2025-07-17 14:03:19.210] [DEBUG] - plugin ocp-server-ce-py_script_ocp_const-4.2.1 result: True
[2025-07-17 14:03:19.210] [DEBUG] - Searching start plugin for components ...
[2025-07-17 14:03:19.210] [DEBUG] - Searching start plugin for ocp-server-ce-4.3.5-20250319105844.el7-5c670871a262a5c95649ca8e2ad4b237e2a8aa43
[2025-07-17 14:03:19.210] [DEBUG] - Found for ocp-server-ce-py_script_start-4.2.1 for ocp-server-ce-4.3.5
[2025-07-17 14:03:19.210] [DEBUG] - Call plugin ocp-server-ce-py_script_start-4.2.1 for ocp-server-ce-4.3.5-20250319105844.el7-5c670871a262a5c95649ca8e2ad4b237e2a8aa43
[2025-07-17 14:03:19.211] [DEBUG] - import start
[2025-07-17 14:03:19.214] [DEBUG] - add start ref count to 1
[2025-07-17 14:03:19.214] [DEBUG] -- metadb connect check
[2025-07-17 14:03:19.215] [INFO] Start ocp-server-ce
[2025-07-17 14:03:19.215] [DEBUG] -- admin@xx.xxx.xxx.xx execute: cat /data/ocp/ocp/run/ocp-server.pid 
[2025-07-17 14:03:19.220] [DEBUG] -- exited code 1, error output:
[2025-07-17 14:03:19.220] [DEBUG] cat: /data/ocp/ocp/run/ocp-server.pid: No such file or directory
[2025-07-17 14:03:19.220] [DEBUG] 
[2025-07-17 14:03:19.221] [DEBUG] -- admin@xx.xxx.xxx.xx append '/data/ocp/ocp/jre/bin:' to PATH
[2025-07-17 14:03:19.221] [DEBUG] -- admin@xx.xxx.xxx.xx execute: cd /data/ocp/ocp; export JDBC_URL=jdbc:oceanbase://10.xx.xx.xxx:2883/meta_database; export JDBC_USERNAME=root@ocp_meta;export JDBC_PASSWORD=****** export JDBC_PUBLIC_KEY=;export OCP_INITIAL_ADMIN_PASSWORD=****** 
[2025-07-17 14:03:19.221] [DEBUG] java -Dfile.encoding=UTF-8 -jar -Xms4g -Xmx4g -Docp.iam.encrypted-system-password=****** /data/ocp/ocp/lib/ocp-server.jar --bootstrap > /dev/null 2>&1 & 
[2025-07-17 14:03:19.267] [DEBUG] -- exited code 0
[2025-07-17 14:03:19.267] [DEBUG] -- admin@xx.xxx.xxx.xx execute: ps -aux | grep -F 'java -Dfile.encoding=UTF-8 -jar -Xms4g -Xmx4g -Docp.iam.encrypted-system-password=****** /data/ocp/ocp/lib/ocp-server.jar --bootstrap' | grep -v grep | awk '{print $2}'  
[2025-07-17 14:03:19.323] [DEBUG] -- exited code 0
[2025-07-17 14:03:19.324] [DEBUG] -- write 43895 to admin@xx.xxx.xxx.xx:22: /data/ocp/ocp/run/ocp-server.pid
[2025-07-17 14:03:19.420] [DEBUG] - local execute: rsync -h 
[2025-07-17 14:03:19.429] [DEBUG] - exited code 0
[2025-07-17 14:03:19.430] [DEBUG] - admin@xx.xxx.xxx.xx execute: rsync -h 
[2025-07-17 14:03:19.436] [DEBUG] - exited code 0
[2025-07-17 14:03:19.437] [DEBUG] - current remote_transporter RemoteTransporter.RSYNC
[2025-07-17 14:03:19.437] [DEBUG] -- admin@xx.xxx.xxx.xx execute: mkdir -p /data/ocp/ocp/run 
[2025-07-17 14:03:19.483] [DEBUG] -- exited code 0
[2025-07-17 14:03:19.484] [DEBUG] -- send /tmp/tmpnrx5x1pq to /data/ocp/ocp/run/ocp-server.pid by rsync
[2025-07-17 14:03:19.484] [DEBUG] -- local execute: yes | rsync -a -W -L -e "ssh -o StrictHostKeyChecking=no -p 22" /tmp/tmpnrx5x1pq admin@xx.xxx.xxx.xx:/data/ocp/ocp/run/ocp-server.pid 
[2025-07-17 14:03:19.656] [DEBUG] -- exited code 0
[2025-07-17 14:03:19.738] [DEBUG] - sub start ref count to 0
[2025-07-17 14:03:19.738] [DEBUG] - export start
[2025-07-17 14:03:19.738] [DEBUG] - plugin ocp-server-ce-py_script_start-4.2.1 result: True
[2025-07-17 14:03:19.739] [DEBUG] - Searching health_check plugin for components ...
[2025-07-17 14:03:19.739] [DEBUG] - Searching health_check plugin for ocp-server-ce-4.3.5-20250319105844.el7-5c670871a262a5c95649ca8e2ad4b237e2a8aa43
[2025-07-17 14:03:19.740] [DEBUG] - Found for ocp-server-ce-py_script_health_check-4.2.1 for ocp-server-ce-4.3.5
[2025-07-17 14:03:19.740] [DEBUG] - Call plugin ocp-server-ce-py_script_health_check-4.2.1 for ocp-server-ce-4.3.5-20250319105844.el7-5c670871a262a5c95649ca8e2ad4b237e2a8aa43
[2025-07-17 14:03:19.740] [DEBUG] - import health_check
[2025-07-17 14:03:19.742] [DEBUG] - add health_check ref count to 1
[2025-07-17 14:03:19.743] [INFO] ocp-server-ce program health check
[2025-07-17 14:03:19.744] [DEBUG] -- xx.xxx.xxx.xx program health check
[2025-07-17 14:03:19.744] [DEBUG] -- admin@xx.xxx.xxx.xx execute: ls /proc/43895 
[2025-07-17 14:03:19.749] [DEBUG] -- exited code 0
[2025-07-17 14:03:19.750] [DEBUG] -- admin@xx.xxx.xxx.xx execute: bash -c 'cat /proc/net/{tcp*,udp*}' | awk -F' ' '{print $2,$10}' | grep '00000000:4705' | awk -F' ' '{print $2}' | uniq 
[2025-07-17 14:03:19.884] [DEBUG] -- exited code 0
[2025-07-17 14:03:19.884] [DEBUG] -- failed to start xx.xxx.xxx.xx ocp-server-ce, remaining retries: 119
[2025-07-17 14:03:34.899] [DEBUG] -- xx.xxx.xxx.xx program health check
[2025-07-17 14:03:34.899] [DEBUG] -- admin@xx.xxx.xxx.xx execute: ls /proc/43895 
[2025-07-17 14:03:34.905] [DEBUG] -- exited code 0
[2025-07-17 14:03:34.905] [DEBUG] -- admin@xx.xxx.xxx.xx execute: bash -c 'cat /proc/net/{tcp*,udp*}' | awk -F' ' '{print $2,$10}' | grep '00000000:4705' | awk -F' ' '{print $2}' | uniq 
[2025-07-17 14:03:34.962] [DEBUG] -- exited code 0
[2025-07-17 14:03:34.963] [DEBUG] -- failed to start xx.xxx.xxx.xx ocp-server-ce, remaining retries: 118
[2025-07-17 14:03:49.978] [DEBUG] -- xx.xxx.xxx.xx program health check
[2025-07-17 14:03:49.978] [DEBUG] -- admin@xx.xxx.xxx.xx execute: ls /proc/43895 
[2025-07-17 14:03:49.984] [DEBUG] -- exited code 0
[2025-07-17 14:03:49.984] [DEBUG] -- admin@xx.xxx.xxx.xx execute: bash -c 'cat /proc/net/{tcp*,udp*}' | awk -F' ' '{print $2,$10}' | grep '00000000:4705' | awk -F' ' '{print $2}' | uniq 
[2025-07-17 14:03:50.041] [DEBUG] -- exited code 0
[2025-07-17 14:03:50.041] [DEBUG] -- failed to start xx.xxx.xxx.xx ocp-server-ce, remaining retries: 117
[2025-07-17 14:04:05.056] [DEBUG] -- xx.xxx.xxx.xx program health check
[2025-07-17 14:04:05.056] [DEBUG] -- admin@xx.xxx.xxx.xx execute: ls /proc/43895 
[2025-07-17 14:04:05.061] [DEBUG] -- exited code 0
[2025-07-17 14:04:05.062] [DEBUG] -- admin@xx.xxx.xxx.xx execute: bash -c 'cat /proc/net/{tcp*,udp*}' | awk -F' ' '{print $2,$10}' | grep '00000000:4705' | awk -F' ' '{print $2}' | uniq 
[2025-07-17 14:04:05.118] [DEBUG] -- exited code 0
[2025-07-17 14:04:05.118] [DEBUG] -- failed to start xx.xxx.xxx.xx ocp-server-ce, remaining retries: 116
[2025-07-17 14:04:20.135] [DEBUG] -- xx.xxx.xxx.xx program health check
[2025-07-17 14:04:20.135] [DEBUG] -- admin@xx.xxx.xxx.xx execute: ls /proc/43895 
[2025-07-17 14:04:20.140] [DEBUG] -- exited code 0
[2025-07-17 14:04:20.141] [DEBUG] -- admin@xx.xxx.xxx.xx execute: bash -c 'cat /proc/net/{tcp*,udp*}' | awk -F' ' '{print $2,$10}' | grep '00000000:4705' | awk -F' ' '{print $2}' | uniq 
[2025-07-17 14:04:20.194] [DEBUG] -- exited code 0
[2025-07-17 14:04:20.194] [DEBUG] -- failed to start xx.xxx.xxx.xx ocp-server-ce, remaining retries: 115
[2025-07-17 14:04:35.209] [DEBUG] -- xx.xxx.xxx.xx program health check
[2025-07-17 14:04:35.210] [DEBUG] -- admin@xx.xxx.xxx.xx execute: ls /proc/43895 
[2025-07-17 14:04:35.215] [DEBUG] -- exited code 0
[2025-07-17 14:04:35.216] [DEBUG] -- admin@xx.xxx.xxx.xx execute: bash -c 'cat /proc/net/{tcp*,udp*}' | awk -F' ' '{print $2,$10}' | grep '00000000:4705' | awk -F' ' '{print $2}' | uniq 
[2025-07-17 14:04:35.269] [DEBUG] -- exited code 0
[2025-07-17 14:04:35.269] [DEBUG] -- failed to start xx.xxx.xxx.xx ocp-server-ce, remaining retries: 114
[2025-07-17 14:04:50.284] [DEBUG] -- xx.xxx.xxx.xx program health check
[2025-07-17 14:04:50.285] [DEBUG] -- admin@xx.xxx.xxx.xx execute: ls /proc/43895 
[2025-07-17 14:04:50.289] [DEBUG] -- exited code 0
[2025-07-17 14:04:50.290] [DEBUG] -- admin@xx.xxx.xxx.xx execute: bash -c 'cat /proc/net/{tcp*,udp*}' | awk -F' ' '{print $2,$10}' | grep '00000000:4705' | awk -F' ' '{print $2}' | uniq 
[2025-07-17 14:04:50.343] [DEBUG] -- exited code 0
[2025-07-17 14:04:50.343] [DEBUG] -- failed to start xx.xxx.xxx.xx ocp-server-ce, remaining retries: 113
[2025-07-17 14:05:05.359] [DEBUG] -- xx.xxx.xxx.xx program health check
[2025-07-17 14:05:05.359] [DEBUG] -- admin@xx.xxx.xxx.xx execute: ls /proc/43895 
[2025-07-17 14:05:05.364] [DEBUG] -- exited code 0
[2025-07-17 14:05:05.364] [DEBUG] -- admin@xx.xxx.xxx.xx execute: bash -c 'cat /proc/net/{tcp*,udp*}' | awk -F' ' '{print $2,$10}' | grep '00000000:4705' | awk -F' ' '{print $2}' | uniq 
[2025-07-17 14:05:05.416] [DEBUG] -- exited code 0
[2025-07-17 14:05:05.416] [DEBUG] -- failed to start xx.xxx.xxx.xx ocp-server-ce, remaining retries: 112
[2025-07-17 14:05:20.427] [DEBUG] -- xx.xxx.xxx.xx program health check
[2025-07-17 14:05:20.427] [DEBUG] -- admin@xx.xxx.xxx.xx execute: ls /proc/43895 
[2025-07-17 14:05:20.432] [DEBUG] -- exited code 2, error output:
[2025-07-17 14:05:20.432] [DEBUG] ls: cannot access '/proc/43895': No such file or directory
[2025-07-17 14:05:20.432] [DEBUG] 
[2025-07-17 14:05:20.519] [ERROR] failed to start xx.xxx.xxx.xx ocp-server-ce
[2025-07-17 14:05:20.519] [DEBUG] - sub health_check ref count to 0
[2025-07-17 14:05:20.519] [DEBUG] - export health_check
[2025-07-17 14:05:20.519] [DEBUG] - plugin ocp-server-ce-py_script_health_check-4.2.1 result: False
[2025-07-17 14:05:20.527] [DEBUG] - share lock /home/admin/.obd/lock/mirror_and_repo release, count 5
[2025-07-17 14:05:20.528] [DEBUG] - share lock /home/admin/.obd/lock/mirror_and_repo release, count 4
[2025-07-17 14:05:20.528] [DEBUG] - share lock /home/admin/.obd/lock/mirror_and_repo release, count 3
[2025-07-17 14:05:20.528] [DEBUG] - share lock /home/admin/.obd/lock/mirror_and_repo release, count 2
[2025-07-17 14:05:20.528] [DEBUG] - share lock /home/admin/.obd/lock/mirror_and_repo release, count 1
[2025-07-17 14:05:20.528] [DEBUG] - share lock /home/admin/.obd/lock/mirror_and_repo release, count 0
[2025-07-17 14:05:20.528] [DEBUG] - unlock /home/admin/.obd/lock/mirror_and_repo
[2025-07-17 14:05:20.528] [DEBUG] - exclusive lock /home/admin/.obd/lock/deploy_myocp release, count 0
[2025-07-17 14:05:20.529] [DEBUG] - unlock /home/admin/.obd/lock/deploy_myocp
[2025-07-17 14:05:20.529] [DEBUG] - share lock /home/admin/.obd/lock/global release, count 0
[2025-07-17 14:05:20.529] [DEBUG] - unlock /home/admin/.obd/lock/global
[2025-07-17 14:05:20.529] [INFO] See https://www.oceanbase.com/product/ob-deployer/error-codes .
[2025-07-17 14:05:20.529] [INFO] Trace ID: a0661cdc-62d3-11f0-87b8-86dfe73a3abb
[2025-07-17 14:05:20.529] [INFO] If you want to view detailed obd logs, please run: obd display-trace a0661cdc-62d3-11f0-87b8-86dfe73a3abb
admin@ubuntu:~$ cat /data/ocp/ocp/run/ocp-server.pid 
43895
admin@ubuntu:~$ 
admin@ubuntu:~$ 
admin@ubuntu:~$ ps -ef | grep java
admin     7674 18314  0 14:38 pts/0    00:00:00 grep --color=auto java
admin@ubuntu:~$

[2025-07-17 14:03:19.484] [DEBUG] -- send /tmp/tmpnrx5x1pq to /data/ocp/ocp/run/ocp-server.pid by rsync
[2025-07-17 14:03:19.484] [DEBUG] -- local execute: yes | rsync -a -W -L -e "ssh -o StrictHostKeyChecking=no -p 22" /tmp/tmpnrx5x1pq admin@xx.xxx.xxx.xx:/data/ocp/ocp/run/ocp-server.pid 
[2025-07-17 14:03:19.656] [DEBUG] -- exited code 0

What does this step mean? Does OCP have to be started with pid=43895?

/tmp/tmpnrx5x1pq admin@xx.xxx.xxx.xx:/data/ocp/ocp/run/ocp-server.pid

And why rsync?
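
From the log itself, both questions look mechanical rather than meaningful. obd launches the java process in the background, greps ps for its PID (43895 here), and writes that number to ocp-server.pid so that later stop and health-check steps know which process to watch; there is no requirement that OCP run under a particular pid. The rsync is simply obd's configured file transporter ("current remote_transporter RemoteTransporter.RSYNC" above) copying the locally generated temp pid file into the remote run directory. The health check then loops until /proc/43895 exists and the port appears in LISTEN state in /proc/net/tcp (the grep pattern 00000000:4705 is the port in that file's hex notation, decimal 18181 if the usual encoding applies). In this run /proc/43895 disappeared at 14:05:20, i.e. the java process exited during bootstrap, which is why the check eventually gave up. A rough Linux-only sketch of that check, assuming the standard /proc/net/tcp field layout and simplifying away the tcp6/udp variants obd also scans:

```python
import os

def port_listening(port: int, proc_net_tcp: str = "/proc/net/tcp") -> bool:
    """Look for a socket on `port` in LISTEN state (0A), mirroring the
    awk/grep pipeline in the obd log above."""
    hex_port = format(port, "04X")  # 18181 -> '4705'
    with open(proc_net_tcp) as f:
        next(f)  # skip the header line
        for line in f:
            fields = line.split()
            local_addr, state = fields[1], fields[3]
            if local_addr.endswith(f":{hex_port}") and state == "0A":
                return True
    return False

def ocp_alive(pid: int, port: int) -> bool:
    # obd's loop: the process must exist and the server port must be listening
    return os.path.exists(f"/proc/{pid}") and port_listening(port)

print(ocp_alive(43895, 18181))  # stays False here: the process died mid-bootstrap
```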

That is the OCP service failing to start; please share an ocp-server log. Or upgrade obd first: obd 3.4.0 has already been released.

2025-07-17 20:46:17.313  INFO 30180 --- [sharding-metric-collect1,,] c.o.o.c.sharding.registry.ShardingTask   : Mark consumer state, newState=PAUSE
2025-07-17 20:46:17.317  INFO 30180 --- [main,,] c.o.o.b.i.task.BackupOcpTaskManager      : register backup schedules success
2025-07-17 20:46:17.423  WARN 30180 --- [main,,] ConfigServletWebServerApplicationContext : Exception encountered during context initialization - cancelling refresh attempt: org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'analyzeController': Unsatisfied dependency expressed through method 'setLogQuerier' parameter 0; nested exception is org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'logQuerier' defined in class path resource [com/oceanbase/ocp/analyze/configuration/AnalyzeConfiguration.class]: Unsatisfied dependency expressed through method 'logQuerier' parameter 0; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'restHighLevelClient' defined in class path resource [com/oceanbase/ocp/analyze/configuration/AnalyzeConfiguration.class]: Bean instantiation via factory method failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [org.opensearch.client.RestHighLevelClient]: Factory method 'restHighLevelClient' threw exception; nested exception is java.lang.IllegalArgumentException: hosts must not be null nor empty
2025-07-17 20:46:17.433  INFO 30180 --- [main,,] c.o.ocp.monitor.OcpMonitorManager        : monitor manager destroy...
2025-07-17 20:46:17.435  INFO 30180 --- [main,,] c.o.ocp.monitor.OcpMonitorManager        : monitor manager destroyed
2025-07-17 20:46:17.464  INFO 30180 --- [main,,] s.i.c.SqlAuditRawStatRollupDaemonService : SQL stat rollup daemon stopped
2025-07-17 20:46:17.484  INFO 30180 --- [main,,] c.o.o.s.i.r.s.RateLimitFacadeServiceImpl : Destroy RateLimitFacadeService executor...
2025-07-17 20:46:17.484  INFO 30180 --- [main,,] c.o.o.s.i.r.s.RateLimitFacadeServiceImpl : RateLimitFacadeService executor destroyed
2025-07-17 20:46:17.552  INFO 30180 --- [main,,] c.o.o.alarm.utils.task.AlarmTaskManager  : exited done
2025-07-17 20:46:17.564  INFO 30180 --- [main,,] c.o.ocp.core.sharding.ShardingFactory    : Shutdown sharding factory, coordinatorSize=3, consumerSize=3
2025-07-17 20:46:20.565  WARN 30180 --- [main,,] c.oceanbase.ocp.core.util.ExecutorUtils  : terminate failed, forcing shutdown...
2025-07-17 20:46:20.569  INFO 30180 --- [sharding-metric-collect1,,] c.o.o.c.sharding.registry.ShardingTask   : Sharding failed.

java.lang.InterruptedException: sleep interrupted
        at java.lang.Thread.sleep(Native Method)
        at com.oceanbase.ocp.core.sharding.registry.ShardingTask.sharding(ShardingTask.java:66)
        at com.oceanbase.ocp.core.sharding.ShardingCoordinator.doSharding(ShardingCoordinator.java:114)
        at com.oceanbase.ocp.core.sharding.ShardingCoordinator.sharding(ShardingCoordinator.java:84)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)

2025-07-17 20:46:20.574  INFO 30180 --- [main,,] j.LocalContainerEntityManagerFactoryBean : Closing JPA EntityManagerFactory for persistence unit 'default'
2025-07-17 20:46:20.578  INFO 30180 --- [main,,] com.zaxxer.hikari.HikariDataSource       : monitor-connect-pool - Shutdown initiated...
2025-07-17 20:46:20.585  INFO 30180 --- [main,,] com.zaxxer.hikari.HikariDataSource       : monitor-connect-pool - Shutdown completed.
2025-07-17 20:46:20.588  INFO 30180 --- [main,,] com.alibaba.druid.pool.DruidDataSource   : {dataSource-0} closing ...
2025-07-17 20:46:20.594  INFO 30180 --- [main,,] j.LocalContainerEntityManagerFactoryBean : Closing JPA EntityManagerFactory for persistence unit 'default'
2025-07-17 20:46:20.595  INFO 30180 --- [main,,] com.zaxxer.hikari.HikariDataSource       : metadb-connect-pool - Shutdown initiated...
2025-07-17 20:46:20.601  INFO 30180 --- [main,,] com.zaxxer.hikari.HikariDataSource       : metadb-connect-pool - Shutdown completed.
2025-07-17 20:46:20.604  INFO 30180 --- [main,,] o.apache.catalina.core.StandardService   : Stopping service [Tomcat]
2025-07-17 20:46:20.617  INFO 30180 --- [ocp-updater-0,,] c.o.o.c.d.event.DistributedEventManager  : [OCP116802] DistributedEvent Receive on 03c6eb0446 exception Failed to obtain JDBC Connection; nested exception is java.sql.SQLException: HikariDataSource HikariDataSource (metadb-connect-pool) has been closed.
2025-07-17 20:46:20.635  INFO 30180 --- [main,,] ConditionEvaluationReportLoggingListener : 

Error starting ApplicationContext. To display the conditions report re-run your application with 'debug' enabled.
2025-07-17 20:46:20.637  INFO 30180 --- [main,,] c.o.o.b.spring.BootstrapRunListener      : failed
2025-07-17 20:46:20.755  WARN 30180 --- [main,,] c.o.o.s.c.analyzer.OcpFailureAnalyzer    : OCP startup check failed with cause: 

org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'analyzeController': Unsatisfied dependency expressed through method 'setLogQuerier' parameter 0; nested exception is org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'logQuerier' defined in class path resource [com/oceanbase/ocp/analyze/configuration/AnalyzeConfiguration.class]: Unsatisfied dependency expressed through method 'logQuerier' parameter 0; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'restHighLevelClient' defined in class path resource [com/oceanbase/ocp/analyze/configuration/AnalyzeConfiguration.class]: Bean instantiation via factory method failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [org.opensearch.client.RestHighLevelClient]: Factory method 'restHighLevelClient' threw exception; nested exception is java.lang.IllegalArgumentException: hosts must not be null nor empty
        at org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor$AutowiredMethodElement.resolveMethodArguments(AutowiredAnnotationBeanPostProcessor.java:824)
        at org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor$AutowiredMethodElement.inject(AutowiredAnnotationBeanPostProcessor.java:777)
        at org.springframework.beans.factory.annotation.InjectionMetadata.inject(InjectionMetadata.java:119)
        at org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor.postProcessProperties(AutowiredAnnotationBeanPostProcessor.java:408)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.populateBean(AbstractAutowireCapableBeanFactory.java:1431)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:619)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:542)
        at org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:335)
        at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:234)
        at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:333)
        at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:208)
        at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:955)
        at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:929)
        at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:591)
        at org.springframework.boot.web.servlet.context.ServletWebServerApplicationContext.refresh(ServletWebServerApplicationContext.java:147)
        at org.springframework.boot.SpringApplication.refresh(SpringApplication.java:732)
        at org.springframework.boot.SpringApplication.refreshContext(SpringApplication.java:409)
        at org.springframework.boot.SpringApplication.run(SpringApplication.java:308)
        at org.springframework.boot.SpringApplication.run(SpringApplication.java:1300)
        at org.springframework.boot.SpringApplication.run(SpringApplication.java:1289)
        at com.oceanbase.ocp.OcpCeServerApplication.main(OcpCeServerApplication.java:21)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:49)
        at org.springframework.boot.loader.Launcher.launch(Launcher.java:108)
        at org.springframework.boot.loader.Launcher.launch(Launcher.java:58)
        at org.springframework.boot.loader.PropertiesLauncher.main(PropertiesLauncher.java:467)
Caused by: org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'logQuerier' defined in class path resource [com/oceanbase/ocp/analyze/configuration/AnalyzeConfiguration.class]: Unsatisfied dependency expressed through method 'logQuerier' parameter 0; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'restHighLevelClient' defined in class path resource [com/oceanbase/ocp/analyze/configuration/AnalyzeConfiguration.class]: Bean instantiation via factory method failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [org.opensearch.client.RestHighLevelClient]: Factory method 'restHighLevelClient' threw exception; nested exception is java.lang.IllegalArgumentException: hosts must not be null nor empty
        at org.springframework.beans.factory.support.ConstructorResolver.createArgumentArray(ConstructorResolver.java:801)
        at org.springframework.beans.factory.support.ConstructorResolver.instantiateUsingFactoryMethod(ConstructorResolver.java:536)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.instantiateUsingFactoryMethod(AbstractAutowireCapableBeanFactory.java:1352)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBeanInstance(AbstractAutowireCapableBeanFactory.java:1195)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:582)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:542)
        at org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:335)
        at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:234)
        at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:333)
        at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:208)
        at org.springframework.beans.factory.config.DependencyDescriptor.resolveCandidate(DependencyDescriptor.java:276)
        at org.springframework.beans.factory.support.DefaultListableBeanFactory.doResolveDependency(DefaultListableBeanFactory.java:1391)
        at org.springframework.beans.factory.support.DefaultListableBeanFactory.resolveDependency(DefaultListableBeanFactory.java:1311)
        at org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor$AutowiredMethodElement.resolveMethodArguments(AutowiredAnnotationBeanPostProcessor.java:816)
        ... 28 common frames omitted
Caused by: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'restHighLevelClient' defined in class path resource [com/oceanbase/ocp/analyze/configuration/AnalyzeConfiguration.class]: Bean instantiation via factory method failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [org.opensearch.client.RestHighLevelClient]: Factory method 'restHighLevelClient' threw exception; nested exception is java.lang.IllegalArgumentException: hosts must not be null nor empty
        at org.springframework.beans.factory.support.ConstructorResolver.instantiate(ConstructorResolver.java:653)
        at org.springframework.beans.factory.support.ConstructorResolver.instantiateUsingFactoryMethod(ConstructorResolver.java:481)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.instantiateUsingFactoryMethod(AbstractAutowireCapableBeanFactory.java:1352)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBeanInstance(AbstractAutowireCapableBeanFactory.java:1195)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:582)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:542)
        at org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:335)
        at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:234)
        at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:333)
        at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:208)
        at org.springframework.beans.factory.config.DependencyDescriptor.resolveCandidate(DependencyDescriptor.java:276)
        at org.springframework.beans.factory.support.DefaultListableBeanFactory.doResolveDependency(DefaultListableBeanFactory.java:1391)
        at org.springframework.beans.factory.support.DefaultListableBeanFactory.resolveDependency(DefaultListableBeanFactory.java:1311)
        at org.springframework.beans.factory.support.ConstructorResolver.resolveAutowiredArgument(ConstructorResolver.java:911)
        at org.springframework.beans.factory.support.ConstructorResolver.createArgumentArray(ConstructorResolver.java:788)
        ... 41 common frames omitted
Caused by: org.springframework.beans.BeanInstantiationException: Failed to instantiate [org.opensearch.client.RestHighLevelClient]: Factory method 'restHighLevelClient' threw exception; nested exception is java.lang.IllegalArgumentException: hosts must not be null nor empty
        at org.springframework.beans.factory.support.SimpleInstantiationStrategy.instantiate(SimpleInstantiationStrategy.java:185)
        at org.springframework.beans.factory.support.ConstructorResolver.instantiate(ConstructorResolver.java:648)
        ... 55 common frames omitted
Caused by: java.lang.IllegalArgumentException: hosts must not be null nor empty
        at org.opensearch.client.RestClient.builder(RestClient.java:220)
        at com.oceanbase.ocp.analyze.internal.basic.es.ClientBuilder.buildRestHighLevelClient(ClientBuilder.java:47)
        at com.oceanbase.ocp.analyze.configuration.AnalyzeConfiguration.restHighLevelClient(AnalyzeConfiguration.java:75)
        at com.oceanbase.ocp.analyze.configuration.AnalyzeConfiguration$$EnhancerBySpringCGLIB$$34474c86.CGLIB$restHighLevelClient$5(<generated>)
        at com.oceanbase.ocp.analyze.configuration.AnalyzeConfiguration$$EnhancerBySpringCGLIB$$34474c86$$FastClassBySpringCGLIB$$e505d928.invoke(<generated>)
        at org.springframework.cglib.proxy.MethodProxy.invokeSuper(MethodProxy.java:244)
        at org.springframework.context.annotation.ConfigurationClassEnhancer$BeanMethodInterceptor.intercept(ConfigurationClassEnhancer.java:331)
        at com.oceanbase.ocp.analyze.configuration.AnalyzeConfiguration$$EnhancerBySpringCGLIB$$34474c86.restHighLevelClient(<generated>)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.springframework.beans.factory.support.SimpleInstantiationStrategy.instantiate(SimpleInstantiationStrategy.java:154)
        ... 56 common frames omitted

2025-07-17 20:46:20.759 ERROR 30180 --- [main,,] o.s.b.d.LoggingFailureAnalysisReporter   : 

***************************
APPLICATION FAILED TO START
***************************

Description:

OCP application startup check failed.

Action:

Please check the stack trace above for the root cause.


This log is hard to analyze...

[org.opensearch.client.RestHighLevelClient]: Factory method 'restHighLevelClient' threw exception; nested exception is java.lang.IllegalArgumentException: hosts must not be null nor empty

Found the cause:
ocp.analyze.enabled
ocp.analyze.ob.trace.enabled
These two parameters were set to true, but no ES address was configured, so OCP could not start.
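
Since OCP itself cannot start, those switches cannot be flipped from the console. A common workaround is to change them directly in the OCP meta database and restart. The sketch below is hedged: it assumes OCP-CE keeps these system parameters in a config_properties table in meta_database (verify the table and column names for your version before running anything), and it reuses the connection details from the start command in the obd log:

```python
# Hedged sketch: turn the two analyze switches off in the OCP meta database,
# then retry `obd cluster restart myocp`. The config_properties table/column
# names are assumptions about OCP-CE's metadb schema -- verify them first.
import pymysql

conn = pymysql.connect(
    host="10.xx.xx.xxx",   # obproxy address from the JDBC_URL in the log
    port=2883,
    user="root@ocp_meta",  # same account the start script exports
    password="******",     # masked above; use the real JDBC_PASSWORD
    database="meta_database",
)
try:
    with conn.cursor() as cur:
        for key in ("ocp.analyze.enabled", "ocp.analyze.ob.trace.enabled"):
            cur.execute(
                "UPDATE config_properties SET `value` = 'false' WHERE `key` = %s",
                (key,),
            )
    conn.commit()
finally:
    conn.close()
```

The alternative is to keep both switches on and configure the ES/OpenSearch address those features require; either way the "hosts must not be null nor empty" exception should stop blocking startup.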