【 Environment 】Production
【 OB or other component 】
【 Version 】 4.0
【 Problem description 】
@论坛小助手
OceanBase startup keeps hanging at the following step:
Get local repositories ok
Search plugins ok
Load cluster param plugin ok
Open ssh connection ok
Check before start observer ok
Check before start obproxy ok
[WARN] OBD-4521: The config observer_sys_password in obproxy-ce did not take effect, please config it in oceanbase-ce
Check before start obagent ok
Check before start ocp-express ok
Start observer ok
observer program health check ok
obshell program health check ok
Connect to observer 10.106.12.18:2881 ok
Start obproxy ok
obproxy program health check ok
Connect to obproxy ok
Initialize obproxy-ce ok
Start obagent ok
obagent program health check ok
Connect to Obagent ok
Start ocp-express |
Start ocp-express /
Start ocp-express /
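It hangs at this point. According to the ocp log, this usually means the server node has not finished initializing. After waiting about half an hour it recovers briefly,
but it crashes again after the recovery.
How should I handle this? I tried the methods suggested in the community, but none of them had any effect on this error.
The problem also causes one of the nodes to restart repeatedly, along with some concurrency issues.
Below are the logs that appeared afterwards: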
Start ocp-express x
[ERROR] XXXX. failed to connect meta db
[ERROR] ocp-express start failed
observer need bootstarp x
+-------------------------------------------------+
|                     obproxy                     |
+---------------+------+-----------------+--------+
| ip            | port | prometheus_port | status |
+---------------+------+-----------------+--------+
| XXXX.         | 2883 | 2884            | active |
| XXXX.         | 2883 | 2884            | active |
+---------------+------+-----------------+--------+
obclient -hXXXX.18 -P2883 -uroot -p'ljgsmrlab' -Doceanbase -A
+--------------------------------------------------------------------+
|                              obagent                               |
+---------------+--------------------+--------------------+----------+
| ip            | mgragent_http_port | monagent_http_port | status   |
+---------------+--------------------+--------------------+----------+
| XXXX.         | 8089               | 8088               | active   |
| XXXX.         | 8089               | 8088               | inactive |
| XXXX.         | 8089               | 8088               | active   |
+---------------+--------------------+--------------------+----------+
See https://www.oceanbase.com/product/ob-deployer/error-codes .
Trace ID: f48313d2-6cf2-11f0-8fec-fa163e596018
If you want to view detailed obd logs, please run: obd display-trace f48313d2-6cf2-11f0-8fec-fa163e596018
[2025-07-30 11:42:59.754] [ERROR] XXXX.18: failed to connect meta db
[2025-07-30 11:42:59.852] [INFO] [ERROR] XXXX.18: failed to connect meta db
[2025-07-30 11:42:59.852] [INFO]
[2025-07-30 11:42:59.852] [DEBUG] - sub start ref count to 0
[2025-07-30 11:42:59.852] [DEBUG] - export start
[2025-07-30 11:42:59.852] [ERROR] ocp-express start failed
[2025-07-30 11:42:59.853] [DEBUG] - Call oceanbase-ce-py_script_display-3.1.0 for oceanbase-ce-4.2.2.0-100000192024011915.el7-aa3053da7370a6685a2ef457cd202d50e5ab75d3
[2025-07-30 11:42:59.853] [DEBUG] - import display
[2025-07-30 11:42:59.854] [DEBUG] - add display ref count to 1
[2025-07-30 11:42:59.854] [INFO] Wait for observer init
[2025-07-30 11:42:59.855] [DEBUG] -- execute sql: select * from oceanbase.__all_server. args: None
[2025-07-30 11:42:59.855] [DEBUG] -- OBD-5000: select * from oceanbase.__all_server execute failed
[2025-07-30 11:42:59.857] [ERROR] Traceback (most recent call last):
[2025-07-30 11:42:59.857] [ERROR] File "core.py", line 2018, in start_cluster
[2025-07-30 11:42:59.857] [ERROR] File "core.py", line 2142, in _start_cluster
[2025-07-30 11:42:59.857] [ERROR] File "core.py", line 186, in call_plugin
[2025-07-30 11:42:59.857] [ERROR] File "_plugin.py", line 346, in call
[2025-07-30 11:42:59.857] [ERROR] File "_plugin.py", line 304, in _new_func
[2025-07-30 11:42:59.857] [ERROR] File "/root/.obd/plugins/oceanbase-ce/3.1.0/display.py", line 37, in display
[2025-07-30 11:42:59.857] [ERROR] servers = cursor.fetchall('select * from oceanbase.__all_server', raise_exception=True, exc_level='verbose')
[2025-07-30 11:42:59.858] [ERROR] File "_stdio.py", line 886, in func_wrapper
[2025-07-30 11:42:59.858] [ERROR] File "/root/.obd/plugins/oceanbase-ce/4.2.2.0/connect.py", line 511, in fetchall
[2025-07-30 11:42:59.858] [ERROR] return self.execute(sql, args=args, execute_func='fetchall', raise_exception=raise_exception, exc_level=exc_level, stdio=stdio)
[2025-07-30 11:42:59.858] [ERROR] File "_stdio.py", line 886, in func_wrapper
[2025-07-30 11:42:59.858] [ERROR] File "/root/.obd/plugins/oceanbase-ce/4.2.2.0/connect.py", line 490, in execute
[2025-07-30 11:42:59.858] [ERROR] self.cursor.execute(sql, args)
[2025-07-30 11:42:59.858] [ERROR] File "pymysql/cursors.py", line 148, in execute
[2025-07-30 11:42:59.858] [ERROR] File "pymysql/cursors.py", line 310, in _query
[2025-07-30 11:42:59.858] [ERROR] File "pymysql/connections.py", line 548, in query
[2025-07-30 11:42:59.858] [ERROR] File "pymysql/connections.py", line 775, in _read_query_result
[2025-07-30 11:42:59.858] [ERROR] File "pymysql/connections.py", line 1156, in read
[2025-07-30 11:42:59.858] [ERROR] File "pymysql/connections.py", line 692, in _read_packet
[2025-07-30 11:42:59.858] [ERROR] File "pymysql/connections.py", line 748, in _read_bytes
[2025-07-30 11:42:59.858] [ERROR] pymysql.err.OperationalError: (2013, 'Lost connection to MySQL server during query')
[2025-07-30 11:42:59.858] [ERROR]
[2025-07-30 11:42:59.985] [DEBUG] - sub display ref count to 0
[2025-07-30 11:42:59.985] [DEBUG] - export display
[2025-07-30 11:42:59.985] [DEBUG] - Call obproxy-ce-py_script_display-3.1.0 for obproxy-ce-4.2.1.0-11.el7-0aed4b782120e4248b749f67be3d2cc82cdcb70d
[2025-07-30 11:42:59.985] [DEBUG] - import display
[2025-07-30 11:42:59.987] [DEBUG] - add display ref count to 1
[2025-07-30 11:42:59.987] [DEBUG] -- execute sql: show proxyconfig like "%port". args: None
[2025-07-30 11:42:59.999] [DEBUG] -- execute sql: show proxyconfig like "%port". args: None
[2025-07-30 11:43:00.000] [DEBUG] -- OBD-5000: show proxyconfig like "%port" execute failed
[2025-07-30 11:43:00.000] [DEBUG] -- execute sql: show proxyconfig like "%port". args: None
[2025-07-30 11:43:00.011] [INFO] +-------------------------------------------------+
[2025-07-30 11:43:00.011] [INFO] |                     obproxy                     |
[2025-07-30 11:43:00.011] [INFO] +---------------+------+-----------------+--------+
[2025-07-30 11:43:00.011] [INFO] | ip            | port | prometheus_port | status |
[2025-07-30 11:43:00.011] [INFO] +---------------+------+-----------------+--------+
[2025-07-30 11:43:00.011] [INFO] | XXXX.18       | 2883 | 2884            | active |
[2025-07-30 11:43:00.011] [INFO] | XXXX.249      | 2883 | 2884            | active |
[2025-07-30 11:43:00.012] [INFO] +---------------+------+-----------------+--------+
[2025-07-30 11:43:00.013] [INFO] obclient -hXXXX.18 -P2883 -uroot -p'ljgsmrlab' -Doceanbase -A
[2025-07-30 11:43:00.013] [INFO]
[2025-07-30 11:43:00.013] [DEBUG] - sub display ref count to 0
[2025-07-30 11:43:00.013] [DEBUG] - export display
[2025-07-30 11:43:00.013] [DEBUG] - Call obagent-py_script_display-1.3.0 for obagent-4.2.2-100000042024011120.el7-19739a07a12eab736aff86ecf357b1ae660b554e
[2025-07-30 11:43:00.013] [DEBUG] - import display
[2025-07-30 11:43:00.014] [DEBUG] - add display ref count to 1
[2025-07-30 11:43:00.014] [DEBUG] -- send http request method: GET, url: http://XXXX.18:8089/api/v1/agent/status, data: None
[2025-07-30 11:43:00.113] [DEBUG] -- send http request method: GET, url: http://XXXX.71:8089/api/v1/agent/status, data: None
[2025-07-30 11:43:00.117] [ERROR] Traceback (most recent call last):
[2025-07-30 11:43:00.117] [ERROR] File "urllib3/connection.py", line 174, in _new_conn
[2025-07-30 11:43:00.117] [ERROR] File "urllib3/util/connection.py", line 95, in create_connection
[2025-07-30 11:43:00.117] [ERROR] File "urllib3/util/connection.py", line 85, in create_connection
[2025-07-30 11:43:00.117] [ERROR] ConnectionRefusedError: [Errno 111] Connection refused
[2025-07-30 11:43:00.117] [ERROR]
[2025-07-30 11:43:00.117] [ERROR] During handling of the above exception, another exception occurred:
[2025-07-30 11:43:00.117] [ERROR]
[2025-07-30 11:43:00.117] [ERROR] Traceback (most recent call last):
[2025-07-30 11:43:00.117] [ERROR] File "urllib3/connectionpool.py", line 715, in urlopen
[2025-07-30 11:43:00.117] [ERROR] File "urllib3/connectionpool.py", line 416, in _make_request
[2025-07-30 11:43:00.117] [ERROR] File "urllib3/connection.py", line 244, in request
[2025-07-30 11:43:00.117] [ERROR] File "http/client.py", line 1256, in request
[2025-07-30 11:43:00.117] [ERROR] File "http/client.py", line 1302, in _send_request
[2025-07-30 11:43:00.117] [ERROR] File "http/client.py", line 1251, in endheaders
[2025-07-30 11:43:00.117] [ERROR] File "http/client.py", line 1011, in _send_output
[2025-07-30 11:43:00.118] [ERROR] File "http/client.py", line 951, in send
[2025-07-30 11:43:00.118] [ERROR] File "urllib3/connection.py", line 205, in connect
[2025-07-30 11:43:00.118] [ERROR] File "urllib3/connection.py", line 186, in _new_conn
[2025-07-30 11:43:00.118] [ERROR] urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7fc446742310>: Failed to establish a new connection: [Errno 111] Connection refused
[2025-07-30 11:43:00.118] [ERROR]
[2025-07-30 11:43:00.118] [ERROR] During handling of the above exception, another exception occurred:
[2025-07-30 11:43:00.118] [ERROR]
[2025-07-30 11:43:00.118] [ERROR] Traceback (most recent call last):
[2025-07-30 11:43:00.118] [ERROR] File "requests/adapters.py", line 439, in send
[2025-07-30 11:43:00.118] [ERROR] File "urllib3/connectionpool.py", line 799, in urlopen
[2025-07-30 11:43:00.118] [ERROR] File "urllib3/util/retry.py", line 592, in increment
[2025-07-30 11:43:00.118] [ERROR] urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='XXXX.71', port=8089): Max retries exceeded with url: /api/v1/agent/status (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fc446742310>: Failed to establish a new connection: [Errno 111] Connection refused'))
[2025-07-30 11:43:00.118] [ERROR]
[2025-07-30 11:43:00.118] [ERROR] During handling of the above exception, another exception occurred:
[2025-07-30 11:43:00.118] [ERROR]
[2025-07-30 11:43:00.118] [ERROR] Traceback (most recent call last):
[2025-07-30 11:43:00.118] [ERROR] File "core.py", line 2018, in start_cluster
[2025-07-30 11:43:00.118] [ERROR] File "core.py", line 2142, in _start_cluster
[2025-07-30 11:43:00.118] [ERROR] File "core.py", line 186, in call_plugin
[2025-07-30 11:43:00.118] [ERROR] File "_plugin.py", line 346, in call
[2025-07-30 11:43:00.118] [ERROR] File "_plugin.py", line 304, in _new_func
[2025-07-30 11:43:00.119] [ERROR] File "/root/.obd/plugins/obagent/1.3.0/display.py", line 39, in display
[2025-07-30 11:43:00.119] [ERROR] 'status': 'active' if api_cursor and api_cursor.connect(stdio) else 'inactive',
[2025-07-30 11:43:00.119] [ERROR] File "/root/.obd/plugins/obagent/1.3.0/connect.py", line 47, in connect
[2025-07-30 11:43:00.119] [ERROR] return self._request('GET', 'api/v1/agent/status', stdio=stdio)
[2025-07-30 11:43:00.119] [ERROR] File "/root/.obd/plugins/obagent/1.3.0/connect.py", line 58, in _request
[2025-07-30 11:43:00.119] [ERROR] resp = requests.request(method, url, auth=self.auth, data=data, verify=False)
[2025-07-30 11:43:00.119] [ERROR] File "requests/api.py", line 61, in request
[2025-07-30 11:43:00.119] [ERROR] File "requests/sessions.py", line 542, in request
[2025-07-30 11:43:00.119] [ERROR] File "requests/sessions.py", line 655, in send
[2025-07-30 11:43:00.119] [ERROR] File "requests/adapters.py", line 516, in send
[2025-07-30 11:43:00.119] [ERROR] requests.exceptions.ConnectionError: HTTPConnectionPool(host='XXXX.71', port=8089): Max retries exceeded with url: /api/v1/agent/status (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fc446742310>: Failed to establish a new connection: [Errno 111] Connection refused'))
[2025-07-30 11:43:00.119] [ERROR]
[2025-07-30 11:43:00.119] [DEBUG] -- request obagent failed: HTTPConnectionPool(host='XXXX.71', port=8089): Max retries exceeded with url: /api/v1/agent/status (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fc446742310>: Failed to establish a new connection: [Errno 111] Connection refused'))
[2025-07-30 11:43:00.119] [DEBUG] -- send http request method: GET, url: http://XXXX.249:8089/api/v1/agent/status, data: None
[2025-07-30 11:43:00.222] [INFO] +--------------------------------------------------------------------+
[2025-07-30 11:43:00.222] [INFO] |                              obagent                               |
[2025-07-30 11:43:00.222] [INFO] +---------------+--------------------+--------------------+----------+
[2025-07-30 11:43:00.223] [INFO] | ip            | mgragent_http_port | monagent_http_port | status   |
[2025-07-30 11:43:00.223] [INFO] +---------------+--------------------+--------------------+----------+
[2025-07-30 11:43:00.223] [INFO] | XXXX.18       | 8089               | 8088               | active   |
[2025-07-30 11:43:00.223] [INFO] | XXXX.71       | 8089               | 8088               | inactive |
[2025-07-30 11:43:00.223] [INFO] | XXXX.249      | 8089               | 8088               | active   |
[2025-07-30 11:43:00.223] [INFO] +---------------+--------------------+--------------------+----------+
[2025-07-30 11:43:00.223] [DEBUG] - sub display ref count to 0
[2025-07-30 11:43:00.223] [DEBUG] - export display
[2025-07-30 11:43:00.233] [INFO] See https://www.oceanbase.com/product/ob-deployer/error-codes .
[2025-07-30 11:43:00.234] [INFO] Trace ID: f48313d2-6cf2-11f0-8fec-fa163e596018
[2025-07-30 11:43:00.234] [INFO] If you want to view detailed obd logs, please run: obd display-trace f48313d2-6cf2-11f0-8fec-fa163e596018
[2025-07-30 11:43:00.235] [DEBUG] - unlock /root/.obd/lock/global
[2025-07-30 11:43:00.235] [DEBUG] - unlock /root/.obd/lock/deploy_lj5grim
[2025-07-30 11:43:00.235] [DEBUG] - unlock /root/.obd/lock/mirror_and_repo
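The trace above shows Connection refused on port 8089 of XXXX.71 and a lost connection while querying the observer, so it may help to check which service ports are actually accepting connections on each node. The following is only a minimal sketch, assuming Python 3 on a machine that can reach the nodes. `port_is_open` is a hypothetical helper name, and the host list is a placeholder because the real IPs are redacted in this post. The port numbers are the ones shown in the output above.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError (Errno 111), timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Placeholder: replace with the real node IPs (redacted as XXXX in the post).
    nodes = ["127.0.0.1"]
    # Ports taken from the obd output: observer 2881, obproxy 2883,
    # obagent mgragent 8089 and monagent 8088.
    ports = {"observer": 2881, "obproxy": 2883, "mgragent": 8089, "monagent": 8088}
    for host in nodes:
        for name, port in ports.items():
            state = "open" if port_is_open(host, port) else "closed or unreachable"
            print(f"{host}:{port} ({name}): {state}")
```

A closed port only says that nothing is listening there at that moment. It does not explain why the process is down, so the observer and obagent logs on that node still need to be checked.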
I also see clog-related errors:
[2025-07-30 15:02:15.890997] ERROR try_recycle_blocks (palf_env_impl.cpp:784) [6691][T1001_PalfGC][T1001][Y0-0000000000000000-0-0] [lt=30][errcode=-4264] Log out of disk space(msg="log disk space is almost full", ret=-4264, total_size(MB)=614, used_size(MB)=583, used_percent(%)=95, warn_size(MB)=491, warn_percent(%)=80, limit_size(MB)=583, limit_percent(%)=95, total_unrecyclable_size_byte(MB)=519, maximum_used_size(MB)=583, maximum_log_stream=1, oldest_log_stream=1, oldest_scn={val:1750291890447470001, v:0}, in_shrinking=false)
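To make the numbers in this log line easier to read, here is a minimal sketch that pulls the integer `key=value` fields out of the message and compares used space with the limit. It assumes Python 3. `parse_palf_fields` and `summarize` are hypothetical helper names, and "used minus unrecyclable" is only my own rough reading of how much space could be reclaimed, not something the log states.

```python
import re

# The message part of the log line above, with plain ASCII quotes.
SAMPLE = (
    'Log out of disk space(msg="log disk space is almost full", ret=-4264, '
    "total_size(MB)=614, used_size(MB)=583, used_percent(%)=95, "
    "warn_size(MB)=491, warn_percent(%)=80, limit_size(MB)=583, "
    "limit_percent(%)=95, total_unrecyclable_size_byte(MB)=519, "
    "maximum_used_size(MB)=583, maximum_log_stream=1, oldest_log_stream=1, "
    "oldest_scn={val:1750291890447470001, v:0}, in_shrinking=false)"
)

# A name, an optional short unit suffix such as (MB) or (%), then an integer.
FIELD_RE = re.compile(r"([A-Za-z_]+(?:\([^()=, ]*\))?)=(-?\d+)\b")


def parse_palf_fields(line: str) -> dict:
    """Extract integer key=value pairs such as 'used_size(MB)=583'."""
    return {key: int(value) for key, value in FIELD_RE.findall(line)}


def summarize(fields: dict) -> dict:
    """Compare used log disk space with the total and the limit."""
    total = fields["total_size(MB)"]
    used = fields["used_size(MB)"]
    return {
        "used_ratio": used / total,
        "at_limit": used >= fields["limit_size(MB)"],
        # Assumption: a rough estimate of space that could still be recycled.
        "used_minus_unrecyclable_mb": used - fields["total_unrecyclable_size_byte(MB)"],
    }


if __name__ == "__main__":
    for name, value in summarize(parse_palf_fields(SAMPLE)).items():
        print(f"{name}: {value}")
```

For the line above, used_size (583 MB) already equals limit_size (583 MB) out of a total of 614 MB, and 519 MB of that is reported as unrecyclable.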