Cluster fails to start - production environment

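【 Environment 】Production
【 OB or other component 】observer
【 Version 】4.2.19
【Problem description】While in use, the three-node cluster reported error 4013 and the business became unavailable. We tried restarting it, but it will not start.
【Reproduction steps】
【Attachments and logs】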

Please provide the observer log covering the restart period.
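One way to pull out only the part of observer.log that covers a restart is to print everything from the first record at or after the restart time. This is a minimal sketch, assuming each record begins with a `[YYYY-MM-DD HH:MM:SS.micro]` timestamp as in the excerpts posted in this thread; `log_since` is a hypothetical helper name, not an OceanBase tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print log lines starting at the first record whose
# timestamp is at or after SINCE ("YYYY-MM-DD HH:MM:SS").
# Assumes records start with "[YYYY-MM-DD HH:MM:SS.micro]", so a plain
# string comparison of the first 19 timestamp characters is chronological.
log_since() {
  local log_file="$1" since="$2"
  awk -v since="$since" '
    # once the window has started, print every following line unchanged
    started { print; next }
    # first timestamped record at or after SINCE opens the window
    substr($0, 1, 1) == "[" && substr($0, 2, 19) >= since { started = 1; print }
  ' "$log_file"
}
```

For example, `log_since /home/admin/oceanbase/log/observer.log "2025-11-07 15:20:00" > restart_window.log` writes that window to a file for attaching. Printing continues after the first match instead of filtering line by line, so any line without a leading timestamp inside the window is kept.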

No log was produced after restarting observer,
and the process does not exist either.

Did you start it through OCP or from the command line?
It is odd that no log was produced. Is there really no new output in /home/admin/oceanbase/log/observer.log?
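To answer the "no new output?" question quickly, you can check whether observer.log has been modified within the last few minutes before reading it. A minimal sketch using `find -mmin`; `log_recently_written` is a hypothetical helper, and the 10-minute window in the usage example is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether LOG_FILE was modified within the last
# MINUTES minutes. Prints "recent", "stale" or "missing" plus the path.
log_recently_written() {
  local log_file="$1" minutes="$2"
  if [ ! -f "$log_file" ]; then
    echo "missing: $log_file"
    return 2
  fi
  # find prints the path only if its mtime is less than MINUTES minutes old
  if [ -n "$(find "$log_file" -mmin "-$minutes" -print)" ]; then
    echo "recent: $log_file"
  else
    echo "stale: $log_file"
    return 1
  fi
}
```

For example: `log_recently_written /home/admin/oceanbase/log/observer.log 10 && tail -n 100 /home/admin/oceanbase/log/observer.log`. A "stale" result right after a restart attempt means nothing reached this file, possibly because the process exited early or was started from a different working directory.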

If convenient, please open an official bounty post so this can be resolved quickly.

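I am now starting it with the ./bin/observer command. The observer process stays running, but the server never finishes starting up, and the following log is produced: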

[2025-11-07 15:26:23.486152] WDIAG [SERVER] query (ob_inner_sql_connection.cpp:866) [953999][T1002_TenantInf][T1001][YB420A5A329B-000642FBF4ECE5E3-0-0] [lt=5][errcode=-5627] failed to process record(executor={ObIExecutor:, sql:"SELECT s.*, e.value as epoch FROM __all_service AS s RIGHT JOIN __all_service_epoch AS e ON s.tenant_id = e.tenant_id WHERE e.tenant_id = 1002 and e.name='service_name_epoch' ORDER BY s.gmt_create"}, record_ret=-5627, ret=-5627)
[2025-11-07 15:26:23.486158] WDIAG [SERVER] query (ob_inner_sql_connection.cpp:892) [953999][T1002_TenantInf][T1002][YB420A5A329B-000642FBF4ECE5E3-0-0] [lt=5][errcode=-5627] failed to process final(executor={ObIExecutor:, sql:"SELECT s.*, e.value as epoch FROM __all_service AS s RIGHT JOIN __all_service_epoch AS e ON s.tenant_id = e.tenant_id WHERE e.tenant_id = 1002 and e.name='service_name_epoch' ORDER BY s.gmt_create"}, aret=-5627, ret=-5627)
[2025-11-07 15:26:23.486164] WDIAG [SERVER] execute_read_inner (ob_inner_sql_connection.cpp:1647) [953999][T1002_TenantInf][T1002][Y0-0000000000000000-0-0] [lt=5][errcode=-5627] execute sql failed(ret=-5627, tenant_id=1001, sql=SELECT s.*, e.value as epoch FROM __all_service AS s RIGHT JOIN __all_service_epoch AS e ON s.tenant_id = e.tenant_id WHERE e.tenant_id = 1002 and e.name='service_name_epoch' ORDER BY s.gmt_create)
[2025-11-07 15:26:23.486168] WDIAG [SERVER] retry_while_no_tenant_resource (ob_inner_sql_connection.cpp:949) [953999][T1002_TenantInf][T1002][Y0-0000000000000000-0-0] [lt=4][errcode=-5627] retry_while_no_tenant_resource failed(ret=-5627, tenant_id=1001)
[2025-11-07 15:26:23.486173] WDIAG [SERVER] execute_read (ob_inner_sql_connection.cpp:1587) [953999][T1002_TenantInf][T1002][Y0-0000000000000000-0-0] [lt=5][errcode=-5627] execute_read failed(ret=-5627, cluster_id=1730112604, tenant_id=1001)
[2025-11-07 15:26:23.488336] INFO [SERVER] sleep_before_local_retry (ob_query_retry_ctrl.cpp:91) [953641][T1_FreInfoReloa][T1][YB420A5A329B-000642FBF90CE1C7-0-0] [lt=1] will sleep(sleep_us=48000, remain_us=724143, base_sleep_us=1000, retry_sleep_type=1, v.stmt_retry_times_=48, v.err_=-4722, timeout_timestamp=1762500384212478)
[2025-11-07 15:26:23.540329] INFO [SERVER] sleep_before_local_retry (ob_query_retry_ctrl.cpp:91) [953641][T1_FreInfoReloa][T1][YB420A5A329B-000642FBF90CE1C7-0-0] [lt=0] will sleep(sleep_us=49000, remain_us=672152, base_sleep_us=1000, retry_sleep_type=1, v.stmt_retry_times_=49, v.err_=-4722, timeout_timestamp=1762500384212478)
[2025-11-07 15:26:23.542642] WDIAG [SERVER] get_mem_limit (ob_mysql_request_manager.cpp:281) [954115][T1002_ReqMemEvi][T1002][Y0-0000000000000000-0-0] [lt=5][errcode=-4029] failed to get global sys variable(ret=-4029, tenant_id=1002, OB_SV_SQL_AUDIT_PERCENTAGE="ob_sql_audit_percentage", obj_val={"NULL":"NULL"})
[2025-11-07 15:26:23.542652] WDIAG [SERVER] check_config_mem_limit (ob_eliminate_task.cpp:74) [954115][T1002_ReqMemEvi][T1002][Y0-0000000000000000-0-0] [lt=10][errcode=-4029] failed to get mem limit(ret=-4029, tenant_id=1002, mem_limit=107374182, config_mem_limit_=16777216)
[2025-11-07 15:26:23.542660] INFO [SERVER] runTimerTask (ob_eliminate_task.cpp:222) [954115][T1002_ReqMemEvi][T1002][Y0-0000000000000000-0-0] [lt=6] sql audit evict task end(request_manager_->get_tenant_id()=1002, evict_high_mem_level=8388608, evict_high_size_level=90000, evict_batch_count=0, elapse_time=0, size_used=348, mem_used=2079744)
[2025-11-07 15:26:23.562353] INFO [SERVER] runTimerTask (ob_eliminate_task.cpp:222) [953682][T1_ReqMemEvict][T1][Y0-0000000000000000-0-0] [lt=24] sql audit evict task end(request_manager_->get_tenant_id()=1, evict_high_mem_level=32212254, evict_high_size_level=90000, evict_batch_count=0, elapse_time=0, size_used=869, mem_used=2079744)
[2025-11-07 15:26:23.576461] WDIAG [SERVER] extract_tenant_id (obmp_connect.cpp:123) [953426][sql_nio1][T0][Y0-0000000000000000-0-0] [lt=7][errcode=-5160] get_tenant_id failed(ret=-5160, tenant_name=shsnc)
[2025-11-07 15:26:23.576469] WDIAG [SERVER] deliver_mysql_request (ob_srv_deliver.cpp:690) [953426][sql_nio1][T0][Y0-0000000000000000-0-0] [lt=6][errcode=-5160] extract tenant_id fail(ret=-5160, tenant_name=shsnc, tenant_id=18446744073709551615)
[2025-11-07 15:26:23.576550] WDIAG [SERVER] process (obmp_connect.cpp:328) [953299][MysqlQueueTh0][T0][Y0-000642FBF1ED0359-0-0] [lt=4][errcode=-8001] server is initializing(ret=-8001)
[2025-11-07 15:26:23.576572] WDIAG [SERVER] extract_tenant_id (obmp_connect.cpp:123) [953428][sql_nio3][T0][Y0-0000000000000000-0-0] [lt=7][errcode=-5160] get_tenant_id failed(ret=-5160, tenant_name=shsnc)
[2025-11-07 15:26:23.576596] WDIAG [SERVER] deliver_mysql_request (ob_srv_deliver.cpp:690) [953428][sql_nio3][T0][Y0-0000000000000000-0-0] [lt=6][errcode=-5160] extract tenant_id fail(ret=-5160, tenant_name=shsnc, tenant_id=18446744073709551615)
[2025-11-07 15:26:23.576638] INFO [SERVER] send_error_packet (obmp_packet_sender.cpp:319) [953299][MysqlQueueTh0][T0][Y0-000642FBF1ED0359-0-0] [lt=19] sending error packet(ob_error=-8001, client error=-8001, extra_err_info=NULL, lbt()="0x10b6712c 0x8798838 0x8749ed8 0x87794c4 0xa96c878 0x1174dbc0 0x1174ea40 0x1174ef94 0x10e115a4 0x10e0d9e0 0xfffeb269878c 0xfffeb25e508c")
[2025-11-07 15:26:23.576635] WDIAG [SERVER] process (obmp_connect.cpp:328) [953300][MysqlQueueTh1][T0][Y0-000642FBF1CD039F-0-0] [lt=4][errcode=-8001] server is initializing(ret=-8001)
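With a log this noisy, a per-errcode count helps show which failure dominates. In the excerpt above the `[errcode=...]` tokens are -5627, -4029, -5160 and -8001; -4722 shows up only in the retry lines, so it is not counted. A minimal sketch that relies only on the `[errcode=-NNNN]` token visible in these lines; `errcode_histogram` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count occurrences of each "errcode=-NNNN" token in a
# log file and print "code count" pairs, most frequent first.
errcode_histogram() {
  local log_file="$1"
  grep -oE 'errcode=-[0-9]+' "$log_file" \
    | cut -d= -f2 \
    | sort \
    | uniq -c \
    | sort -rn \
    | awk '{print $2, $1}'
}
```

For example, `errcode_histogram /home/admin/oceanbase/log/observer.log | head` prints one `code count` pair per line. The codes still need to be looked up in the OceanBase error-code reference for the version in use.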

There is too little log here to tell why initialization failed. Please start it with ./bin/observer and then provide the complete observer.log.
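When attaching the complete log, it is easier to send observer.log together with its rotated copies as one archive. A minimal sketch, assuming rotated files share the `observer.log` prefix under `<ob_home>/log`; `pack_observer_logs` and the archive path in the usage example are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: bundle OB_HOME/log/observer.log* into one .tar.gz.
# Paths inside the archive stay relative to OB_HOME (e.g. log/observer.log).
pack_observer_logs() {
  local ob_home="$1" out="$2"
  # make OUT absolute, because the subshell below changes directory
  case "$out" in /*) ;; *) out="$PWD/$out" ;; esac
  (
    cd "$ob_home" || exit 1
    # observer.log plus rotated copies; the glob stays literal if nothing matches
    set -- log/observer.log*
    if [ ! -e "$1" ]; then
      echo "no observer.log* under $ob_home/log" >&2
      exit 1
    fi
    tar -czf "$out" "$@"
  )
}
```

For example: `pack_observer_logs /home/admin/oceanbase /tmp/observer_logs.tar.gz`, then attach the resulting file.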

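You didn't start the process as the root user, did you?
Suggestions:

  1. su - admin # You must switch to the admin user before starting
  2. cd /home/admin/oceanbase/ # You must restart from inside this directory, otherwise all sorts of problems occur
  3. /home/admin/oceanbase/bin/observer
  4. Upload the complete observer.log as an attachment.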
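The start-up steps above can be wrapped so that the start is refused when it is attempted as the wrong user (for example root, as asked above) or when the binary is missing. A minimal sketch; `preflight_observer_start` and `start_observer` are hypothetical helpers, and the defaults `admin` and `/home/admin/oceanbase` are simply the values used in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: right user (step 1) and an executable
# OB_HOME/bin/observer (step 3). Prints "ok" or the first problem found.
preflight_observer_start() {
  local expected_user="${1:-admin}" ob_home="${2:-/home/admin/oceanbase}"
  local current_user
  current_user="$(id -un)"
  if [ "$current_user" != "$expected_user" ]; then
    echo "wrong user: $current_user (expected $expected_user)"
    return 1
  fi
  if [ ! -x "$ob_home/bin/observer" ]; then
    echo "not executable: $ob_home/bin/observer"
    return 1
  fi
  echo "ok"
}

# Hypothetical wrapper: does not switch users itself (run it after su - admin);
# it only starts observer when the checks pass.
start_observer() {
  local ob_home="${2:-/home/admin/oceanbase}"
  preflight_observer_start "$@" || return 1
  # step 2: run from inside the install directory
  cd "$ob_home" || return 1
  # step 3: start the binary by its absolute path
  "$ob_home/bin/observer"
}
```

Running `start_observer` with no arguments after `su - admin` applies the thread's defaults. Refusing instead of switching users keeps the wrapper from silently starting the process under an unexpected account.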

Check the process with ps -ef|grep observer.
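A variant of `ps -ef|grep observer` can also show the owning user and start time (useful for the root-versus-admin question above) without listing the grep command itself. A minimal sketch using standard `ps -eo` columns; `show_observer_procs` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list processes whose command line contains NAME,
# with PID, owner, start time and full arguments.
# Wrapping the first character in [] means grep's own command line
# ("[o]bserver") does not match the pattern, unlike plain "grep observer".
# NAME is used as a regex, so characters like "." are not literal.
show_observer_procs() {
  local name="${1:-observer}"
  local pattern="[${name:0:1}]${name:1}"
  ps -eo pid,user,lstart,args | grep -- "$pattern"
}
```

Calling `show_observer_procs` with no argument looks for `observer`. No output (and a non-zero exit status) means no matching process is running.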
