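[Environment] Pre-production
[Version] oceanbase sql 4.4.1
[Problem description] OceanBase was running on 3 virtual machines. After a power loss, restarting the observer fails on all three nodes.
[Logs] See below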
[2026-01-09 14:49:14.927905] INFO [SERVER.OMT] create_tenant (ob_multi_tenant.cpp:1294) [22530][observer][T0][Y0-0000000000000000-0-0] [lt=38] finish create new tenant(ret=-4070, tenant_id=1001, write_slog=false, create_step=5, bucket_lock_idx=3470)
[2026-01-09 14:49:14.927954] EDIAG [STORAGE] handle_tenant_create_commit (ob_server_checkpoint_slog_handler.cpp:848) [22530][observer][T0][Y0-0000000000000000-0-0] [lt=14][errcode=-4070] fail to replay create tenant(ret=-4070, tenant_meta={unit:{tenant_id:1001, unit_id:1002, has_memstore:true, unit_status:“NORMAL”, config:{unit_config_id:1001, name:“u1”, resource:{min_cpu:1, max_cpu:1, memory_size:“1GB”, log_disk_size:“1GB”, data_disk_size:0, min_iops:10000, max_iops:10000, iops_weight:2, max_net_bandwidth:INT64_MAX, net_bandwidth_weight:8, }}, mode:0, create_timestamp:1762431192659427, is_removed:false, hidden_sys_data_disk_config_size:0, actual_data_disk_size:0, replica_type:0}, super_block:{tenant_id:1001, replay_start_point:ObLogCursor{file_id=6, log_id=505720, offset=6049305}, ls_meta_entry:{[ver=1,mode=0,seq=789251][2nd=21128]}, wait_gc_tablet_entry:{[ver=1,mode=0,seq=0][2nd=18446744073709551615]}, tablet_meta_entry:{[ver=1,mode=0,seq=0][2nd=18446744073709551615]}, is_hidden:false, version:4, snapshot_cnt:0, preallocated_seqs:{object_seq:60000, tmp_file_seq:60000, write_seq:60000}, auto_inc_ls_epoch:0, ls_cnt:0, min_file_id:0, max_file_id:0}, create_status:1, epoch:0}) BACKTRACE:0x91dffd8 0x90e38cd 0x9194ab0 0x9194587 0x91944dd 0x9194308 0x1ac6c087 0x1ac5ec4e 0x1a7522e7 0x12027f6c 0xe99e2bf 0xe9a3d02 0x23e1aca0 0xe9a0527 0x7fca526d4b17 0x96682aa
[2026-01-09 14:49:14.928109] EDIAG [STORAGE] apply_replay_result (ob_server_checkpoint_slog_handler.cpp:788) [22530][observer][T0][Y0-0000000000000000-0-0] [lt=154][errcode=-4070] fail to handle tenant create commit(ret=-4070, tenant_meta={unit:{tenant_id:1001, unit_id:1002, has_memstore:true, unit_status:“NORMAL”, config:{unit_config_id:1001, name:“u1”, resource:{min_cpu:1, max_cpu:1, memory_size:“1GB”, log_disk_size:“1GB”, data_disk_size:0, min_iops:10000, max_iops:10000, iops_weight:2, max_net_bandwidth:INT64_MAX, net_bandwidth_weight:8, }}, mode:0, create_timestamp:1762431192659427, is_removed:false, hidden_sys_data_disk_config_size:0, actual_data_disk_size:0, replica_type:0}, super_block:{tenant_id:1001, replay_start_point:ObLogCursor{file_id=6, log_id=505720, offset=6049305}, ls_meta_entry:{[ver=1,mode=0,seq=789251][2nd=21128]}, wait_gc_tablet_entry:{[ver=1,mode=0,seq=0][2nd=18446744073709551615]}, tablet_meta_entry:{[ver=1,mode=0,seq=0][2nd=18446744073709551615]}, is_hidden:false, version:4, snapshot_cnt:0, preallocated_seqs:{object_seq:60000, tmp_file_seq:60000, write_seq:60000}, auto_inc_ls_epoch:0, ls_cnt:0, min_file_id:0, max_file_id:0}, create_status:1, epoch:0}) BACKTRACE:0x91dffd8 0x90e38cd 0x9194ab0 0x9194587 0x91944dd 0x9194308 0x1ac6b90d 0x1ac5ec79 0x1a7522e7 0x12027f6c 0xe99e2bf 0xe9a3d02 0x23e1aca0 0xe9a0527 0x7fca526d4b17 0x96682aa
[2026-01-09 14:49:14.928174] INFO [STORAGE] apply_replay_result (ob_server_checkpoint_slog_handler.cpp:813) [22530][observer][T0][Y0-0000000000000000-0-0] [lt=64] finish replay create tenants(ret=-4070, tenant_count=3)
[2026-01-09 14:49:14.928181] WDIAG [STORAGE] start (ob_server_checkpoint_slog_handler.cpp:108) [22530][observer][T0][Y0-0000000000000000-0-0] [lt=7][errcode=-4070] fail to apply replay result(ret=-4070)
[2026-01-09 14:49:14.928200] WDIAG [STORAGE] start (ob_server_storage_meta_service.cpp:73) [22530][observer][T0][Y0-0000000000000000-0-0] [lt=6][errcode=-4070] fail to start replay(ret=-4070)
[2026-01-09 14:49:14.928213] INFO [STORAGE] start (ob_server_storage_meta_service.cpp:84) [22530][observer][T0][Y0-0000000000000000-0-0] [lt=13] finish start server storage meta service(ret=-4070, cost_time_us=16177088)
[2026-01-09 14:49:14.928221] EDIAG [SERVER] start (ob_server.cpp:1027) [22530][observer][T0][Y0-0000000000000000-0-0] [lt=7][errcode=-4070] fail to start server storage meta service(ret=-4070, ret=“OB_INVALID_DATA”) BACKTRACE:0x91dffd8 0x90e38cd 0x9194ab0 0x9194587 0x91944dd 0x9194308 0x1202ddb4 0x120294af 0xe99e2bf 0xe9a3d02 0x23e1aca0 0xe9a0527 0x7fca526d4b17 0x96682aa
[2026-01-09 14:49:14.928375] ERROR [SERVER] start (ob_server.cpp:1185) [22530][observer][T0][Y0-0000000000000000-0-0] [lt=42][errcode=-4070] [server_start 9/18] observer instance start fail. you may find solutions in previous error logs or seek help from official technicians.
[2026-01-09 14:49:14.963792] WDIAG [SHARE.LOCATION] nonblock_get_leader (ob_ls_location_service.cpp:439) [22625][EvtHisUpdTask][T0][Y0-0000000000000000-0-0] [lt=93][errcode=-4721] nonblock get location failed(ret=-4721, ret=“OB_LS_LOCATION_NOT_EXIST”, cluster_id=1, tenant_id=1, ls_id={id:1})
[2026-01-09 14:49:14.963899] WDIAG [SHARE.LOCATION] get_leader_with_retry_until_timeout (ob_location_service.cpp:117) [22625][EvtHisUpdTask][T0][Y0-0000000000000000-0-0] [lt=64][errcode=-4721] fail to get log stream location leader with retry until_timeout(ret=-4721, ret=“OB_LS_LOCATION_NOT_EXIST”, cluster_id=1, tenant_id=1, ls_id={id:1}, leader=“0.0.0.0:0”, abs_retry_timeout=1767941355161463, retry_interval=200000)
[2026-01-09 14:49:14.963925] WDIAG [SERVER] nonblock_get_leader (ob_inner_sql_connection.cpp:2032) [22625][EvtHisUpdTask][T0][Y0-0000000000000000-0-0] [lt=25][errcode=-4721] get leader with retry until timeout failed(ret=-4721, ret=“OB_LS_LOCATION_NOT_EXIST”, tenant_id=1, ls_id={id:1}, leader=“0.0.0.0:0”, cluster_id=1, tmp_abs_timeout_us=1767941355161463, retry_interval_us=200000)
[2026-01-09 14:49:14.963940] WDIAG [SHARE.SCHEMA] check_if_tenant_has_been_dropped (ob_multi_version_schema_service.cpp:1980) [22625][EvtHisUpdTask][T0][Y0-0000000000000000-0-0] [lt=12][errcode=-4006] local schema not inited,(ret=-4006, tenant_id=1)
[2026-01-09 14:49:14.963960] WDIAG [SERVER] nonblock_get_leader (ob_inner_sql_connection.cpp:2023) [22625][EvtHisUpdTask][T0][Y0-0000000000000000-0-0] [lt=18][errcode=0] user tenant has been dropped(ret=0, ret=“OB_SUCCESS”, tenant_id=1)
[2026-01-09 14:49:14.963973] WDIAG [SHARE.LOCATION] nonblock_get_leader (ob_ls_location_service.cpp:439) [22625][EvtHisUpdTask][T0][Y0-0000000000000000-0-0] [lt=10][errcode=-4721] nonblock get location failed(ret=-4721, ret=“OB_LS_LOCATION_NOT_EXIST”, cluster_id=1, tenant_id=1, ls_id={id:1})
[2026-01-09 14:49:14.965307] WDIAG [SHARE.LOCATION] nonblock_get_leader (ob_ls_location_service.cpp:439) [22996][T1_LogLoop][T1][Y0-0000000000000000-0-0] [lt=17][errcode=-4721] nonblock get location failed(ret=-4721, ret=“OB_LS_LOCATION_NOT_EXIST”, cluster_id=1, tenant_id=1, ls_id={id:1})
[2026-01-09 14:49:14.965438] WDIAG [SHARE.LOCATION] nonblock_get_leader (ob_location_service.cpp:159) [22996][T1_LogLoop][T1][Y0-0000000000000000-0-0] [lt=128][errcode=-4721] fail to nonblock get log stream location leader(ret=-4721, ret=“OB_LS_LOCATION_NOT_EXIST”, cluster_id=1, tenant_id=1, ls_id={id:1}, leader=“0.0.0.0:0”)
[2026-01-09 14:49:14.965479] WDIAG [PALF] check_and_try_fetch_log_ (log_state_mgr.cpp:1095) [22996][T1_LogLoop][T1][Y0-0000000000000000-0-0] [lt=39][errcode=-4721] sw try_fetch_log failed(ret=-4721, palf_id=1)
[2026-01-09 14:49:14.965497] WDIAG [PALF] get_election_leader_without_lock_ (palf_handle_impl.cpp:4724) [22996][T1_LogLoop][T1][Y0-0000000000000000-0-0] [lt=12][errcode=-4209] election has no leader(ret=-4209, this={palf_id:1, self:“192.168.15.179:2882”, has_set_deleted:false})
[2026-01-09 14:49:14.969268] INFO [COMMON] compute_tenant_wash_size (ob_kvcache_store.cpp:1467) [22538][TimerWK2_KVCacheWash][T0][Y0-0000000000000000-0-0] [lt=28] Wash compute wash size(is_wash_valid=false, sys_total_wash_size=-13813524890, global_cache_size=6242304, tenant_max_wash_size=0, tenant_min_wash_size=0, tenant_ids_=[500, 508, 509, 1])
[2026-01-09 14:49:14.969330] INFO [COMMON] wash (ob_kvcache_hazard_domain.h:326) [22538][TimerWK2_KVCacheWash][T0][Y0-0000000000000000-0-0] [lt=57] allocator wash time (wash_time_us=1, freed_blocks_num=0, memory_efficiency=1.000000000000000000e+00)
[2026-01-09 14:49:14.975972] WDIAG [SHARE] check_write_limited (ob_local_device.cpp:1315) [22841][T1_Occam][T1][Y0-0000000000000000-0-0] [lt=23][errcode=-4006] The ObLocalDevice has not been marked(ret=-4006)
[2026-01-09 14:49:14.976060] WDIAG [COORDINATOR] detect_data_disk_full_ (ob_failure_detector.cpp:489) [22841][T1_Occam][T1][Y0-0000000000000000-0-0] [lt=84][errcode=-4006] check space full failed(ret=-4006)
[2026-01-09 14:49:14.979704] INFO destroy_tg (thread_mgr.cpp:89) [22530][observer][T0][Y0-0000000000000000-0-0] [lt=11] destroy tg(tg_id=284, tg=0x7fca3c947cb0, tg->attr_={name:StartupAccelHandler, type:4})
[2026-01-09 14:49:14.979789] EDIAG [SERVER] start (ob_server.cpp:1263) [22530][observer][T0][Y0-0000000000000000-0-0] [lt=74][errcode=-4070] failure occurs, try to set stop and wait(ret=-4070, ret=“OB_INVALID_DATA”) BACKTRACE:0x91dffd8 0x90e38cd 0x9194ab0 0x9194587 0x91944dd 0x9194308 0x1202fc0a 0x1202aabb 0xe99e2bf 0xe9a3d02 0x23e1aca0 0xe9a0527 0x7fca526d4b17 0x96682aa
[2026-01-09 14:49:14.979954] ERROR [SERVER] start (ob_server.cpp:1267) [22530][observer][T0][Y0-0000000000000000-0-0] [lt=55][errcode=-4070] [server_start 10/18] observer start fail, the stop status is true. you may find solutions in previous error logs or seek help from official technicians.
666
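The logs show two problems: the storage meta service fails to start, and there is no leader. First check whether disk mounts, network connectivity, and clock synchronization are normal on all three machines.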
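To separate the two symptoms mentioned above (the -4070 startup failure and the missing leader), it can help to group the diagnostic lines by error code. The following is a minimal Python sketch, not an official tool: the function name `summarize_errcodes` and the regular expression are assumptions derived only from the line layout visible in the log excerpt in this thread, and the sample lines are copied from that excerpt.

```python
import re

# Assumed line layout, inferred from the excerpt in this thread:
# [timestamp] LEVEL [MODULE] func (file:line) [tid][thread][Tn][trace] [lt=N][errcode=N] message
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z]+)\s+"
    r".*?\[(?P<tid>\d+)\]\[(?P<thread>[^\]]*)\]"
    r".*?\[errcode=(?P<errcode>-?\d+)\]"
)


def summarize_errcodes(lines):
    """Group log lines by non-zero errcode, in order of first appearance."""
    summary = {}
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # INFO lines and other lines without an errcode field
        code = int(m.group("errcode"))
        if code == 0:
            continue  # errcode=0 is OB_SUCCESS, not a failure
        entry = summary.setdefault(
            code, {"first": m.group("ts"), "count": 0, "threads": set()}
        )
        entry["count"] += 1
        entry["threads"].add(m.group("thread"))
    return summary


# Sample lines copied from the log excerpt in this thread.
SAMPLE = [
    "[2026-01-09 14:49:14.928181] WDIAG [STORAGE] start (ob_server_checkpoint_slog_handler.cpp:108) [22530][observer][T0][Y0-0000000000000000-0-0] [lt=7][errcode=-4070] fail to apply replay result(ret=-4070)",
    "[2026-01-09 14:49:14.928200] WDIAG [STORAGE] start (ob_server_storage_meta_service.cpp:73) [22530][observer][T0][Y0-0000000000000000-0-0] [lt=6][errcode=-4070] fail to start replay(ret=-4070)",
    "[2026-01-09 14:49:14.965479] WDIAG [PALF] check_and_try_fetch_log_ (log_state_mgr.cpp:1095) [22996][T1_LogLoop][T1][Y0-0000000000000000-0-0] [lt=39][errcode=-4721] sw try_fetch_log failed(ret=-4721, palf_id=1)",
    "[2026-01-09 14:49:14.975972] WDIAG [SHARE] check_write_limited (ob_local_device.cpp:1315) [22841][T1_Occam][T1][Y0-0000000000000000-0-0] [lt=23][errcode=-4006] The ObLocalDevice has not been marked(ret=-4006)",
]

if __name__ == "__main__":
    for code, info in summarize_errcodes(SAMPLE).items():
        print(code, info["first"], info["count"], sorted(info["threads"]))
```

In the excerpt, -4070 is logged only by the `observer` startup thread, while -4721 comes from background threads, which suggests following the -4070 chain first. To run it on a real file, pass an open file object instead of `SAMPLE`.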
66666
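Let's take another look at the logs.
Please post the observer.log from after the restart, for all three nodes.
New users can't upload logs at the moment, so I ran obdiag on the log instead. If this isn't detailed enough, I'll upload the logs later.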
[root@tidb1 observer]# /root/oceanbase-diagnostic-tool/obdiag analyze log --files /ob/observer/log/observer.log
obdiag version: 4.0.0
analyze_log_offline start ...
analyze nodes's log start. Please wait a moment...
analyze start ok
FileListInfo:
+-----------+-----------------------------------+
| Node      | LogList                           |
+===========+===================================+
| 127.0.0.1 | ['/ob/observer/log/observer.log'] |
+-----------+-----------------------------------+
Analyze OceanBase Offline Log Summary:
+-----------+-----------+-------------------------------------------------------------------------------------+----------------------------+-------------+-------------------------------+---------+
| Node      | Status    | FileName                                                                            | First Found Time           | ErrorCode   | Message                       | Count   |
+===========+===========+=====================================================================================+============================+=============+===============================+=========+
| 127.0.0.1 | Completed | /ob/observer/obdiag_analyze_pack_20260109173212/local/_ob_observer_log_observer.log | 2026-01-06 16:06:16.795266 | -4070       | Invalid data                  | 96      |
+-----------+-----------+-------------------------------------------------------------------------------------+----------------------------+-------------+-------------------------------+---------+
| 127.0.0.1 | Completed | /ob/observer/obdiag_analyze_pack_20260109173212/local/_ob_observer_log_observer.log | 2026-01-06 16:06:16.797321 | -4006       | The object is not initialized | 16      |
+-----------+-----------+-------------------------------------------------------------------------------------+----------------------------+-------------+-------------------------------+---------+
| 127.0.0.1 | Completed | /ob/observer/obdiag_analyze_pack_20260109173212/local/_ob_observer_log_observer.log | 2026-01-06 16:06:26.660872 | -4016       | Internal error                | 16      |
+-----------+-----------+-------------------------------------------------------------------------------------+----------------------------+-------------+-------------------------------+---------+
For more details, please run cmd ' cat /ob/observer/obdiag_analyze_pack_20260109173212/result_details.txt '
Trace ID: 13c9fdac-ed3e-11f0-9684-000c29b7d053
Please still try to post the log output from when OB starts up.
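If the full observer.log cannot be uploaded, one way to shrink it to the startup failure is to keep only the error-coded lines from the thread that prints the `[server_start N/M]` progress markers. This is a rough Python sketch under assumptions: the helper name `startup_failure_trail` is hypothetical, and the thread-field pattern is inferred only from the excerpt in this thread. It drops context that support staff may still want, so it complements the full log rather than replacing it.

```python
import re

# "[tid][thread name][Tn]" as it appears in the excerpt, e.g. [22530][observer][T0]
TID_RE = re.compile(r"\[(\d+)\]\[([^\]]*)\]\[T\d+\]")
# Startup progress marker, e.g. [server_start 9/18]
STEP_RE = re.compile(r"\[server_start (\d+)/(\d+)\]")


def startup_failure_trail(lines):
    """Return error-coded lines logged by the thread that reports server_start steps."""
    lines = list(lines)
    startup_tid = None
    for line in lines:
        if STEP_RE.search(line):
            m = TID_RE.search(line)
            if m:
                startup_tid = m.group(1)
                break
    if startup_tid is None:
        return []  # no startup marker found in the input
    marker = "[" + startup_tid + "]["
    return [l for l in lines if marker in l and "[errcode=" in l]


# Sample lines copied from the log excerpt in this thread.
SAMPLE = [
    "[2026-01-09 14:49:14.928181] WDIAG [STORAGE] start (ob_server_checkpoint_slog_handler.cpp:108) [22530][observer][T0][Y0-0000000000000000-0-0] [lt=7][errcode=-4070] fail to apply replay result(ret=-4070)",
    "[2026-01-09 14:49:14.965479] WDIAG [PALF] check_and_try_fetch_log_ (log_state_mgr.cpp:1095) [22996][T1_LogLoop][T1][Y0-0000000000000000-0-0] [lt=39][errcode=-4721] sw try_fetch_log failed(ret=-4721, palf_id=1)",
    "[2026-01-09 14:49:14.928375] ERROR [SERVER] start (ob_server.cpp:1185) [22530][observer][T0][Y0-0000000000000000-0-0] [lt=42][errcode=-4070] [server_start 9/18] observer instance start fail. you may find solutions in previous error logs or seek help from official technicians.",
]

if __name__ == "__main__":
    for line in startup_failure_trail(SAMPLE):
        print(line)
```

Running the same filter on each of the three nodes would show whether they all stop at the same startup step with the same error code.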
Waiting for the experts to weigh in.