[Environment] Production environment or test environment
[OB or other component]
[Version]
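[Problem description] The servers of my OceanBase cluster suddenly lost power. Since I restarted the cluster, it has been stuck in the "wait for observer init" state for two days. observer.log contains no ERROR-level entries, and election.log reports that refreshing the priority failed.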
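[Steps to reproduce] Operations performed before and after the problem occurred
[Attachments and logs] We recommend collecting diagnostic information with obdiag, OceanBase's agile diagnostic tool; for details, see the link (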
observer.log:
[2025-02-07 14:28:44.771117] WDIAG [SHARE.LOCATION] batch_process_tasks (ob_ls_location_service.cpp:549) [113024][SysLocAsyncUp0][T0][YB42C0A800CB-00062D8769F37A4C-0-0] [lt=14][errcode=0] tenant schema is not ready, need wait(ret=0, ret=“OB_SUCCESS”, superior_tenant_id=1, task={cluster_id:1704768644, tenant_id:1, ls_id:{id:1}, renew_for_tenant:false, add_timestamp:1738909724771095})
[2025-02-07 14:28:44.771929] WDIAG [SHARE.LOCATION] nonblock_get_leader (ob_ls_location_service.cpp:450) [113072][TsMgr][T0][Y0-0000000000000000-0-0] [lt=8][errcode=-4722] REACH SYSLOG RATE LIMIT [bandwidth]
[2025-02-07 14:28:44.771960] INFO [STORAGE.TRANS] refresh_gts_location_ (ob_gts_source.cpp:580) [113072][TsMgr][T0][Y0-0000000000000000-0-0] [lt=0] gts nonblock renew success(ret=0, tenant_id=1, gts_local_cache={srr:[mts=0], gts:0, latest_srr:[mts=0]})
[2025-02-07 14:28:44.771972] INFO [STORAGE.TRANS] refresh_gts (ob_gts_source.cpp:516) [113072][TsMgr][T0][Y0-0000000000000000-0-0] [lt=11] refresh gts(ret=-4722, ret=“OB_LS_LOCATION_LEADER_NOT_EXIST”, tenant_id=1, need_refresh=false, gts_local_cache={srr:[mts=0], gts:0, latest_srr:[mts=0]})
[2025-02-07 14:28:44.771979] INFO [STORAGE.TRANS] operator() (ob_ts_mgr.h:171) [113072][TsMgr][T0][Y0-0000000000000000-0-0] [lt=0] refresh gts functor(ret=-4722, ret=“OB_LS_LOCATION_LEADER_NOT_EXIST”, gts_tenant_info={v:1})
[2025-02-07 14:28:44.771994] WDIAG [SHARE.LOCATION] check_ls_exist (ob_location_service.cpp:518) [113072][TsMgr][T0][Y0-0000000000000000-0-0] [lt=0][errcode=-5157] tenant does not exist(ret=-5157, ret=“OB_TENANT_NOT_EXIST”, tenant_id=1003)
[2025-02-07 14:28:44.772000] WDIAG [SHARE.SCHEMA] check_if_tenant_has_been_dropped (ob_multi_version_schema_service.cpp:2073) [113072][TsMgr][T0][Y0-0000000000000000-0-0] [lt=0][errcode=-4006] local schema not inited,(ret=-4006, tenant_id=1003)
[2025-02-07 14:28:44.772005] INFO [STORAGE.TRANS] statistics (ob_gts_source.cpp:70) [113072][TsMgr][T0][Y0-0000000000000000-0-0] [lt=0] gts statistics(tenant_id=1003, gts_rpc_cnt=0, get_gts_cache_cnt=9103, get_gts_with_stc_cnt=77, try_get_gts_cache_cnt=0, try_get_gts_with_stc_cnt=0, wait_gts_elapse_cnt=0, try_wait_gts_elapse_cnt=0)
[2025-02-07 14:28:44.772012] INFO [STORAGE.TRANS] refresh_gts (ob_gts_source.cpp:516) [113072][TsMgr][T0][Y0-0000000000000000-0-0] [lt=7] refresh gts(ret=-4721, ret=“OB_LS_LOCATION_NOT_EXIST”, tenant_id=1003, need_refresh=false, gts_local_cache={srr:[mts=0], gts:0, latest_srr:[mts=0]})
[2025-02-07 14:28:44.772202] WDIAG [SHARE.LOCATION] check_ls_exist (ob_location_service.cpp:518) [113072][TsMgr][T0][Y0-0000000000000000-0-0] [lt=1][errcode=-5157] tenant does not exist(ret=-5157, ret=“OB_TENANT_NOT_EXIST”, tenant_id=1004)
[2025-02-07 14:28:44.772211] WDIAG [SHARE.SCHEMA] check_if_tenant_has_been_dropped (ob_multi_version_schema_service.cpp:2073) [113072][TsMgr][T0][Y0-0000000000000000-0-0] [lt=0][errcode=-4006] local schema not inited,(ret=-4006, tenant_id=1004)
[2025-02-07 14:28:44.772222] INFO [STORAGE.TRANS] refresh_gts_location_ (ob_gts_source.cpp:580) [113072][TsMgr][T0][Y0-0000000000000000-0-0] [lt=0] gts nonblock renew success(ret=0, tenant_id=1004, gts_local_cache={srr:[mts=0], gts:0, latest_srr:[mts=0]})
[2025-02-07 14:28:44.772227] INFO [STORAGE.TRANS] refresh_gts (ob_gts_source.cpp:516) [113072][TsMgr][T0][Y0-0000000000000000-0-0] [lt=4] refresh gts(ret=-4721, ret=“OB_LS_LOCATION_NOT_EXIST”, tenant_id=1004, need_refresh=false, gts_local_cache={srr:[mts=0], gts:0, latest_srr:[mts=0]})
[2025-02-07 14:28:44.779889] WDIAG [SHARE.LOCATION] check_ls_exist (ob_location_service.cpp:518) [113603][T1003_Occam][T1003][Y0-0000000000000000-0-0] [lt=1][errcode=-5157] tenant does not exist(ret=-5157, ret=“OB_TENANT_NOT_EXIST”, tenant_id=1003)
[2025-02-07 14:28:44.779900] WDIAG [SHARE.SCHEMA] check_if_tenant_has_been_dropped (ob_multi_version_schema_service.cpp:2073) [113603][T1003_Occam][T1003][Y0-0000000000000000-0-0] [lt=0][errcode=-4006] local schema not inited,(ret=-4006, tenant_id=1003)
[2025-02-07 14:28:44.781581] WDIAG [SHARE] refresh (ob_task_define.cpp:402) [112923][LogLimiterRefre][T0][Y0-0000000000000000-0-0] [lt=23][errcode=0] Throttled WDIAG logs in last second(details {error code, dropped logs, earliest tid}=[{errcode:-4023, dropped:119, tid:113766}, {errcode:-4721, dropped:258, tid:113644}])
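Because observer.log has no ERROR-level entries, the WDIAG lines are where the signal is: the excerpt repeats OB_TENANT_NOT_EXIST (-5157) and "local schema not inited" (-4006) for tenants 1003 and 1004. A minimal Python sketch for tallying WDIAG lines in a full observer.log by errcode and symbolic ret name, assuming every line follows the layout shown above. The function name tally_wdiag is hypothetical and this is not an OceanBase tool; both straight and curly quotes are accepted because the forum converted the quotes in the paste.

```python
import re
import sys
from collections import Counter

# Assumed line layout, taken from the excerpt above:
# [timestamp] LEVEL [MODULE] func (file:line) [tid][thread][T..][trace] [lt=..][errcode=N] msg(ret=N, ret="NAME", ...)
LEVEL_RE = re.compile(r"^\[[^\]]*\]\s+([A-Z]+)\s")
ERRCODE_RE = re.compile(r"\[errcode=(-?\d+)\]")
# Accept straight or curly quotes around the symbolic error name.
RET_NAME_RE = re.compile(r'ret=["\u201c]([A-Z0-9_]+)["\u201d]')


def tally_wdiag(lines):
    """Count WDIAG lines by (errcode, symbolic ret name); '-' when no name is logged."""
    counts = Counter()
    for line in lines:
        level = LEVEL_RE.match(line)
        if level is None or level.group(1) != "WDIAG":
            continue
        code = ERRCODE_RE.search(line)
        if code is None:
            continue
        name = RET_NAME_RE.search(line)
        counts[(int(code.group(1)), name.group(1) if name else "-")] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical usage: python tally_wdiag.py /path/to/observer.log
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
            for (code, name), n in tally_wdiag(f).most_common(10):
                print(f"{n:8d}  errcode={code:<6d}  {name}")
```

Running it on each node's observer.log gives a quick side-by-side view of which failures dominate on which node.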
election.log:
[2025-02-07 14:29:52.395297] WDIAG [ELECT] refresh_priority_ (election_impl.cpp:332) [113835][T1004_L0_G2][T1004][YB42C0A800CB-00062D876C5378DC-0-0] [lt=16][errcode=-4018] refresh priority failed(ret=-4018, ret=“OB_ENTRY_NOT_EXIST”, *this={is_inited:true, is_running:true, proposer:{ls_id:{id:1001}, addr:“192.168.0.203:2882”, role:Leader, ballot_number:3, prepare_success_ballot:3, lease_interval:4.00s, memberlist_with_states:{member_list:{addr_list:[“192.168.0.201:2882”, “192.168.0.202:2882”, “192.168.0.203:2882”], membership_version:{proposal_id:10154, config_seq:5225}, replica_num:3}, prepare_ok:[false, true, true], accept_ok_promised_ts:[invalid, 14:29:55.892, 14:29:55.892]follower_promise_membership_version:[{proposal_id:9223372036854775807, config_seq:-1}, {proposal_id:10154, config_seq:5225}, {proposal_id:10154, config_seq:5225}]}, lease_and_epoch:{leader_lease:{span_from_now:3.497s, expired_time_point:14:29:55.892}, epoch:3}, priority_seed:0x1000, restart_counter:1, last_do_prepare_ts:2025-02-07 14:25:12.658420, self_priority:{priority:{is_valid:false, is_observer_stopped:false, is_server_stopped:false, is_zone_stopped:false, fatal_failures:[], is_primary_region:false, serious_failures:[{type:SCHEMA NOT REFRESHED, module:SCHEMA, info:schema not refreshed, level:SERIOUS}], is_in_blacklist:false, in_blacklist_reason:, scn:{val:0, v:0}, is_manual_leader:false, zone_priority:9223372036854775807}}, p_election:0x7f852dbf7030}, acceptor:{ls_id:{id:1001}, addr:“192.168.0.203:2882”, ballot_number:3, ballot_of_time_window:3, lease:{owner:“192.168.0.203:2882”, lease_end_ts:{span_from_now:4.000s, expired_time_point:14:29:56.395}, ballot_number:3}, is_time_window_opened:False, vote_reason:the only request, last_time_window_open_ts:2025-02-07 14:25:12.658605, highest_priority_prepare_req:{this:0x7f852dbf78b0, BASE:{msg_type:“Prepare Request”, id:1001, sender:“192.168.0.203:2882”, receiver:“192.168.0.203:2882”, restart_counter:1, ballot_number:3, 
debug_ts:{src_construct_ts:“25:12.658350”, src_serialize_ts:“25:12.658406”, dest_deserialize_ts:“25:12.658489”, dest_process_ts:“25:12.658491”, process_delay:141}, biggest_min_cluster_version_ever_seen:4.2.1.1}, role:“Follower”, is_buffer_valid:true, inner_priority_seed:4096, membership_version:{proposal_id:10154, config_seq:5225}}, p_election:0x7f852dbf7030}, ls_biggest_min_cluster_version_ever_seen:4.2.1.1, priority:{priority:{is_valid:false, is_observer_stopped:false, is_server_stopped:false, is_zone_stopped:false, fatal_failures:[], is_primary_region:false, serious_failures:[{type:SCHEMA NOT REFRESHED, module:SCHEMA, info:schema not refreshed, level:SERIOUS}], is_in_blacklist:false, in_blacklist_reason:, scn:{val:0, v:0}, is_manual_leader:false, zone_priority:9223372036854775807}}})
[2025-02-07 14:29:52.395702] WDIAG [ELECT] refresh_priority_ (election_impl.cpp:332) [113835][T1004_L0_G2][T1004][YB42C0A800CB-00062D876C5378DB-0-0] [lt=11][errcode=-4018] refresh priority failed(ret=-4018, ret=“OB_ENTRY_NOT_EXIST”, *this={is_inited:true, is_running:true, proposer:{ls_id:{id:1001}, addr:“192.168.0.203:2882”, role:Leader, ballot_number:3, prepare_success_ballot:3, lease_interval:4.00s, memberlist_with_states:{member_list:{addr_list:[“192.168.0.201:2882”, “192.168.0.202:2882”, “192.168.0.203:2882”], membership_version:{proposal_id:10154, config_seq:5225}, replica_num:3}, prepare_ok:[false, true, true], accept_ok_promised_ts:[invalid, 14:29:56.395, 14:29:56.395]follower_promise_membership_version:[{proposal_id:9223372036854775807, config_seq:-1}, {proposal_id:10154, config_seq:5225}, {proposal_id:10154, config_seq:5225}]}, lease_and_epoch:{leader_lease:{span_from_now:3.999s, expired_time_point:14:29:56.395}, epoch:3}, priority_seed:0x1000, restart_counter:1, last_do_prepare_ts:2025-02-07 14:25:12.658420, self_priority:{priority:{is_valid:false, is_observer_stopped:false, is_server_stopped:false, is_zone_stopped:false, fatal_failures:[], is_primary_region:false, serious_failures:[{type:SCHEMA NOT REFRESHED, module:SCHEMA, info:schema not refreshed, level:SERIOUS}], is_in_blacklist:false, in_blacklist_reason:, scn:{val:0, v:0}, is_manual_leader:false, zone_priority:9223372036854775807}}, p_election:0x7f852dbf7030}, acceptor:{ls_id:{id:1001}, addr:“192.168.0.203:2882”, ballot_number:3, ballot_of_time_window:3, lease:{owner:“192.168.0.203:2882”, lease_end_ts:{span_from_now:3.999s, expired_time_point:14:29:56.395}, ballot_number:3}, is_time_window_opened:False, vote_reason:the only request, last_time_window_open_ts:2025-02-07 14:25:12.658605, highest_priority_prepare_req:{this:0x7f852dbf78b0, BASE:{msg_type:“Prepare Request”, id:1001, sender:“192.168.0.203:2882”, receiver:“192.168.0.203:2882”, restart_counter:1, ballot_number:3, 
debug_ts:{src_construct_ts:“25:12.658350”, src_serialize_ts:“25:12.658406”, dest_deserialize_ts:“25:12.658489”, dest_process_ts:“25:12.658491”, process_delay:141}, biggest_min_cluster_version_ever_seen:4.2.1.1}, role:“Follower”, is_buffer_valid:true, inner_priority_seed:4096, membership_version:{proposal_id:10154, config_seq:5225}}, p_election:0x7f852dbf7030}, ls_biggest_min_cluster_version_ever_seen:4.2.1.1, priority:{priority:{is_valid:false, is_observer_stopped:false, is_server_stopped:false, is_zone_stopped:false, fatal_failures:[], is_primary_region:false, serious_failures:[{type:SCHEMA NOT REFRESHED, module:SCHEMA, info:schema not refreshed, level:SERIOUS}], is_in_blacklist:false, in_blacklist_reason:, scn:{val:0, v:0}, is_manual_leader:false, zone_priority:9223372036854775807}}})
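Each refresh_priority_ entry above reports is_valid:false for the replica's priority together with a SERIOUS failure of type SCHEMA NOT REFRESHED. A minimal Python sketch for summarizing those failures per tenant and log stream across a full election.log, assuming each entry sits on a single line in the real file (the paste above wrapped it) and uses the field layout shown. The function name summarize_priority_failures is hypothetical and the regular expressions are inferred only from these two entries.

```python
import re
import sys

# Patterns inferred from the refresh_priority_ entries above.
TENANT_RE = re.compile(r"\]\[T(\d+)\]\[")   # the "[T1004]" tenant tag after the thread name
LS_RE = re.compile(r"ls_id:\{id:(\d+)\}")     # first ls_id in the dump
FAILURE_RE = re.compile(
    r"\{type:([^,{}]+), module:([^,{}]+), info:[^{}]*?, level:(\w+)\}"
)


def summarize_priority_failures(lines):
    """Map (tenant_id, ls_id) to the set of (failure type, level) found on
    'refresh priority failed' lines."""
    summary = {}
    for line in lines:
        if "refresh priority failed" not in line:
            continue
        tenant = TENANT_RE.search(line)
        ls = LS_RE.search(line)
        if tenant is None or ls is None:
            continue
        found = summary.setdefault((int(tenant.group(1)), int(ls.group(1))), set())
        # The priority struct is dumped twice per entry, so a set drops repeats.
        for ftype, _module, level in FAILURE_RE.findall(line):
            found.add((ftype, level))
    return summary


if __name__ == "__main__":
    # Hypothetical usage: python priority_failures.py /path/to/election.log
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
            for (tenant, ls), failures in sorted(summarize_priority_failures(f).items()):
                print(f"tenant={tenant} ls={ls} failures={sorted(failures)}")
```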
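In my situation, can I recover the data and then reinstall the cluster?
If the database cannot be started, is there any other way to recover or export the data? Whenever I connect to the observer on any node with obclient, I get the error: ERROR 8001 (08004): Server is initializing
Please attach a complete observer log so we can take a look.
When connecting with obclient, are you unable to log in to both the sys tenant and the business tenants?
Yes, neither the sys tenant nor the business tenants can log in.
Did you restart it with obd?
Please kill the OB processes on all nodes, start them again, and then send an observer log that covers the startup.
In the current state, is it still possible to repair the cluster or export the data?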
On a single node I can now log in with obclient -h127.0.0.1 -P2881 -uroot@sys -c -A, and this is what it shows:
[root@localhost ~]# obclient -h127.0.0.1 -P2881 -uroot@sys
Welcome to the OceanBase. Commands end with ; or \g.
Your OceanBase connection id is 3222011910
Server version: OceanBase_CE 4.2.1.1 (r101010012023111012-2f6924cd5a576f09d6e7f212fac83f1a15ff531a) (Built Nov 10 2023 12:13:59)
Copyright (c) 2000, 2018, OceanBase and/or its affiliates. All rights reserved.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
obclient(root@sys)[(none)]> show databases;
ERROR 1146 (42S02): Table 'oceanbase.__all_database' doesn't exist
obclient(root@sys)[(none)]>
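What should I do in this situation to repair the cluster or export the data?
Please wait a moment.
OK.
Cluster initialization failed: the log stream cannot be found. Run df -h to check whether the disk is full.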
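The df -h check can also be scripted across the directories the cluster writes to, which helps when several nodes need checking. A minimal Python sketch using only the standard library; the function name disk_report, the 95% threshold, and the example paths are hypothetical, so substitute the data and redo directories from your own deployment.

```python
import shutil
import sys


def disk_report(paths, threshold_pct=95.0):
    """For each path return (path, used_pct, avail_gib, looks_full).

    used_pct mirrors df's Use% column: used / (used + available). shutil's
    'free' is the space available to unprivileged users (statvfs f_bavail),
    which is what df reports as Avail. threshold_pct is an arbitrary cut-off.
    """
    report = []
    for path in paths:
        usage = shutil.disk_usage(path)
        denom = usage.used + usage.free
        used_pct = 100.0 * usage.used / denom if denom else 0.0
        report.append((path, used_pct, usage.free / 2**30, used_pct >= threshold_pct))
    return report


if __name__ == "__main__":
    # Hypothetical usage: python disk_report.py /path/to/data /path/to/redo
    for path, pct, avail, full in disk_report(sys.argv[1:]):
        flag = "  <-- nearly full" if full else ""
        print(f"{path}: {pct:.1f}% used, {avail:.1f} GiB available{flag}")
```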
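Was the cluster deployed with obd? Please provide the parameter configuration from the YAML file.
In this situation, can the data be exported so that I can redeploy the cluster and import it again? Or is there some way to fix the cluster so that it starts?
Frequent sudden power outages make for a poor environment to run a database in.
So in this situation, can my data be recovered?
Or should I try rebooting the servers?
It looks like what you shared is the log from just one of the nodes. Could you also send the logs from the other nodes after their restart?
Please wait a moment. I'll restart the nodes and send you logs that cover the startup.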