Single-node OB fails to start: OB_INVALID_DATA

The server shut down abnormally over the weekend. After rebooting, OB fails to start.
observer.log:

[2025-06-30 12:10:53.322502] EDIAG [SERVER] start (ob_server.cpp:1235) [58551][observer][T0][Y0-0000000000000001-0-0] [lt=242][errcode=-4070] failure occurs, try to set stop and wait(ret=-4070, ret="OB_INVALID_DATA") 
BACKTRACE:0x9525fb6 0x90a6216 0x90a5779 0x90a5246 0x90a4ee4 0x90a4cdb 0x1183a106 0x11832df3 0xdafae6c 0x248583b0 0xdaf60bd 0x7f2a3f8bcc87 0x9832434

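OB version: 4.3.5 bp1, single node, with the data disk and the clog on two separate disks.
How should I troubleshoot the cause, and can the data be recovered?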

The log excerpt is too short to diagnose anything. Please post more of it.

[2025-06-30 12:58:17.540870] INFO  [COMMON] retire (ob_kvcache_hazard_version.cpp:289) [88183][observer][T0][Y0-0000000000000001-0-0] [lt=92] erase tenant hazard map node details(ret=0, tenant_id=1)
[2025-06-30 12:58:17.541143] INFO  [COMMON] erase_tenant (ob_kvcache_inst_map.cpp:415) [88183][observer][T0][Y0-0000000000000001-0-0] [lt=532] erase tenant cache inst details(ret=0, tenant_id=1)
[2025-06-30 12:58:17.541180] INFO  [COMMON] sync_flush_tenant (ob_kv_storecache.cpp:556) [88183][observer][T0][Y0-0000000000000001-0-0] [lt=117] erase tenant cache details(ret=0, tenant_id=1)
[2025-06-30 12:58:17.541255] INFO  [SERVER.OMT] create_tenant (ob_multi_tenant.cpp:1212) [88183][observer][T0][Y0-0000000000000001-0-0] [lt=192] finish create new tenant(ret=-4070, tenant_id=1, write_slog=false, create_step=5, bucket_lock_idx=9780)
[2025-06-30 12:58:17.541291] EDIAG [STORAGE] handle_tenant_create_commit_ (ob_server_storage_meta_replayer.cpp:172) [88183][observer][T0][Y0-0000000000000001-0-0] [lt=103][errcode=-4070] fail to replay create tenant(ret=-4070, tenant_meta={unit:{tenant_id:1, unit_id:1, has_memstore:true, unit_status:"NORMAL", config:{unit_config_id:1, name:"sys_unit_config", resource:{min_cpu:3, max_cpu:3, memory_size:"2GB", log_disk_size:"4GB", data_disk_size:0, min_iops:9223372036854775807, max_iops:9223372036854775807, iops_weight:3, max_net_bandwidth:INT64_MAX, net_bandwidth_weight:3, }}, mode:0, create_timestamp:1749099226593957, is_removed:false, hidden_sys_data_disk_config_size:0}, super_block:{tenant_id:1, replay_start_point:ObLogCursor{file_id=1, log_id=1, offset=0}, ls_meta_entry:{[ver=1,mode=0,seq=0][2nd=18446744073709551615]}, tablet_meta_entry:{[ver=1,mode=0,seq=0][2nd=18446744073709551615]}, is_hidden:false, version:4, snapshot_cnt:0, preallocated_seqs:{object_seq:60000, tmp_file_seq:60000, write_seq:60000}, auto_inc_ls_epoch:0, ls_cnt:0}, create_status:1, epoch:0}) BACKTRACE:0x9525fb6 0x90a6216 0x90a5779 0x90a5246 0x90a4ee4 0x90a4cdb 0x1a2177f2 0x1a210285 0x1a20f927 0x1182ff3e 0xdafae6c 0x248583b0 0xdaf60bd 0x14556d5aac87 0x9832434
[2025-06-30 12:58:17.541757] EDIAG [STORAGE] apply_replay_result_ (ob_server_storage_meta_replayer.cpp:112) [88183][observer][T0][Y0-0000000000000001-0-0] [lt=1506][errcode=-4070] fail to handle tenant create commit(ret=-4070, tenant_meta={unit:{tenant_id:1, unit_id:1, has_memstore:true, unit_status:"NORMAL", config:{unit_config_id:1, name:"sys_unit_config", resource:{min_cpu:3, max_cpu:3, memory_size:"2GB", log_disk_size:"4GB", data_disk_size:0, min_iops:9223372036854775807, max_iops:9223372036854775807, iops_weight:3, max_net_bandwidth:INT64_MAX, net_bandwidth_weight:3, }}, mode:0, create_timestamp:1749099226593957, is_removed:false, hidden_sys_data_disk_config_size:0}, super_block:{tenant_id:1, replay_start_point:ObLogCursor{file_id=1, log_id=1, offset=0}, ls_meta_entry:{[ver=1,mode=0,seq=0][2nd=18446744073709551615]}, tablet_meta_entry:{[ver=1,mode=0,seq=0][2nd=18446744073709551615]}, is_hidden:false, version:4, snapshot_cnt:0, preallocated_seqs:{object_seq:60000, tmp_file_seq:60000, write_seq:60000}, auto_inc_ls_epoch:0, ls_cnt:0}, create_status:1, epoch:0}) BACKTRACE:0x9525fb6 0x90a6216 0x90a5779 0x90a5246 0x90a4ee4 0x90a4cdb 0x1a21706e 0x1a2102b4 0x1a20f927 0x1182ff3e 0xdafae6c 0x248583b0 0xdaf60bd 0x14556d5aac87 0x9832434
[2025-06-30 12:58:17.541890] INFO  [STORAGE] apply_replay_result_ (ob_server_storage_meta_replayer.cpp:137) [88183][observer][T0][Y0-0000000000000001-0-0] [lt=367] finish replay create tenants(ret=-4070, tenant_count=7)
[2025-06-30 12:58:17.541910] WDIAG [STORAGE] start_replay (ob_server_storage_meta_replayer.cpp:60) [88183][observer][T0][Y0-0000000000000001-0-0] [lt=58][errcode=-4070] fail to apply repaly result(ret=-4070)
[2025-06-30 12:58:17.541986] WDIAG [STORAGE] start (ob_server_storage_meta_service.cpp:77) [88183][observer][T0][Y0-0000000000000001-0-0] [lt=49][errcode=-4070] fail to start replayer(ret=-4070)
[2025-06-30 12:58:17.542007] INFO  [STORAGE] start (ob_server_storage_meta_service.cpp:84) [88183][observer][T0][Y0-0000000000000001-0-0] [lt=65] finish start server storage meta service(ret=-4070, cost_time_us=12291307)
[2025-06-30 12:58:17.542059] EDIAG [SERVER] start (ob_server.cpp:1010) [88183][observer][T0][Y0-0000000000000001-0-0] [lt=80][errcode=-4070] fail to start server storage meta service(ret=-4070, ret="OB_INVALID_DATA") BACKTRACE:0x9525fb6 0x90a6216 0x90a5779 0x90a5246 0x90a4ee4 0x90a4cdb 0x11836d5c 0x11832f54 0xdafae6c 0x248583b0 0xdaf60bd 0x14556d5aac87 0x9832434
[2025-06-30 12:58:17.542184] ERROR [SERVER] start (ob_server.cpp:1165) [88183][observer][T0][Y0-0000000000000001-0-0] [lt=129][errcode=-4070] [server_start 9/18] observer instance start fail. you may find solutions in previous error logs or seek help from official technicians.
[2025-06-30 12:58:17.547376] WDIAG [SERVER.OMT] check_cgroup_root_dir (ob_cgroup_ctrl.cpp:212) [88477][MultiTenant][T0][Y0-0000000000000000-0-0] [lt=229][errcode=-4027] dir not exist(OBSERVER_ROOT_CGROUP_DIR="cgroup", ret=-4027)
[2025-06-30 12:58:17.557563] WDIAG [SERVER.OMT] check_cgroup_root_dir (ob_cgroup_ctrl.cpp:212) [88477][MultiTenant][T0][Y0-0000000000000000-0-0] [lt=218][errcode=-4027] dir not exist(OBSERVER_ROOT_CGROUP_DIR="cgroup", ret=-4027)
[2025-06-30 12:58:17.563171] WDIAG [SHARE.LOCATION] nonblock_get_leader (ob_ls_location_service.cpp:439) [88211][TimerWK1_ServerGTimer][T0][Y0-0000000000000000-0-0] [lt=266][errcode=-4721] nonblock get location failed(ret=-4721, ret="OB_LS_LOCATION_NOT_EXIST", cluster_id=1749099062, tenant_id=1, ls_id={id:1})
[2025-06-30 12:58:17.563264] WDIAG [SHARE.LOCATION] get_leader_with_retry_until_timeout (ob_location_service.cpp:117) [88211][TimerWK1_ServerGTimer][T0][Y0-0000000000000000-0-0] [lt=243][errcode=-4721] fail to get log stream location leader with retry until_timeout(ret=-4721, ret="OB_LS_LOCATION_NOT_EXIST", cluster_id=1749099062, tenant_id=1, ls_id={id:1}, leader="0.0.0.0:0", abs_retry_timeout=1751259497762251, retry_interval=200000)
[2025-06-30 12:58:17.563300] WDIAG [SERVER] nonblock_get_leader (ob_inner_sql_connection.cpp:1938) [88211][TimerWK1_ServerGTimer][T0][Y0-0000000000000000-0-0] [lt=112][errcode=-4721] get leader with retry until timeout failed(ret=-4721, ret="OB_LS_LOCATION_NOT_EXIST", tenant_id=1, ls_id={id:1}, leader="0.0.0.0:0", cluster_id=1749099062, tmp_abs_timeout_us=1751259497762251, retry_interval_us=200000)
[2025-06-30 12:58:17.563324] WDIAG [SHARE.SCHEMA] check_if_tenant_has_been_dropped (ob_multi_version_schema_service.cpp:1973) [88211][TimerWK1_ServerGTimer][T0][Y0-0000000000000000-0-0] [lt=64][errcode=-4006] local schema not inited,(ret=-4006, tenant_id=1)
[2025-06-30 12:58:17.563343] WDIAG [SERVER] nonblock_get_leader (ob_inner_sql_connection.cpp:1929) [88211][TimerWK1_ServerGTimer][T0][Y0-0000000000000000-0-0] [lt=50][errcode=0] user tenant has been dropped(ret=0, ret="OB_SUCCESS", tenant_id=1)
[2025-06-30 12:58:17.563361] WDIAG [SHARE.LOCATION] nonblock_get_leader (ob_ls_location_service.cpp:439) [88211][TimerWK1_ServerGTimer][T0][Y0-0000000000000000-0-0] [lt=53][errcode=-4721] nonblock get location failed(ret=-4721, ret="OB_LS_LOCATION_NOT_EXIST", cluster_id=1749099062, tenant_id=1, ls_id={id:1})
[2025-06-30 12:58:17.567805] WDIAG [SERVER.OMT] check_cgroup_root_dir (ob_cgroup_ctrl.cpp:212) [88477][MultiTenant][T0][Y0-0000000000000000-0-0] [lt=214][errcode=-4027] dir not exist(OBSERVER_ROOT_CGROUP_DIR="cgroup", ret=-4027)
[2025-06-30 12:58:17.577993] WDIAG [SERVER.OMT] check_cgroup_root_dir (ob_cgroup_ctrl.cpp:212) [88477][MultiTenant][T0][Y0-0000000000000000-0-0] [lt=214][errcode=-4027] dir not exist(OBSERVER_ROOT_CGROUP_DIR="cgroup", ret=-4027)
[2025-06-30 12:58:17.588203] WDIAG [SERVER.OMT] check_cgroup_root_dir (ob_cgroup_ctrl.cpp:212) [88477][MultiTenant][T0][Y0-0000000000000000-0-0] [lt=206][errcode=-4027] dir not exist(OBSERVER_ROOT_CGROUP_DIR="cgroup", ret=-4027)
[2025-06-30 12:58:17.598441] WDIAG [SERVER.OMT] check_cgroup_root_dir (ob_cgroup_ctrl.cpp:212) [88477][MultiTenant][T0][Y0-0000000000000000-0-0] [lt=209][errcode=-4027] dir not exist(OBSERVER_ROOT_CGROUP_DIR="cgroup", ret=-4027)
[2025-06-30 12:58:17.608660] WDIAG [SERVER.OMT] check_cgroup_root_dir (ob_cgroup_ctrl.cpp:212) [88477][MultiTenant][T0][Y0-0000000000000000-0-0] [lt=237][errcode=-4027] dir not exist(OBSERVER_ROOT_CGROUP_DIR="cgroup", ret=-4027)
[2025-06-30 12:58:17.618845] WDIAG [SERVER.OMT] check_cgroup_root_dir (ob_cgroup_ctrl.cpp:212) [88477][MultiTenant][T0][Y0-0000000000000000-0-0] [lt=211][errcode=-4027] dir not exist(OBSERVER_ROOT_CGROUP_DIR="cgroup", ret=-4027)
[2025-06-30 12:58:17.625692] INFO  destroy_tg (thread_mgr.cpp:89) [88183][observer][T0][Y0-0000000000000001-0-0] [lt=45] destroy tg(tg_id=284, tg=0x1455642d7cb0, tg->attr_={name:StartupAccelHandler, type:4})
[2025-06-30 12:58:17.625796] EDIAG [SERVER] start (ob_server.cpp:1235) [88183][observer][T0][Y0-0000000000000001-0-0] [lt=239][errcode=-4070] failure occurs, try to set stop and wait(ret=-4070, ret="OB_INVALID_DATA") BACKTRACE:0x9525fb6 0x90a6216 0x90a5779 0x90a5246 0x90a4ee4 0x90a4cdb 0x1183a106 0x11832df3 0xdafae6c 0x248583b0 0xdaf60bd 0x14556d5aac87 0x9832434
[2025-06-30 12:58:17.625975] ERROR [SERVER] start (ob_server.cpp:1239) [88183][observer][T0][Y0-0000000000000001-0-0] [lt=166][errcode=-4070] [server_start 10/18] observer start fail, the stop status is true. you may find solutions in previous error logs or seek help from official technicians.

observer.log (3.0 MB)
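
With a 3 MB log, it can help to strip away unrelated warnings (such as the repeated cgroup messages) and look only at the lines tagged with the failing error code. The following is a minimal sketch, not an official tool: the regular expressions are an assumption based solely on the line layout visible in this thread, and `first_failures` is a hypothetical helper name. The sample lines are copied verbatim from the excerpt above.

```python
import re

# Assumed layout, inferred from the lines in this thread:
# [timestamp] LEVEL [MODULE] function (file:line) ... [errcode=N] message
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z]+)\s+"
    r"(?:\[(?P<module>[^\]]+)\]\s+)?"
    r"(?P<func>\S+)\s+\((?P<src>[^)]+)\)"
)
ERRCODE_RE = re.compile(r"\[errcode=(-?\d+)\]")


def first_failures(lines, errcode=-4070, levels=("EDIAG", "ERROR", "WDIAG")):
    """Return (timestamp, level, function, source) for lines tagged with errcode."""
    hits = []
    for line in lines:
        head = LINE_RE.match(line)
        code = ERRCODE_RE.search(line)
        if head is None or code is None:
            continue  # not a tagged diagnostic line
        if head.group("level") in levels and int(code.group(1)) == errcode:
            hits.append((head.group("ts"), head.group("level"),
                         head.group("func"), head.group("src")))
    return hits


# Four lines copied from the excerpt in this thread.
SAMPLE = [
    '[2025-06-30 12:58:17.541890] INFO  [STORAGE] apply_replay_result_ (ob_server_storage_meta_replayer.cpp:137) [88183][observer][T0][Y0-0000000000000001-0-0] [lt=367] finish replay create tenants(ret=-4070, tenant_count=7)',
    '[2025-06-30 12:58:17.541910] WDIAG [STORAGE] start_replay (ob_server_storage_meta_replayer.cpp:60) [88183][observer][T0][Y0-0000000000000001-0-0] [lt=58][errcode=-4070] fail to apply repaly result(ret=-4070)',
    '[2025-06-30 12:58:17.541986] WDIAG [STORAGE] start (ob_server_storage_meta_service.cpp:77) [88183][observer][T0][Y0-0000000000000001-0-0] [lt=49][errcode=-4070] fail to start replayer(ret=-4070)',
    '[2025-06-30 12:58:17.547376] WDIAG [SERVER.OMT] check_cgroup_root_dir (ob_cgroup_ctrl.cpp:212) [88477][MultiTenant][T0][Y0-0000000000000000-0-0] [lt=229][errcode=-4027] dir not exist(OBSERVER_ROOT_CGROUP_DIR="cgroup", ret=-4027)',
]

for hit in first_failures(SAMPLE):
    print(hit)
```

To run it against the real file, pass an open file object instead of `SAMPLE`. The earliest hits are usually the most useful, since later -4070 lines only report the same failure propagating up the call chain.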

It looks like the clog files are corrupted.

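Could it be that the directories your database uses are not mounted? Run df -h and check that both the clog and the data directories are mounted correctly.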
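
As a complement to reading df -h by eye, the mount check can be scripted. The sketch below walks up from a directory until it reaches a mount point, using only the Python standard library. `find_mount_point` is a hypothetical helper, and the two paths in the usage loop are the ones that appear in this thread. If both paths report "/", neither directory sits on its own disk.

```python
import os


def find_mount_point(path):
    """Walk up from path until a mount point is reached and return it."""
    current = os.path.realpath(path)
    while not os.path.ismount(current):
        parent = os.path.dirname(current)
        if parent == current:  # reached the top without finding a mount
            break
        current = parent
    return current


# Paths taken from this thread; adjust them for your own deployment.
for directory in ("/data/OceanBase/clog", "/root/OceanBase/nrlj/oceanbase/store"):
    print(directory, "->", find_mount_point(directory))
```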

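Hello, I deployed OB directly on the host with the GUI deployment tool and did not use a Docker container. Both the data disk and the clog disk are mounted normally.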

Following this thread.

The log shows failures reading the clog files. The power loss most likely corrupted them.

I took a closer look, and the system appears to have two copies of the clog:
the OB home directory is /root/OceanBase/nrlj/oceanbase
the configured clog directory (redo_dir) is /data/OceanBase

(nrlj_2.0) root@nrlj:~/OceanBase# ls /data/OceanBase/clog/
log_pool  tenant_1  tenant_1001  tenant_1002  tenant_1003  tenant_1004  tenant_1005  tenant_1006
(nrlj_2.0) root@nrlj:~/OceanBase# ls /root/OceanBase/nrlj/oceanbase/store/clog/
log_pool  tenant_1  tenant_1001  tenant_1002  tenant_1003  tenant_1004  tenant_1005  tenant_1006
(nrlj_2.0) root@nrlj:~/OceanBase# du -sh /data/OceanBase/clog/
du: cannot access '/data/OceanBase/clog/log_pool/7028': No such file or directory
du: cannot access '/data/OceanBase/clog/log_pool/7029': No such file or directory
du: cannot access '/data/OceanBase/clog/log_pool/7030': No such file or directory
du: cannot access '/data/OceanBase/clog/log_pool/7031': No such file or directory
du: cannot access '/data/OceanBase/clog/log_pool/7032': No such file or directory
du: cannot access '/data/OceanBase/clog/log_pool/7033': No such file or directory
du: cannot access '/data/OceanBase/clog/log_pool/7034': No such file or directory
du: cannot access '/data/OceanBase/clog/log_pool/7035': No such file or directory
du: cannot access '/data/OceanBase/clog/log_pool/7036': No such file or directory
du: cannot access '/data/OceanBase/clog/log_pool/7037': No such file or directory
du: cannot access '/data/OceanBase/clog/log_pool/7038': No such file or directory
du: cannot access '/data/OceanBase/clog/log_pool/7039': No such file or directory
du: cannot access '/data/OceanBase/clog/log_pool/7040': No such file or directory
du: cannot access '/data/OceanBase/clog/log_pool/7041': No such file or directory
du: cannot access '/data/OceanBase/clog/log_pool/7042': No such file or directory
du: cannot access '/data/OceanBase/clog/log_pool/7043': No such file or directory
200G	/data/OceanBase/clog/
(nrlj_2.0) root@nrlj:~/OceanBase# du -sh /root/OceanBase/nrlj/oceanbase/store/clog/
du: cannot access '/root/OceanBase/nrlj/oceanbase/store/clog/log_pool/7028': No such file or directory
du: cannot access '/root/OceanBase/nrlj/oceanbase/store/clog/log_pool/7029': No such file or directory
du: cannot access '/root/OceanBase/nrlj/oceanbase/store/clog/log_pool/7030': No such file or directory
du: cannot access '/root/OceanBase/nrlj/oceanbase/store/clog/log_pool/7031': No such file or directory
du: cannot access '/root/OceanBase/nrlj/oceanbase/store/clog/log_pool/7032': No such file or directory
du: cannot access '/root/OceanBase/nrlj/oceanbase/store/clog/log_pool/7033': No such file or directory
du: cannot access '/root/OceanBase/nrlj/oceanbase/store/clog/log_pool/7034': No such file or directory
du: cannot access '/root/OceanBase/nrlj/oceanbase/store/clog/log_pool/7035': No such file or directory
du: cannot access '/root/OceanBase/nrlj/oceanbase/store/clog/log_pool/7036': No such file or directory
du: cannot access '/root/OceanBase/nrlj/oceanbase/store/clog/log_pool/7037': No such file or directory
du: cannot access '/root/OceanBase/nrlj/oceanbase/store/clog/log_pool/7038': No such file or directory
du: cannot access '/root/OceanBase/nrlj/oceanbase/store/clog/log_pool/7039': No such file or directory
du: cannot access '/root/OceanBase/nrlj/oceanbase/store/clog/log_pool/7040': No such file or directory
du: cannot access '/root/OceanBase/nrlj/oceanbase/store/clog/log_pool/7041': No such file or directory
du: cannot access '/root/OceanBase/nrlj/oceanbase/store/clog/log_pool/7042': No such file or directory
du: cannot access '/root/OceanBase/nrlj/oceanbase/store/clog/log_pool/7043': No such file or directory
200G	/root/OceanBase/nrlj/oceanbase/store/clog/

Cluster configuration, as shown by obd cluster edit-config nrlj:

oceanbase-ce:
  version: 4.3.5.1
  release: 101000042025031818.el7
  package_hash: 8826bc816ae660198f9ca5fd7e96d93c1ce4fc84
  172.31.157.2:
    zone: zone1
  servers:
  - 172.31.157.2
  global:
    appname: nrlj
    root_password: AAdmin123++
    mysql_port: 2881
    rpc_port: 2882
    redo_dir: /data/OceanBase
    home_path: /root/OceanBase/nrlj/oceanbase
    scenario: htap
    log_disk_size: 200GB
    max_syslog_file_count: '14'
    memory_limit: 35GB
    cpu_count: 16
    cluster_id: 1749099062
    ocp_agent_monitor_password: PNrRtsQHWE
    proxyro_password: v2Tzv707lI
    enable_syslog_wf: false
    datafile_size: 105G
    system_memory: 6G
    datafile_maxsize: 619G
    datafile_next: 62G
  depends:
  - ob-configserver

I don't know why there are two clog directories. Could this be the reason the clog cannot be found?

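In the configuration file, redo_dir is the path where the clog and slog files are stored.
With home_path set to /root/OceanBase/nrlj/oceanbase, observer.log is under /root/OceanBase/nrlj/oceanbase/log.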
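
The two listings above are identical, down to the same inaccessible entries 7028 through 7043. One possible explanation (an assumption, not confirmed in this thread) is that the clog directory under home_path is a symbolic link to the one under redo_dir, so there is only one clog on disk. A minimal sketch to check this, where `same_directory` is a hypothetical helper:

```python
import os


def same_directory(a, b):
    """Return True if both paths resolve to the same on-disk directory."""
    try:
        return os.path.samefile(a, b)
    except OSError:
        # One of the paths is missing or cannot be stat'ed.
        return False


# Paths taken from this thread.
home_clog = "/root/OceanBase/nrlj/oceanbase/store/clog"
redo_clog = "/data/OceanBase/clog"
print("same directory:", same_directory(home_clog, redo_clog))
print("home clog is a symlink:", os.path.islink(home_clog))
```

If the first line prints True, the two paths are a single directory seen through two names, and the "second" clog is not an independent copy.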

Try analyzing it with obdiag.

(nrlj_2.0) root@nrlj:/data/OceanBase/clog/log_pool# ls -alh
ls: cannot access ‘7028’: No such file or directory
ls: cannot access ‘7029’: No such file or directory
ls: cannot access ‘7030’: No such file or directory
ls: cannot access ‘7031’: No such file or directory
ls: cannot access ‘7032’: No such file or directory
ls: cannot access ‘7033’: No such file or directory
ls: cannot access ‘7034’: No such file or directory
ls: cannot access ‘7035’: No such file or directory
ls: cannot access ‘7036’: No such file or directory
ls: cannot access ‘7037’: No such file or directory
ls: cannot access ‘7038’: No such file or directory
ls: cannot access ‘7039’: No such file or directory
ls: cannot access ‘7040’: No such file or directory
ls: cannot access ‘7041’: No such file or directory
ls: cannot access ‘7042’: No such file or directory
ls: cannot access ‘7043’: No such file or directory
total 12K
drwxr--r-- 2 root root 48K Jul 2 11:43 .
drwxr-xr-x 3 root root 22 Jul 2 11:43 ..
-??? ? ? ? ? ? 7028
-??? ? ? ? ? ? 7029
-??? ? ? ? ? ? 7030
-??? ? ? ? ? ? 7031
-??? ? ? ? ? ? 7032
-??? ? ? ? ? ? 7033
-??? ? ? ? ? ? 7034
-??? ? ? ? ? ? 7035
-??? ? ? ? ? ? 7036
-??? ? ? ? ? ? 7037
-??? ? ? ? ? ? 7038
-??? ? ? ? ? ? 7039
-??? ? ? ? ? ? 7040
-??? ? ? ? ? ? 7041
-??? ? ? ? ? ? 7042
-??? ? ? ? ? ? 7043

So it looks like the file system itself is corrupted.
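
The symptom in the listing above, names that the directory returns but that cannot be stat'ed, can also be detected from a script, which is handy for checking the remaining tenant directories. This is a minimal sketch using only the standard library. `unreadable_entries` is a hypothetical helper and the path in the usage lines is the one from this thread.

```python
import os


def unreadable_entries(directory):
    """List names that appear in directory but fail lstat, with the error text."""
    broken = []
    for name in sorted(os.listdir(directory)):
        try:
            # lstat does not follow symlinks, so a dangling link is not reported.
            os.lstat(os.path.join(directory, name))
        except OSError as exc:
            broken.append((name, exc.strerror))
    return broken


if __name__ == "__main__":
    # Path taken from this thread; adjust it for your own deployment.
    for name, reason in unreadable_entries("/data/OceanBase/clog/log_pool"):
        print(name, reason)
```

On a healthy directory the function returns an empty list. Any entry it reports is visible in the directory but has no usable inode behind it, which matches the "-??? ? ? ? ? ?" rows shown by ls.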