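observer fails to start after rebuilding the instance

[Product] oceanbase

[Version] 3.1.1

[Problem description]

I previously had a cluster environment and destroyed it with kill -9 followed by rm on the observer directory.

On the second attempt, the observer on the primary node fails to start.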

[2021-12-17 02:21:32.920757] INFO [LIB] ob_slice_alloc.h:322 [32667][0][Y0-0000000000000000] [lt=14] [dc=0] ObSliceAlloc init finished(bsize_=8192, isize_=56, slice_limit_=7408, tmallocator_=NULL)
[2021-12-17 02:21:32.920906] INFO ob_page_manager.cpp:34 [33096][0][Y0-0000000000000000] [lt=0] [dc=0] register pm finish(ret=0, &pm=0xfffaf2220100, pm.get_tid()=33096, tenant_id=500)
[2021-12-17 02:21:32.920913] INFO [STORAGE] ob_local_file_system.cpp:405 [32667][0][Y0-0000000000000000] [lt=10] [dc=0] finish open store file(ret=0, fd_={fd:786, disk_id:{disk_idx:0, install_seq:0}}, store_path_="/data/ob/observer01/store/sstable/block_file", data_file_size_=139586437120)
[2021-12-17 02:21:32.920919] INFO [LIB] ob_tsi_factory.h:366 [33096][0][Y0-0000000000000000] [lt=11] [dc=0] new instance succ [N9oceanbase3lib13MemoryContextE] 0xfffbd5fa2950 size=24, tsi=0xfffbecb4ceb0
[2021-12-17 02:21:32.921010] WARN [STORAGE] inner_get_super_block_version (ob_local_file_system.cpp:875) [32667][0][Y0-0000000000000000] [lt=5] [dc=0] read superblock error.(ret=-4009, offset=0, read_size=0, errno=2, super_block_buf_holder={buf:0xfffb8f35c000, len:65536}, fd={fd:786, disk_id:{disk_idx:0, install_seq:0}}, errmsg="No such file or directory")
[2021-12-17 02:21:32.921021] INFO [STORAGE] ob_local_file_system.cpp:880 [32667][0][Y0-0000000000000000] [lt=10] [dc=0] finish read_super_block_header_version(ret=-4009, offset=0, version=0)
[2021-12-17 02:21:32.921025] WARN [STORAGE] get_super_block_version (ob_local_file_system.cpp:849) [32667][0][Y0-0000000000000000] [lt=4] [dc=0] fail to get super block version from master(ret=-4009)
[2021-12-17 02:21:32.921035] WARN [STORAGE] inner_get_super_block_version (ob_local_file_system.cpp:875) [32667][0][Y0-0000000000000000] [lt=3] [dc=0] read superblock error.(ret=-4009, offset=2097152, read_size=0, errno=2, super_block_buf_holder={buf:0xfffb8f35c000, len:65536}, fd={fd:786, disk_id:{disk_idx:0, install_seq:0}}, errmsg="No such file or directory")
[2021-12-17 02:21:32.921040] INFO [STORAGE] ob_local_file_system.cpp:880 [32667][0][Y0-0000000000000000] [lt=4] [dc=0] finish read_super_block_header_version(ret=-4009, offset=2097152, version=0)
[2021-12-17 02:21:32.921043] WARN [STORAGE] get_super_block_version (ob_local_file_system.cpp:851) [32667][0][Y0-0000000000000000] [lt=3] [dc=0] fail to get super block version from backup(ret=-4009)

Attached is the observer log, from startup until it exits with the error.

observer.zip (53458 KB)

One moment, I'm looking into it.

  1. To destroy a cluster, we recommend using:
    obd cluster destroy

https://github.com/oceanbase/obdeploy/blob/master/docs/docs-cn/obd-commands/cluster-commands.md

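For your issue, I suspect the directories were not fully cleaned up during deletion. One possible cause is that only the contents under the observer directory were removed while /data/ob/observer01/store was left in place, so the rebuilt instance reused the previous sstable store and the validation failed.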
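
To check the suspicion above before restarting, something like the following sketch could list any files that survived the cleanup. The helper name list_leftover_files is hypothetical and not part of OceanBase or obd; it only assumes a find that supports -L, so that store subdirectories that are symlinks get followed too.

```shell
# list_leftover_files STORE_DIR
# Hypothetical helper: prints every regular file found under STORE_DIR
# (following symlinks) and returns 1 if at least one exists.
list_leftover_files() {
    local store_dir="$1"
    local leftovers
    # A directory that does not exist has nothing left over.
    [ -d "$store_dir" ] || return 0
    leftovers="$(find -L "$store_dir" -type f)"
    if [ -n "$leftovers" ]; then
        printf '%s\n' "$leftovers"
        return 1
    fi
    return 0
}
```

For example, list_leftover_files /data/ob/observer01/store should print nothing on a freshly created, empty store layout.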


There is no obd here; this is an ARM platform.

I removed the entire observer01 directory with rm.

Steps

1. kill -9 observer

2. rm -rf observer01

3.

su - admin

mkdir -p /data/ob/observer01/store/{sort_dir,sstable,clog,ilog,slog}

4. Start observer as root

5. After a few seconds, it exits with an error
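
Steps 2 and 3 above could be wrapped in a small script so that the directory is only removed once no observer process is left. This is a sketch under assumptions: the function names are made up for illustration, pgrep is assumed to be installed, and the guard against a still-running process is an extra precaution that the thread itself does not mention.

```shell
# observer_running [PROCESS_NAME]
# True if a process with exactly that name exists (default: observer).
observer_running() {
    pgrep -x "${1:-observer}" >/dev/null 2>&1
}

# recreate_store_dirs BASE_DIR [PROCESS_NAME]
# Hypothetical helper: removes BASE_DIR and recreates the empty store layout
# from step 3, but refuses to touch anything while the process is still alive.
recreate_store_dirs() {
    local base_dir="$1"
    local proc_name="${2:-observer}"
    local sub
    # Never run rm -rf on an empty path.
    [ -n "$base_dir" ] || return 2
    if observer_running "$proc_name"; then
        echo "refusing: $proc_name is still running" >&2
        return 1
    fi
    rm -rf "$base_dir" || return 1
    for sub in sort_dir sstable clog ilog slog; do
        mkdir -p "$base_dir/store/$sub" || return 1
    done
}
```

Run it as the user that should own the directories, for example recreate_store_dirs /data/ob/observer01, and only then start observer.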

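The only thing I have not tried yet is rebooting the host. Besides observer01, where else might OB write files?

P.S. I also deleted .mysql.history under the admin user.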

[2021-12-17 02:29:05.191495] INFO [STORAGE] ob_local_file_system.cpp:405 [67916][0][Y0-0000000000000000] [lt=8] [dc=0] finish open store file(ret=0, fd_={fd:786, disk_id:{disk_idx:0, install_seq:0}}, store_path_="/data/ob/observer01/store/sstable/block_file", data_file_size_=139586437120)

This log line shows that the file "/data/ob/observer01/store/sstable/block_file" was opened with fd 786.

[2021-12-17 02:29:05.191589] WARN [STORAGE] inner_get_super_block_version (ob_local_file_system.cpp:875) [67916][0][Y0-0000000000000000] [lt=9] [dc=0] read superblock error.(ret=-4009, offset=0, read_size=0, errno=2, super_block_buf_holder={buf:0xfffde8fac000, len:65536}, fd={fd:786, disk_id:{disk_idx:0, install_seq:0}}, errmsg="No such file or directory")

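However, the subsequent pread reported errno = 2 (file does not exist).

Judging from the man page for the read interface, it does not appear to return ENOENT. Which operating system version is this?

/data/ob/observer01/store/{sort_dir,sstable,clog,ilog,slog} all need to be deleted.

Running rm -rf observer01 directly deletes the observer01 directory. In a typical deployment, the store under this directory is a symlink to another directory. Please confirm whether the deletion removed only the symlink and left the actual data in place.

Run stat /data/ob/observer01/store/sstable/block_file to check when this file was created.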
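
The symlink question can be answered mechanically. A minimal sketch, assuming a find that supports -maxdepth and a readlink that supports -f; report_symlinks is a hypothetical name:

```shell
# report_symlinks BASE_DIR
# Hypothetical helper: prints "LINK -> RESOLVED_TARGET" for every symlink at
# BASE_DIR itself or up to two levels below it (e.g. store, store/sstable).
# find does not descend into symlinked directories here, so a link is
# reported once and its target is left for a separate check.
report_symlinks() {
    local base_dir="$1"
    local link
    find "$base_dir" -maxdepth 2 -type l | while IFS= read -r link; do
        printf '%s -> %s\n' "$link" "$(readlink -f "$link")"
    done
}
```

If report_symlinks /data/ob/observer01 prints anything, the resolved targets are the directories that also need cleaning; no output means the tree contains no symlinks at those levels.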


Kylin V10 (ARM)

[root@ecs0001 ob]# stat /data/ob/observer01/store/sstable/block_file
  File: /data/ob/observer01/store/sstable/block_file
  Size: 0               Blocks: 0          IO Block: 4096   regular empty file
Device: fd11h/64785d    Inode: 786453      Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2021-12-17 02:15:36.784167016 +0800
Modify: 2021-12-17 02:15:36.784167016 +0800
Change: 2021-12-17 02:15:36.784167016 +0800
 Birth: 2021-12-17 02:15:36.784167016 +0800
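
The stat output above reports Size: 0, and the failing log reports read_size=0 when reading the superblock. A read at end-of-file returns 0 bytes without setting errno, which would fit that log field; whether a leftover empty block_file actually explains the reported errno=2 is an assumption here, not a confirmed diagnosis. A quick way to classify the file before starting observer, using a hypothetical helper name:

```shell
# describe_block_file PATH
# Hypothetical helper: prints "missing", "empty" or "non-empty" for PATH.
describe_block_file() {
    local path="$1"
    if [ ! -e "$path" ]; then
        echo "missing"
    elif [ -s "$path" ]; then
        # -s is true only when the file exists and its size is greater than zero.
        echo "non-empty"
    else
        echo "empty"
    fi
}
```

On the host in this thread, describe_block_file /data/ob/observer01/store/sstable/block_file would have printed "empty" at the time of the stat output above.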

I just tried deleting observer01 and redoing the setup, and it seems to work normally again...

[root@ecs0001 log]# cd /data
[root@ecs0001 data]# cd ob
[root@ecs0001 ob]# ls
observer01  oceanbase
[root@ecs0001 ob]# rm -rf observer01/
[root@ecs0001 ob]# ls
oceanbase
[root@ecs0001 ob]# stat /data/ob/observer01/store/sstable/block_file
stat: cannot stat '/data/ob/observer01/store/sstable/block_file': No such file or directory
[root@ecs0001 ob]# 
[root@ecs0001 ob]# su - admin
Last login: Fri Dec 17 14:45:54 CST 2021 on pts/5
[admin@ecs0001 ~]$ 
[admin@ecs0001 ~]$ 
[admin@ecs0001 ~]$ mkdir -p /data/ob/observer01/store/{sort_dir,sstable,clog,ilog,slog}
[root@ecs0001 ob]# cd /data/ob/observer01 && /data/ob/oceanbase/build_release/src/observer/observer -r 192.168.0.187:2882:2881 -o __min_full_resource_pool_memory=268435456,memory_limit=35G,system_memory=4G,stack_size=512K,cpu_count=16,cache_wash_threshold=1G,workers_per_cpu_quota=10,schema_history_expire_time=1d,net_thread_count=4,sys_bkgd_migration_retry_num=3,minor_freeze_times=10,enable_separate_sys_clog=0,enable_merge_by_turn=False,datafile_size=120G,enable_syslog_recycle=True,max_syslog_file_count=4 -z zone1 -p 2881 -P 2882 -c 1 -d /data/ob/observer01/store -i eth0 -l INFO
/data/ob/oceanbase/build_release/src/observer/observer -r 192.168.0.187:2882:2881 -o __min_full_resource_pool_memory=268435456,memory_limit=35G,system_memory=4G,stack_size=512K,cpu_count=16,cache_wash_threshold=1G,workers_per_cpu_quota=10,schema_history_expire_time=1d,net_thread_count=4,sys_bkgd_migration_retry_num=3,minor_freeze_times=10,enable_separate_sys_clog=0,enable_merge_by_turn=False,datafile_size=120G,enable_syslog_recycle=True,max_syslog_file_count=4 -z zone1 -p 2881 -P 2882 -c 1 -d /data/ob/observer01/store -i eth0 -l INFO
rs list: 192.168.0.187:2882:2881
optstr: __min_full_resource_pool_memory=268435456,memory_limit=35G,system_memory=4G,stack_size=512K,cpu_count=16,cache_wash_threshold=1G,workers_per_cpu_quota=10,schema_history_expire_time=1d,net_thread_count=4,sys_bkgd_migration_retry_num=3,minor_freeze_times=10,enable_separate_sys_clog=0,enable_merge_by_turn=False,datafile_size=120G,enable_syslog_recycle=True,max_syslog_file_count=4
zone: zone1
mysql port: 2881
rpc port: 2882
cluster id: 1
data_dir: /data/ob/observer01/store
devname: eth0
log level: INFO
[root@ecs0001 observer01]# 
[root@ecs0001 observer01]# 
[root@ecs0001 observer01]# ps -ef|grep obse
root     596483      1 69 15:03 ?        00:00:03 /data/ob/oceanbase/build_release/src/observer/observer -r 192.168.0.187:2882:2881 -o __min_full_resource_pool_memory=268435456,memory_limit=35G,system_memory=4G,stack_size=512K,cpu_count=16,cache_wash_threshold=1G,workers_per_cpu_quota=10,schema_history_expire_time=1d,net_thread_count=4,sys_bkgd_migration_retry_num=3,minor_freeze_times=10,enable_separate_sys_clog=0,enable_merge_by_turn=False,datafile_size=120G,enable_syslog_recycle=True,max_syslog_file_count=4 -z zone1 -p 2881 -P 2882 -c 1 -d /data/ob/observer01/store -i eth0 -l INFO
root     598415  96975  0 15:03 pts/5    00:00:00 grep obse