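3-node OB cluster bootstrap never succeeds

[Product name] 3-node observer

[Product version] 2.2.76

[Problem description] Bootstrapping a 3-node OB cluster never succeeds

1. Symptom

After running for 50 minutes, the bootstrap statement still had not returned.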

MySQL [(none)]> set session ob_query_timeout=36000000000;

Query OK, 0 rows affected (0.00 sec)

MySQL [(none)]> set session ob_trx_timeout=36000000000;

Query OK, 0 rows affected (0.00 sec)

MySQL [(none)]>

MySQL [(none)]>

MySQL [(none)]> alter system bootstrap zone 'zone1' server '10.10.10.160:2882', zone 'zone2' server '10.10.10.161:2882', zone 'zone3' server '10.10.10.162:2882';

2. Information collected

grep ERROR observer.log*

No ERROR entries were found.

dstat on all 3 servers shows very low resource utilization.
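Since the ERROR search came back empty, it may be worth widening it to other levels and other log files. The sketch below counts the lines containing a given level token in each file. `count_level` is a hypothetical helper, and the file names `rootservice.log` and `election.log` in the usage comment are assumptions about the log directory; the post only shows `observer.log`.

```shell
#!/usr/bin/env bash
# Sketch: count lines that contain a level token (for example WARN) in each
# log file. count_level is a hypothetical helper, not an OceanBase tool.
count_level() {
  local level=$1 file
  shift
  for file in "$@"; do
    # grep -c prints 0 when nothing matches, so every file gets a line.
    printf '%s %s\n' "$(grep -c -w -- "$level" "$file")" "$file"
  done
}

# Usage (directory and the last two file names are assumptions):
#   cd /home/admin/oceanbase/log
#   count_level WARN observer.log rootservice.log election.log
```

A file with a high WARN count around the time the bootstrap statement was issued would be the first place to read in detail.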

3. Steps performed

Run the following on each of the three servers:

mkdir -p /data/1/htzob/{etc3,sort_dir,sstable}

mkdir -p /data/log1/htzob/{clog,etc2,ilog,slog,oob_clog}

mkdir -p /home/admin/oceanbase/store/htzob

for t in {etc3,sort_dir,sstable};do ln -s /data/1/htzob/$t /home/admin/oceanbase/store/htzob/$t; done

for t in {clog,etc2,ilog,slog,oob_clog};do ln -s /data/log1/htzob/$t /home/admin/oceanbase/store/htzob/$t; done
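Before starting observer, it can help to confirm that every link created above resolves to a real directory. This is a minimal sketch; `check_store_links` is a hypothetical helper, and the directory names are the ones used in the commands above.

```shell
#!/usr/bin/env bash
# Sketch: report whether each expected entry under the store directory is a
# symlink that resolves to an existing directory.
# check_store_links is a hypothetical helper, not part of OceanBase.
check_store_links() {
  local store_dir=$1 name status=0
  for name in etc3 sort_dir sstable clog etc2 ilog slog oob_clog; do
    # -L: the entry itself is a symlink; -d: its target is a directory.
    if [ -L "$store_dir/$name" ] && [ -d "$store_dir/$name" ]; then
      echo "ok      $name -> $(readlink "$store_dir/$name")"
    else
      echo "BROKEN  $name"
      status=1
    fi
  done
  return "$status"
}

# Usage on each server (path taken from the post):
#   check_store_links /home/admin/oceanbase/store/htzob
```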

Then run the following command on each node:

Node 1

cd /home/admin/oceanbase && /home/admin/oceanbase/bin/observer -i ens192 -P 2882 -p 2881 -z zone1 -d /home/admin/oceanbase/store/htzob -r '10.10.10.160:2882:2881;10.10.10.161:2882:2881;10.10.10.162:2882:2881' -n htzob -o "cpu_count=6,memory_limit=16G,datafile_disk_percentage=50,cluster_id=100001,config_additional_dir=/data/1/htzob/etc3;/data/log1/htzob/etc2,__min_full_resource_pool_memory=1073741824,_ob_enable_prepared_statement=false,memory_limit_percentage=90,system_memory=10G"

Node 2

cd /home/admin/oceanbase && /home/admin/oceanbase/bin/observer -i ens192 -P 2882 -p 2881 -z zone2 -d /home/admin/oceanbase/store/htzob -r '10.10.10.160:2882:2881;10.10.10.161:2882:2881;10.10.10.162:2882:2881' -n htzob -o "cpu_count=6,memory_limit=16G,datafile_disk_percentage=50,cluster_id=100001,config_additional_dir=/data/1/htzob/etc3;/data/log1/htzob/etc2,__min_full_resource_pool_memory=1073741824,_ob_enable_prepared_statement=false,memory_limit_percentage=90,system_memory=10G"

Node 3

cd /home/admin/oceanbase && /home/admin/oceanbase/bin/observer -i ens192 -P 2882 -p 2881 -z zone3 -d /home/admin/oceanbase/store/htzob -r '10.10.10.160:2882:2881;10.10.10.161:2882:2881;10.10.10.162:2882:2881' -n htzob -o "cpu_count=6,memory_limit=16G,datafile_disk_percentage=50,cluster_id=100001,config_additional_dir=/data/1/htzob/etc3;/data/log1/htzob/etc2,__min_full_resource_pool_memory=1073741824,_ob_enable_prepared_statement=false,memory_limit_percentage=90,system_memory=10G"
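The three commands differ only in the `-z` value, so one way to rule out copy-and-paste drift is to generate them from a single template. The sketch below only prints the command lines; `observer_cmd`, `RS_LIST` and `OPTS` are hypothetical names, and every option value is copied from the commands above.

```shell
#!/usr/bin/env bash
# Sketch: print the observer start command for one zone so that the three
# nodes are guaranteed to differ only in -z. Nothing is executed here.
RS_LIST='10.10.10.160:2882:2881;10.10.10.161:2882:2881;10.10.10.162:2882:2881'
OPTS='cpu_count=6,memory_limit=16G,datafile_disk_percentage=50,cluster_id=100001,config_additional_dir=/data/1/htzob/etc3;/data/log1/htzob/etc2,__min_full_resource_pool_memory=1073741824,_ob_enable_prepared_statement=false,memory_limit_percentage=90,system_memory=10G'

observer_cmd() {
  local zone=$1
  # Plain ASCII quotes around -r and -o keep the ';' inside them away from the shell.
  printf "cd /home/admin/oceanbase && /home/admin/oceanbase/bin/observer -i ens192 -P 2882 -p 2881 -z %s -d /home/admin/oceanbase/store/htzob -r '%s' -n htzob -o \"%s\"\n" \
    "$zone" "$RS_LIST" "$OPTS"
}

# Print one command per node; copy each line to the matching server.
for zone in zone1 zone2 zone3; do
  observer_cmd "$zone"
done
```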

NTP is already configured. The clock offsets between the nodes are:

[root@obone ~]# clockdiff 10.10.10.161

..

host=10.10.10.161 rtt=562(280)ms/0ms delta=2ms/2ms Tue Aug 10 18:27:37 2021

[root@obone ~]# clockdiff 10.10.10.162

.

host=10.10.10.162 rtt=750(187)ms/0ms delta=0ms/0ms Tue Aug 10 18:27:38 2021
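The offset check can be made repeatable by parsing the `delta=` field of the result line. This is a sketch: `clock_delta_ms` and `clock_within_limit` are hypothetical helpers, and the 100 ms limit in the usage comment is an assumed threshold, not a value stated in the post.

```shell
#!/usr/bin/env bash
# Sketch: pull the first delta value (in ms) out of a clockdiff result line
# and compare its absolute value with a caller-supplied limit.

# Reads clockdiff output on stdin and prints the first delta value, e.g. "2".
clock_delta_ms() {
  sed -n 's/.*delta=\(-\{0,1\}[0-9][0-9]*\)ms.*/\1/p'
}

# Returns 0 if |delta| <= limit_ms, 1 if it is larger, 2 if no delta was found.
clock_within_limit() {
  local line=$1 limit_ms=$2 delta
  delta=$(printf '%s\n' "$line" | clock_delta_ms)
  [ -n "$delta" ] || return 2
  [ "${delta#-}" -le "$limit_ms" ]
}

# Usage (100 is an assumed limit in milliseconds):
#   clock_within_limit "$(clockdiff 10.10.10.161 | tail -n 1)" 100 && echo "clock ok"
```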


During the bootstrap, host resource utilization is extremely low:

[root@obone ~]# dstat

You did not select any stats, using -cdngy by default.

----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--

usr sys idl wai hiq siq| read writ| recv send| in out | int csw 

 6 3 90 0 0 0| 60k 34M| 0  0 | 0  0 | 34k 93k

 9 4 87 0 0 0| 0  0 | 35k 27k| 0  0 | 41k 112k

 8 4 88 0 0 0| 0  0 | 49k 32k| 0  0 | 40k 110k

 7 4 89 0 0 0| 0  16k| 42k 25k| 0  0 | 40k 108k

 8 4 88 0 0 0| 0  0 | 30k 22k| 0  0 | 41k 110k

 7 3 89 0 0 0| 0 4060k| 39k 32k| 0  0 | 41k 111k



Memory and CPU configuration:

[root@obone ~]# lscpu

Architecture:     x86_64

CPU op-mode(s):    32-bit, 64-bit

Byte Order:      Little Endian

CPU(s):        10

On-line CPU(s) list: 0-9

Thread(s) per core:  1

Core(s) per socket:  1

Socket(s):      10

NUMA node(s):     1

Vendor ID:      GenuineIntel

CPU family:      6

Model:        63

Model name:      Intel(R) Xeon(R) CPU E5-2678 v3 @ 2.50GHz

Stepping:       2

CPU MHz:       2499.998

BogoMIPS:       4999.99

Hypervisor vendor:  VMware

Virtualization type: full

L1d cache:      32K

L1i cache:      32K

L2 cache:       256K

L3 cache:       30720K

NUMA node0 CPU(s):  0-9

Flags:        fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon nopl xtopology tsc_reliable nonstop_tsc eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm invpcid_single ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 avx2 smep bmi2 invpcid xsaveopt arat md_clear spec_ctrl intel_stibp flush_l1d arch_capabilities

[root@obone ~]# free -m

       total    used    free   shared buff/cache available

Mem:     32011    6574   20487     8    4950   19757

Swap:      0     0     0


NIC configuration:

[root@obone ~]# ip a

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000

  link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

  inet 127.0.0.1/8 scope host lo

    valid_lft forever preferred_lft forever

  inet6 ::1/128 scope host 

    valid_lft forever preferred_lft forever

2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000

  link/ether 00:0c:29:05:04:07 brd ff:ff:ff:ff:ff:ff

  inet 10.10.10.160/24 brd 10.10.10.255 scope global noprefixroute ens192

    valid_lft forever preferred_lft forever

  inet6 fe80::20c:29ff:fe05:407/64 scope link 

    valid_lft forever preferred_lft forever


Not even the oceanbase database has been created, so there is no point in looking for any other tables.




It feels as if the session hangs right after the statement is issued. Is there anywhere I can check the state and progress of the session?


From the following strace output:

[admin@obone log]$ strace -p 18778

strace: Process 18778 attached

read(3, 

it appears that this process is hung, waiting for some data to arrive.


File descriptor 3, the one being read, is a socket, so the hang seems to be related to communication between the nodes.

[root@obone fd]# ls -lrt

total 0

lrwx------ 1 admin admin 64 Aug 10 23:28 2 -> /dev/pts/0

lrwx------ 1 admin admin 64 Aug 10 23:33 3 -> socket:[2085520]

lrwx------ 1 admin admin 64 Aug 10 23:33 1 -> /dev/pts/0

lrwx------ 1 admin admin 64 Aug 10 23:33 0 -> /dev/pts/0
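To see which endpoint that socket is connected to, the inode from the listing (2085520) can be looked up in the kernel's TCP table. The sketch below assumes an IPv4 TCP socket and a little-endian host (which matches the lscpu output); `hex_to_addr` and `socket_peer` are hypothetical helpers. Note that a process with only three terminal descriptors and one socket looks more like the mysql client than observer itself, although the post does not say which process 18778 is.

```shell
#!/usr/bin/env bash
# Sketch: map a socket inode (as shown under /proc/<pid>/fd) to its TCP
# endpoints by scanning a table in /proc/net/tcp format.

# Decode "A00A0A0A:0B41" (little-endian hex IPv4, hex port) to "10.10.10.160:2881".
hex_to_addr() {
  local ip=${1%%:*} port=${1##*:}
  printf '%d.%d.%d.%d:%d\n' \
    "0x${ip:6:2}" "0x${ip:4:2}" "0x${ip:2:2}" "0x${ip:0:2}" "0x$port"
}

# Print "local -> remote" for the row whose inode column matches.
# Returns 1 if the inode is not in the table (for example an IPv6 or UNIX socket).
socket_peer() {
  local inode=$1 table=${2:-/proc/net/tcp} local_hex remote_hex
  read -r local_hex remote_hex < <(
    awk -v ino="$inode" 'NR > 1 && $10 == ino { print $2, $3; exit }' "$table"
  ) || return 1
  echo "$(hex_to_addr "$local_hex") -> $(hex_to_addr "$remote_hex")"
}

# Usage on the node where the process runs:
#   socket_peer 2085520
```

If the remote side turns out to be port 2881 on the local node, the blocked `read` is simply the client waiting for the server's reply, and the investigation moves to the observer processes and their logs.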


Please run the following commands:

tree /data/

tree /home/admin/oceanbase/store

How much memory does the machine have? Check with free -g.

If memory is limited, change the startup parameters to:

-o __min_full_resource_pool_memory=268435456,memory_limit=16G,system_memory=8G,stack_size=512K,cpu_count=16,cache_wash_threshold=1G,workers_per_cpu_quota=10,schema_history_expire_time=1d,net_thread_count=4,sys_bkgd_migration_retry_num=3,minor_freeze_times=10,enable_separate_sys_clog=0,enable_merge_by_turn=False,datafile_size=50G,enable_syslog_recycle=True,max_syslog_file_count=10

[root@obone ~]# tree /data

/data

├── 1

│ └── htzob

│   ├── etc3

│   │ ├── observer.conf.bin

│   │ └── observer.conf.bin.history

│   ├── sort_dir

│   └── sstable

│     └── block_file

├── log1

│ └── htzob

│   ├── clog

│   │ ├── 1

│   │ ├── 2

│   │ ├── 3

│   │ ├── 4

│   │ ├── 5

│   │ ├── 6

│   │ └── 7

│   ├── etc2

│   │ ├── observer.conf.bin

│   │ └── observer.conf.bin.history

│   ├── ilog

│   ├── oob_clog

│   └── slog

│     └── 1

├── lost+found

└── soft

  └── oceanbase-2.2.76-20210325220637.el7.x86_64.rpm


14 directories, 14 files

[root@obone ~]# tree /home/admin/oceanbase/store

/home/admin/oceanbase/store

└── htzob

  ├── clog -> /data/log1/htzob/clog

  ├── clog_shm

  ├── etc2 -> /data/log1/htzob/etc2

  ├── etc3 -> /data/1/htzob/etc3

  ├── ilog -> /data/log1/htzob/ilog

  ├── ilog_shm

  ├── oob_clog -> /data/log1/htzob/oob_clog

  ├── slog -> /data/log1/htzob/slog

  ├── sort_dir -> /data/1/htzob/sort_dir

  └── sstable -> /data/1/htzob/sstable


The machine has 32G of memory. These are the startup parameters currently in use:

cpu_count=10,memory_limit=16G,datafile_disk_percentage=50,cluster_id=100001,config_additional_dir=/data/1/htzob/etc3;/data/log1/htzob/etc2,__min_full_resource_pool_memory=1073741824,_ob_enable_prepared_statement=false,memory_limit_percentage=90,system_memory=10G

Bootstrapping with only a single zone works without any problem.
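For comparing the current parameters with the ones suggested in the reply, the arithmetic can be written down explicitly. This sketch assumes that the memory left for tenants is memory_limit minus system_memory; `tenant_memory_gib` and `bytes_to_mib` are hypothetical helpers, and the input values are the ones quoted in this thread.

```shell
#!/usr/bin/env bash
# Sketch: compare the memory settings quoted in the thread.
# Assumption: memory available to tenants = memory_limit - system_memory.
tenant_memory_gib() {
  local memory_limit_gib=$1 system_memory_gib=$2
  echo $(( memory_limit_gib - system_memory_gib ))
}

# Convert a byte count to MiB (integer division).
bytes_to_mib() {
  echo $(( $1 / 1024 / 1024 ))
}

tenant_memory_gib 16 10    # current:   memory_limit=16G, system_memory=10G -> 6
tenant_memory_gib 16 8     # suggested: memory_limit=16G, system_memory=8G  -> 8
bytes_to_mib 1073741824    # current   __min_full_resource_pool_memory -> 1024
bytes_to_mib 268435456     # suggested __min_full_resource_pool_memory -> 256
```

Under that assumption the suggested parameters leave 2G more for tenants and lower the minimum resource pool size from 1024 MiB to 256 MiB.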