Deploying a three-node test cluster on physical servers with obd: cluster bootstrap waits indefinitely

【Product name】

oceanbase-ce-3.1.2-7fafba0fac1e90cbd1b5b7ae5fa129b64dc63aed

【Product version】

3.1.2

【Problem description】

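I am using obd to deploy onto three physical servers, each with 64 CPU cores and 128 GB of memory. Passwordless SSH login, the firewall, SELinux and clock synchronization have all been configured on the three servers. The clock sync check looks like this: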

[admin@fompdb1 ~]$ ntpstat
synchronised to local net (127.127.1.0) at stratum 9
time correct to within 11 ms
polling server every 64 s

[admin@fompdb1 ~]$ ssh fompdb2 ntpstat
synchronised to NTP server (192.168.5.140) at stratum 10
time correct to within 45 ms
polling server every 1024 s

[admin@fompdb1 ~]$ ssh fompdb3 ntpstat
synchronised to NTP server (192.168.5.140) at stratum 10
time correct to within 39 ms
polling server every 1024 s
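Note that ntpstat only reports ntpd's own estimated error bound ("time correct to within"), not a measured offset between the nodes. Also, fompdb1 is synchronised to 127.127.1.0, which is ntpd's local-clock driver: it has no real upstream reference, and fompdb2/fompdb3 simply follow it. The sketch below parses the three outputs above and flags both conditions. The 50 ms limit (`MAX_ERROR_MS`) and the helper names are hypothetical, chosen for illustration only; take the real clock-skew requirement from the OceanBase deployment documentation.

```python
import re

# "synchronised to NTP server (192.168.5.140) at stratum 10"
SRC_RE = re.compile(r"synchronised to (.+?) \(([^)]+)\) at stratum (\d+)")
# "time correct to within 45 ms"  (ntpd's estimated error bound)
ERR_RE = re.compile(r"time correct to within (\d+) ms")

# Hypothetical limit for illustration only; take the real clock-skew
# requirement from the OceanBase deployment documentation.
MAX_ERROR_MS = 50


def parse_ntpstat(text: str) -> dict:
    """Extract sync source, peer, stratum and error bound from ntpstat output."""
    src = SRC_RE.search(text)
    err = ERR_RE.search(text)
    if src is None or err is None:
        raise ValueError("unsynchronised or unrecognised ntpstat output")
    return {
        "source": src.group(1),
        "peer": src.group(2),
        "stratum": int(src.group(3)),
        "error_ms": int(err.group(1)),
    }


def check_nodes(outputs: dict) -> list:
    """Flag nodes with a large error bound or no real upstream reference."""
    warnings = []
    for host, text in outputs.items():
        info = parse_ntpstat(text)
        if info["error_ms"] > MAX_ERROR_MS:
            warnings.append(f"{host}: error bound {info['error_ms']} ms")
        # 127.127.1.x is ntpd's local-clock driver: the node just trusts itself.
        if info["peer"].startswith("127.127.1."):
            warnings.append(f"{host}: synced only to its local clock")
    return warnings


if __name__ == "__main__":
    # ntpstat output collected on the three nodes in this post
    samples = {
        "fompdb1": "synchronised to local net (127.127.1.0) at stratum 9\n"
                   "time correct to within 11 ms\n",
        "fompdb2": "synchronised to NTP server (192.168.5.140) at stratum 10\n"
                   "time correct to within 45 ms\n",
        "fompdb3": "synchronised to NTP server (192.168.5.140) at stratum 10\n"
                   "time correct to within 39 ms\n",
    }
    for warning in check_nodes(samples):
        print(warning)
```

In practice the text for each node would come from `ssh <host> ntpstat`, as in the commands above.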

obd configuration file:

user:
  username: admin
  password: admin-.2022
oceanbase-ce:
  servers:
    - name: server1
      # Please don't use hostname, only IP can be supported
      ip: 192.168.5.140
    - name: server2
      # Please don't use hostname, only IP can be supported
      ip: 192.168.5.141
    - name: server3
      # Please don't use hostname, only IP can be supported
      ip: 192.168.5.142
  global:
    # Please set devname as the network adaptor's name whose ip is in the setting of servers.
    # if set servers as "127.0.0.1", please set devname as "lo"
    # if current ip is 192.168.1.10, and the ip's network adaptor's name is "eth0", please use "eth0"
    devname: eno1
    cluster_id: 1
    # please set memory limit to a suitable value which is matching resource.
    memory_limit: 48G # The maximum running memory for an observer
    system_memory: 4G # The reserved system memory. system_memory is reserved for general tenants. The default value is 30G.
    __min_full_resource_pool_memory: 268435456
    stack_size: 512K
    cpu_count: 24
    cache_wash_threshold: 1G
    workers_per_cpu_quota: 10
    schema_history_expire_time: 1d
    # The value of net_thread_count should be the same as the number of CPU cores.
    net_thread_count: 24
    major_freeze_duty_time: Disable
    minor_freeze_times: 10
    enable_separate_sys_clog: 0
    enable_merge_by_turn: FALSE
    datafile_disk_percentage: 10 # The percentage of the data_dir space to the total disk space. This value takes effect only when datafile_size is 0. The default value is 90.
    syslog_level: WARN # System log level. The default value is INFO.
    enable_syslog_wf: false # Print system logs whose levels are higher than WARNING to a separate log file. The default value is true.
    enable_syslog_recycle: true # Enable auto system log recycling or not. The default value is false.
    max_syslog_file_count: 4 # The maximum number of reserved log files before enabling auto recycling. The default value is 0.
    root_password: # root user password, can be empty
  server1:
    mysql_port: 2881 # External port for OceanBase Database. The default value is 2881. DO NOT change this value after the cluster is started.
    rpc_port: 2882 # Internal port for OceanBase Database. The default value is 2882. DO NOT change this value after the cluster is started.
    home_path: /home/admin
    data_dir: /data
    redo_dir: /redo
    zone: zone1
  server2:
    mysql_port: 2881
    rpc_port: 2882
    home_path: /home/admin
    data_dir: /data
    redo_dir: /redo
    zone: zone2
  server3:
    mysql_port: 2881
    rpc_port: 2882
    home_path: /home/admin
    data_dir: /data
    redo_dir: /redo
    zone: zone3
obproxy:
  depends:
    - oceanbase-ce
  servers:
    - 192.168.5.140
    - 192.168.5.141
    - 192.168.5.142
  global:
    home_path: /home/admin/oceanbase
    rs_list: 192.168.5.140:2881;192.168.5.141:2881;192.168.5.142:2881
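The obproxy `rs_list` above should name every observer by IP and SQL port (mysql_port, 2881 here), as obd's sample configurations do. A quick consistency check is to rebuild that list from the `servers` section and compare. This is a sketch under assumptions: the relevant parts of the config are copied by hand into a dict so no YAML library is needed, and `expected_rs_list` / `rs_list_mismatch` are hypothetical helper names.

```python
# Relevant parts of the obd config above, copied by hand into a dict
# (assumption: avoids needing a YAML parser for this sketch).
CONFIG = {
    "oceanbase-ce": {
        "servers": [
            {"name": "server1", "ip": "192.168.5.140"},
            {"name": "server2", "ip": "192.168.5.141"},
            {"name": "server3", "ip": "192.168.5.142"},
        ],
        "server1": {"mysql_port": 2881, "rpc_port": 2882},
        "server2": {"mysql_port": 2881, "rpc_port": 2882},
        "server3": {"mysql_port": 2881, "rpc_port": 2882},
    },
    "obproxy": {
        "global": {
            "rs_list": "192.168.5.140:2881;192.168.5.141:2881;192.168.5.142:2881",
        },
    },
}


def expected_rs_list(cfg: dict) -> set:
    """Build the ip:mysql_port endpoints declared for the observers."""
    ob = cfg["oceanbase-ce"]
    return {f"{s['ip']}:{ob[s['name']]['mysql_port']}" for s in ob["servers"]}


def rs_list_mismatch(cfg: dict) -> tuple:
    """Return (missing, unexpected) endpoints in obproxy's rs_list."""
    raw = cfg["obproxy"]["global"]["rs_list"]
    declared = {e.strip() for e in raw.split(";") if e.strip()}
    expected = expected_rs_list(cfg)
    return sorted(expected - declared), sorted(declared - expected)


if __name__ == "__main__":
    missing, unexpected = rs_list_mismatch(CONFIG)
    print("missing:", missing, "unexpected:", unexpected)
```

A common mistake this would catch is listing the RPC port (2882) instead of the SQL port in `rs_list`.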

Bootstrap progress recorded in observer.log:

grep STEP_ observer.log*

observer.log.20220119132952:[2022-01-19 13:24:30.299123] INFO [BOOTSTRAP] ob_bootstrap.cpp:636 [16307][1456][YB42C0A8058C-0005D5E899A8854C] [lt=52] [dc=0] STEP_2.1:check_all_server_bootstrap_mode_match execute success, cost=26310

observer.log.20220119132952:[2022-01-19 13:24:30.373654] INFO [BOOTSTRAP] ob_bootstrap.cpp:666 [16307][1456][YB42C0A8058C-0005D5E899A8854C] [lt=42] [dc=0] STEP_2.2:check_is_all_server_empty execute success, cost=74535

observer.log.20220119132952:[2022-01-19 13:24:30.431708] INFO [BOOTSTRAP] ob_bootstrap.cpp:531 [16307][1456][YB42C0A8058C-0005D5E899A8854C] [lt=39] [dc=0] STEP_2.3:notify_sys_tenant_server_unit_resource execute success, cost=58050

observer.log.20220119132952:[2022-01-19 13:24:30.765805] INFO [BOOTSTRAP] ob_bootstrap.cpp:611 [16307][1456][YB42C0A8058C-0005D5E899A8854C] [lt=9] [dc=0] STEP_2.4:create_partition execute success, cost=334102

observer.log.20220119132952:[2022-01-19 13:24:32.476698] INFO [BOOTSTRAP] ob_bootstrap.cpp:685 [16307][1456][YB42C0A8058C-0005D5E899A8854C] [lt=28] [dc=0] STEP_2.5:wait_elect_master_partition execute success, cost=1710893

observer.log.20220119132952:[2022-01-19 13:24:32.476723] INFO [BOOTSTRAP] ob_bootstrap.cpp:336 [16307][1456][YB42C0A8058C-0005D5E899A8854C] [lt=23] [dc=0] STEP_2.6:prepare_bootstrap execute success, cost=25
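These lines show bootstrap getting through STEP_2.6 (prepare_bootstrap) with nothing later in this grep. The `cost` field appears to be in microseconds: STEP_2.5 logs cost=1710893, and its timestamp (13:24:32.476698) is exactly 1.710893 s after STEP_2.4's (13:24:30.765805). The sketch below (hypothetical helper names) extracts the steps from one or more log files and reports the furthest one that logged success. It sorts by step number because the shell expands `observer.log*` alphabetically, which puts the active observer.log before the rotated files.

```python
import re
import sys

# Matches bootstrap progress lines such as
# "... STEP_2.5:wait_elect_master_partition execute success, cost=1710893"
STEP_RE = re.compile(r"STEP_(\d+(?:\.\d+)*):(\w+) execute success, cost=(\d+)")


def parse_steps(lines):
    """Return (step, name, cost_us) tuples for every STEP_ line found."""
    steps = []
    for line in lines:
        m = STEP_RE.search(line)
        if m:
            steps.append((m.group(1), m.group(2), int(m.group(3))))
    return steps


def step_key(step: str):
    """Turn "2.10" into (2, 10) so steps compare numerically, not as text."""
    return tuple(int(part) for part in step.split("."))


def furthest_step(lines):
    """Return the highest-numbered step that logged success, or None."""
    steps = parse_steps(lines)
    return max(steps, key=lambda s: step_key(s[0])) if steps else None


if __name__ == "__main__":
    # Usage: python3 steps.py observer.log observer.log.20220119132952
    lines = []
    for path in sys.argv[1:]:
        with open(path, errors="replace") as f:
            lines.extend(f)
    for step, name, cost_us in sorted(parse_steps(lines), key=lambda s: step_key(s[0])):
        print(f"STEP_{step:<6}{name:<45}{cost_us / 1000:>12.1f} ms")
    print("furthest step:", furthest_step(lines))
```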

The log is also flooded with system-level WARN entries like the following:

[2022-01-19 13:38:49.435196] WARN [RS] check_server_empty (ob_unit_manager.cpp:3177) [15893][634][Y0-0000000000000000] [lt=9] [dc=0] check_inner_stat failed(inited=true, loaded=false, ret=-4014)

[2022-01-19 13:38:49.435224] WARN [RS] construct_not_empty_server_set (ob_server_manager.cpp:1499) [15893][634][Y0-0000000000000000] [lt=23] [dc=0] fail to check server empty(ret=-4014, server="192.168.5.140:2882")

[2022-01-19 13:38:49.435240] WARN [RS] check_servers (ob_server_manager.cpp:1526) [15893][634][Y0-0000000000000000] [lt=11] [dc=0] fail to construct empty server set(ret=-4014)

[2022-01-19 13:38:49.435283] WARN [RS] run3 (ob_server_manager.cpp:2622) [15893][634][Y0-0000000000000000] [lt=8] [dc=0] server managers check servers failed(ret=-4014)

[2022-01-19 13:38:49.535508] WARN [RS] check_server_empty (ob_unit_manager.cpp:3177) [15893][634][Y0-0000000000000000] [lt=8] [dc=0] check_inner_stat failed(inited=true, loaded=false, ret=-4014)

[2022-01-19 13:38:49.535544] WARN [RS] construct_not_empty_server_set (ob_server_manager.cpp:1499) [15893][634][Y0-0000000000000000] [lt=30] [dc=0] fail to check server empty(ret=-4014, server="192.168.5.140:2882")

[2022-01-19 13:38:49.535560] WARN [RS] check_servers (ob_server_manager.cpp:1526) [15893][634][Y0-0000000000000000] [lt=11] [dc=0] fail to construct empty server set(ret=-4014)

[2022-01-19 13:38:49.535610] WARN [RS] run3 (ob_server_manager.cpp:2622) [15893][634][Y0-0000000000000000] [lt=9] [dc=0] server managers check servers failed(ret=-4014)
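The same four-line group repeats roughly every 100 ms (13:38:49.435 and 13:38:49.535 above). The code ret=-4014 appears to be OceanBase's internal-state error (OB_INNER_STAT_ERROR), which matches the "check_inner_stat failed" text; `loaded=false` plausibly means the unit manager had not finished loading, which would fit with bootstrap never completing. To see which warnings dominate rather than scrolling the log, a small sketch (hypothetical helper names) can count WARN lines by module, function and return code:

```python
import re
import sys
from collections import Counter

# Matches e.g. "WARN [RS] check_servers (ob_server_manager.cpp:1526)"
WARN_RE = re.compile(r"\bWARN \[(\w+)\] (\w+) \(([\w.]+):\d+\)")
# First "ret=<code>" on the line, e.g. "ret=-4014"
RET_RE = re.compile(r"ret=(-?\d+)")


def count_warnings(lines) -> Counter:
    """Count WARN lines keyed by (module, function, ret code or None)."""
    counts = Counter()
    for line in lines:
        m = WARN_RE.search(line)
        if not m:
            continue
        r = RET_RE.search(line)
        ret = int(r.group(1)) if r else None
        counts[(m.group(1), m.group(2), ret)] += 1
    return counts


if __name__ == "__main__":
    # Usage: python3 warn_counts.py observer.log [more log files...]
    counts = Counter()
    for path in sys.argv[1:]:
        with open(path, errors="replace") as f:
            counts.update(count_warnings(f))
    for (module, func, ret), n in counts.most_common(10):
        print(f"{n:>8}  [{module}] {func} ret={ret}")
```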

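After running obd cluster autodeploy, obd kept waiting at cluster bootstrap. Because the wait went on for too long, I cancelled it with Ctrl+C. Disk usage after cancelling:

PS: There should be enough disk space.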

[admin@fompdb1 log]$ df -h
Filesystem               Size  Used Avail Use% Mounted on
/dev/mapper/centos-root   50G  8.2G   42G  17% /
devtmpfs                  63G     0   63G   0% /dev
tmpfs                     63G     0   63G   0% /dev/shm
tmpfs                     63G   20M   63G   1% /run
tmpfs                     63G     0   63G   0% /sys/fs/cgroup
/dev/sda2               1014M  162M  853M  16% /boot
/dev/sda1                200M  9.8M  191M   5% /boot/efi
/dev/mapper/centos-home  459G  1.7G  434G   1% /home
tmpfs                     13G   12K   13G   1% /run/user/989
/dev/mapper/centos-data 1008G  766G  192G  80% /data
/dev/mapper/centos-redo  296G  129M  281G   1% /redo
tmpfs                     13G     0   13G   0% /run/user/0
tmpfs                     13G     0   13G   0% /run/user/1000
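Whether /data has room can be checked with a little arithmetic. Going by the config comment, datafile_disk_percentage is a percentage of the total size of the filesystem holding data_dir (used only when datafile_size is 0). With /data at 1008G total and 192G available, 10% needs about 100.8G and fits, whereas the default of 90% would need about 907.2G. This is a sketch under that assumption; also note that df was taken after the cancelled run, so the 766G "Used" figure may already include space that run allocated. The function names are hypothetical.

```python
def datafile_size_gib(fs_total_gib: float, percentage: float) -> float:
    """Data-file size if it is `percentage`% of the whole filesystem that
    holds data_dir (assumption taken from the config comment; applies only
    when datafile_size is 0)."""
    return fs_total_gib * percentage / 100


def fits(fs_total_gib: float, fs_avail_gib: float, percentage: float) -> bool:
    """True if the data file would fit in the space df reports as available."""
    return datafile_size_gib(fs_total_gib, percentage) <= fs_avail_gib


if __name__ == "__main__":
    total, avail = 1008, 192  # /dev/mapper/centos-data from df -h above (GiB)
    for pct in (10, 90):  # the configured value, and the documented default
        size = datafile_size_gib(total, pct)
        print(f"datafile_disk_percentage={pct}: {size:.1f}G, fits={fits(total, avail, pct)}")
```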

Is it running OK now?

Problem solved. The cause was clock synchronization: after disabling ntp and switching to chrony, the deployment worked.
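After switching to chrony, `chronyc tracking` on each node prints a "System time" line giving how far the local clock is fast or slow of the NTP time it tracks. The sketch below parses that line and takes the spread of offsets across nodes as a rough skew estimate; this only approximates node-to-node skew when all nodes track the same source. The 50 ms budget (`MAX_SKEW_MS`), the helper names, and the sample outputs in the `__main__` block are illustrative assumptions, not values from this post.

```python
import re

# One line of `chronyc tracking` output, e.g.
# "System time     : 0.000001970 seconds slow of NTP time"
SYSTEM_TIME_RE = re.compile(
    r"^System time\s*:\s*([0-9.]+) seconds (fast|slow) of NTP time", re.MULTILINE
)

# Hypothetical skew budget for illustration only.
MAX_SKEW_MS = 50.0


def system_offset_ms(tracking_output: str) -> float:
    """Signed offset of the local clock in ms (positive = fast)."""
    m = SYSTEM_TIME_RE.search(tracking_output)
    if m is None:
        raise ValueError("no 'System time' line in chronyc tracking output")
    ms = float(m.group(1)) * 1000
    return ms if m.group(2) == "fast" else -ms


def max_skew_ms(offsets: dict) -> float:
    """Rough node-to-node skew: spread of offsets against the same source."""
    values = list(offsets.values())
    return max(values) - min(values)


if __name__ == "__main__":
    # Illustrative outputs, not taken from the post.
    outputs = {
        "fompdb2": "System time     : 0.000001970 seconds slow of NTP time\n",
        "fompdb3": "System time     : 0.000350000 seconds fast of NTP time\n",
    }
    offsets = {host: system_offset_ms(text) for host, text in outputs.items()}
    skew = max_skew_ms(offsets)
    print(f"max skew {skew:.3f} ms, ok={skew <= MAX_SKEW_MS}")
```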