Bootstrap error after manually installing OB Community Edition V3.1.5_CE_HF2

【Environment】Test environment
【OB】
Unable to bootstrap the cluster.

【Version】
V3.1.5_CE_HF2
【Problem description】
To prepare for the OBCP V3 exam, I manually deployed a cluster in a test environment.

I started the service on each of the three servers as follows:
cd ~/oceanbase && bin/observer -i p5p1 -p 2881 -P 2882 -z zone1 -d ~/oceanbase/store/obcluster -r '10.110.7.40:2882:2881;10.110.7.41:2882:2881;10.110.7.42:2882:2881' -c 20240205 -n obcluster -o "memory_limit=8G,cache_wash_threshold=1G,__min_full_resource_pool_memory=268435456,system_memory=3G,memory_chunk_cache_size=128M,cpu_count=12,net_thread_count=4,datafile_size=50G,stack_size=1536K,config_additional_dir=/data/obcluster/etc3;/redo/obcluster/etc2"

cd ~/oceanbase && bin/observer -i p5p1 -p 2881 -P 2882 -z zone2 -d ~/oceanbase/store/obcluster -r '10.110.7.40:2882:2881;10.110.7.41:2882:2881;10.110.7.42:2882:2881' -c 20240205 -n obcluster -o "memory_limit=8G,cache_wash_threshold=1G,__min_full_resource_pool_memory=268435456,system_memory=3G,memory_chunk_cache_size=128M,cpu_count=12,net_thread_count=4,datafile_size=50G,stack_size=1536K,config_additional_dir=/data/obcluster/etc3;/redo/obcluster/etc2"

cd ~/oceanbase && bin/observer -i p5p1 -p 2881 -P 2882 -z zone3 -d ~/oceanbase/store/obcluster -r '10.110.7.40:2882:2881;10.110.7.41:2882:2881;10.110.7.42:2882:2881' -c 20240205 -n obcluster -o "memory_limit=8G,cache_wash_threshold=1G,__min_full_resource_pool_memory=268435456,system_memory=3G,memory_chunk_cache_size=128M,cpu_count=12,net_thread_count=4,datafile_size=50G,stack_size=1536K,config_additional_dir=/data/obcluster/etc3;/redo/obcluster/etc2"
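The three commands above differ only in the -z value, so one way to avoid copy-paste mistakes is to generate them from a single template. This is a minimal sketch, not an official tool: observer_cmd is a hypothetical helper name, every flag value is copied from the commands in this post, and the function only prints the command so it can be reviewed before being run on the matching server.

```shell
# Sketch only: print the observer start command for one zone.
# observer_cmd is a hypothetical helper; all values are copied from this post.
RS_LIST='10.110.7.40:2882:2881;10.110.7.41:2882:2881;10.110.7.42:2882:2881'
OB_OPTS='memory_limit=8G,cache_wash_threshold=1G,__min_full_resource_pool_memory=268435456,system_memory=3G,memory_chunk_cache_size=128M,cpu_count=12,net_thread_count=4,datafile_size=50G,stack_size=1536K,config_additional_dir=/data/obcluster/etc3;/redo/obcluster/etc2'

observer_cmd() {
    # $1 = zone name (zone1, zone2 or zone3)
    # The rootservice list and the option string contain ';', so they are
    # printed inside quotes to keep the shell from splitting the command.
    printf '%s\n' "cd ~/oceanbase && bin/observer -i p5p1 -p 2881 -P 2882 -z $1 -d ~/oceanbase/store/obcluster -r '$RS_LIST' -c 20240205 -n obcluster -o \"$OB_OPTS\""
}

# Print one command per zone; nothing is executed here.
for z in zone1 zone2 zone3; do
    observer_cmd "$z"
done
```

Each printed line is meant to be run on the server that hosts that zone; the quoting around the -r and -o values is the part most worth checking, since an unquoted ';' ends the command early.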

I then logged in with mysql -h 10.110.7.41 -u root -P 2881 -p -c -A to bootstrap the cluster.
It failed with the following error:
[admin@xsky-node2 ~]$ mysql -h 10.110.7.40 -u root -P 2881 -p -c -A
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 3221225473
Server version: 5.7.25 OceanBase 3.1.5 (r100020022023091114-8a9dc4b356d043b494015503d6d91f876486fbed) (Built Sep 11 2023 14:38:53)

Copyright (c) 2000, 2022, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MySQL [(none)] > set session ob_query_timeout=1000000000;alter system bootstrap ZONE 'zone1' SERVER '10.110.7.40:2882',ZONE 'zone2' SERVER '10.110.7.41:2882', ZONE 'zone3' SERVER '10.110.7.42:2882';
Query OK, 0 rows affected (0.00 sec)

ERROR 4015 (HY000): System error
I don't know what is causing this or how to fix it.

Also, which obproxy version is the best match for OB V3.1.5_CE_HF2?

【SOP Series 22】 Troubleshooting step one (system inspection and diagnostic information collection)

You can refer to this post: "Manually deploying an OB cluster, ERROR 4015 (HY000): System error during bootstrap" (OceanBase community Q&A)

Hello,
I read the post you mentioned and adjusted the parameters as suggested, but I still get the error.

1. First, here is my server configuration
1) Memory
[admin@xsky-node2 oceanbase]$ free -g
              total        used        free      shared  buff/cache   available
Mem:             62           3          55           3           3          50
Swap:            63           0          63

2) CPU (physical and logical)
[root@xsky-node2 tmp]# cat /proc/cpuinfo| grep "physical id"| sort| uniq| wc -l
1
[root@xsky-node2 tmp]# cat /proc/cpuinfo| grep "cpu cores"| uniq
cpu cores : 8

[root@xsky-node2 tmp]# cat /proc/cpuinfo| grep "processor"| wc -l
16

When starting observer, I raised the cpu_count parameter to 16:
[admin@xsky-node2 oceanbase]$ ps -ef|grep observer
admin 6275 1 98 09:31 ? 00:01:05 bin/observer -i p5p1 -p 2881 -P 2882 -z zone1 -d /home/admin/oceanbase/store/obcluster -r 10.110.7.40:2882:2881;10.110.7.41:2882:2881;10.110.7.42:2882:2881 -c 20240205 -n obcluster -o memory_limit=8G,cache_wash_threshold=1G,__min_full_resource_pool_memory=268435456,system_memory=3G,memory_chunk_cache_size=128M,cpu_count=16,net_thread_count=4,datafile_size=30G,stack_size=1536K,config_additional_dir=/data/obcluster/etc3;/redo/obcluster/etc2

I started the observer service on each of the three servers,
then ran alter system bootstrap ZONE 'zone1' SERVER '10.110.7.40:2882', ZONE 'zone2' SERVER '10.110.7.41:2882', ZONE 'zone3' SERVER '10.110.7.42:2882'; and it still failed.

Also, Docker is not running on this server.

You can look directly at observer.log to see what the error is.
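observer.log is very noisy, so it helps to narrow it down to the failed statement. A minimal sketch, assuming the log line layout shown later in this thread (timestamp, level, module, thread ids, then a trace id such as [YB420A6E0728-000610AD20DAA5FD]): it finds the trace id of the line that says "bootstrap failed" and prints only the WARN/ERROR lines that share it. bootstrap_trace_warnings is a hypothetical helper name, and the log path in the usage note is an assumption.

```shell
# Sketch only: show WARN/ERROR lines belonging to the failed bootstrap request.
# bootstrap_trace_warnings is a hypothetical helper, not an OceanBase tool.
bootstrap_trace_warnings() {
    # $1 = path to observer.log
    # Take the trace id from the first line that reports "bootstrap failed".
    tid=$(sed -n 's/.*\[\(Y[0-9A-Z]*-[0-9A-F]*\)].*bootstrap failed.*/\1/p' "$1" | head -n 1)
    # No such line means there is nothing to correlate.
    [ -n "$tid" ] || return 1
    # Keep lines with that trace id, then keep only WARN and ERROR levels.
    grep -F "[$tid]" "$1" | grep -E '] (WARN|ERROR) '
}
```

Usage would look like bootstrap_trace_warnings ~/oceanbase/log/observer.log, where the path is assumed from the install directory used in this thread. It has to be run on the node that received the alter system bootstrap statement, since that is where the "bootstrap failed" line is written.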

This time I killed all the processes, deleted the directories and recreated them, and then ran:
cd ~/oceanbase && bin/observer -i p5p1 -p 2881 -P 2882 -z zone3 -d /home/admin/oceanbase/store/obcluster -r '10.110.7.40:2882:2881;10.110.7.41:2882:2881;10.110.7.42:2882:2881' -c 20240206 -n obcluster -o "memory_limit=30G,cache_wash_threshold=3G,__min_full_resource_pool_memory=268435456,system_memory=6G,memory_chunk_cache_size=128M,cpu_count=12,net_thread_count=4,datafile_size=30G,stack_size=1536K,config_additional_dir=/data/obcluster/etc3;/redo/obcluster/etc2,max_syslog_file_count=30"

Then I ran set session ob_query_timeout=1000000000; alter system bootstrap ZONE 'zone1' SERVER '10.110.7.40:2882', ZONE 'zone2' SERVER '10.110.7.41:2882', ZONE 'zone3' SERVER '10.110.7.42:2882'; and it still failed.

Is there a firewall or something similar between the nodes?

[2024-02-06 10:13:51.837273] INFO easy_socket.c:358 [31343][0][Y0-0000000000000000] [lt=5] [dc=0] Failed to write socket, fd(1934), conn(0x2b54dae77290), errno(111), strerror(Connection refused). No listener on destination IP/PORT, or connect request rejected by firewall/iptables. Use 'iptalbe -L -n' or 'netstat -ntpl' to check it.

[2024-02-06 10:17:08.487317] INFO [SQL.ENG] ob_alter_system_executor.cpp:1198 [31060][1172][YB420A6E0728-000610AD20DAA5FD] [lt=8] [dc=0] bootstrap timeout(rpc_timeout=999999862)
[2024-02-06 10:17:08.487527] INFO [SERVER] ob_service.cpp:2734 [31061][1174][YB420A6E0728-000610AD20DAA5FD] [lt=29] [dc=0] bootstrap timeout(timeout=600000000, worker_timeout_ts=1707186828487313)
[2024-02-06 10:17:08.487573] WARN [SERVER] check_server_empty (ob_service.cpp:2943) [31061][1174][YB420A6E0728-000610AD20DAA5FD] [lt=12] [dc=0] log dir is not empty
[2024-02-06 10:17:08.487581] WARN [BOOTSTRAP] bootstrap (ob_service.cpp:2758) [31061][1174][YB420A6E0728-000610AD20DAA5FD] [lt=7] [dc=0] observer is not empty(ret=-4015)
[2024-02-06 10:17:08.487693] WARN log_user_error_and_warn (ob_rpc_proxy.cpp:300) [31060][1172][YB420A6E0728-000610AD20DAA5FD] [lt=13] [dc=0]
[2024-02-06 10:17:08.487708] WARN [SQL.ENG] execute (ob_alter_system_executor.cpp:1206) [31060][1172][YB420A6E0728-000610AD20DAA5FD] [lt=8] [dc=0] rpc proxy bootstrap failed(ret=-4015, rpc_timeout=999999862)

The bootstrap operation can only be executed once. If it fails, you need to wipe the environment and redeploy.

[root@xsky-node2 ~]# iptables -L -n
Chain INPUT (policy ACCEPT)
target prot opt source destination

Chain FORWARD (policy DROP)
target prot opt source destination
DOCKER-USER all -- 0.0.0.0/0 0.0.0.0/0
DOCKER-ISOLATION-STAGE-1 all -- 0.0.0.0/0 0.0.0.0/0
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0 ctstate RELATED,ESTABLISHED
DOCKER all -- 0.0.0.0/0 0.0.0.0/0
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0 ctstate RELATED,ESTABLISHED
DOCKER all -- 0.0.0.0/0 0.0.0.0/0
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0

Chain OUTPUT (policy ACCEPT)
target prot opt source destination

Chain DOCKER (2 references)
target prot opt source destination

Chain DOCKER-ISOLATION-STAGE-1 (1 references)
target prot opt source destination
DOCKER-ISOLATION-STAGE-2 all -- 0.0.0.0/0 0.0.0.0/0
DOCKER-ISOLATION-STAGE-2 all -- 0.0.0.0/0 0.0.0.0/0
RETURN all -- 0.0.0.0/0 0.0.0.0/0

Chain DOCKER-ISOLATION-STAGE-2 (2 references)
target prot opt source destination
DROP all -- 0.0.0.0/0 0.0.0.0/0
DROP all -- 0.0.0.0/0 0.0.0.0/0
RETURN all -- 0.0.0.0/0 0.0.0.0/0

Chain DOCKER-USER (1 references)
target prot opt source destination
RETURN all -- 0.0.0.0/0 0.0.0.0/0
[root@xsky-node2 ~]#
[root@xsky-node2 ~]# netstat -ntpl
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:3306 0.0.0.0:* LISTEN 3253/mysqld
tcp 0 0 10.110.7.40:8300 0.0.0.0:* LISTEN 1519/consul
tcp 0 0 10.110.7.40:8301 0.0.0.0:* LISTEN 1519/consul
tcp 0 0 0.0.0.0:5902 0.0.0.0:* LISTEN 2104/Xvnc
tcp 0 0 10.110.7.40:8302 0.0.0.0:* LISTEN 1519/consul
tcp 0 0 0.0.0.0:9998 0.0.0.0:* LISTEN 1395/./SFTMonitor
tcp 0 0 0.0.0.0:9999 0.0.0.0:* LISTEN 1395/./SFTMonitor
tcp 0 0 0.0.0.0:111 0.0.0.0:* LISTEN 1012/rpcbind
tcp 0 0 0.0.0.0:16688 0.0.0.0:* LISTEN 1396/./SFTServer
tcp 0 0 0.0.0.0:16689 0.0.0.0:* LISTEN 1396/./SFTServer
tcp 0 0 0.0.0.0:6002 0.0.0.0:* LISTEN 2104/Xvnc
tcp 0 0 127.0.0.1:8500 0.0.0.0:* LISTEN 1519/consul
tcp 0 0 127.0.0.1:53 0.0.0.0:* LISTEN 1519/consul
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 1461/sshd
tcp 0 0 0.0.0.0:2881 0.0.0.0:* LISTEN 30407/bin/observer
tcp 0 0 0.0.0.0:2882 0.0.0.0:* LISTEN 30407/bin/observer
tcp 0 0 0.0.0.0:10050 0.0.0.0:* LISTEN 2260/zabbix_agentd
tcp6 0 0 :::5902 :::* LISTEN 2104/Xvnc
tcp6 0 0 :::111 :::* LISTEN 1012/rpcbind
tcp6 0 0 :::6002 :::* LISTEN 2104/Xvnc
tcp6 0 0 :::22 :::* LISTEN 1461/sshd
tcp6 0 0 :::18882 :::* LISTEN 2121/rea-agentd
tcp6 0 0 :::18883 :::* LISTEN 2142/rea-monitor
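Since the "Connection refused" log line points at a missing listener or a firewall, the same netstat check is worth repeating on all three nodes. A minimal sketch, assuming the netstat -ntpl column layout shown above: it reads that output on stdin and succeeds only if an observer process is in LISTEN state on both 2881 and 2882. observer_listening is a hypothetical helper name.

```shell
# Sketch only: confirm observer listens on both the SQL port (2881) and the
# RPC port (2882), given `netstat -ntpl` output on stdin.
# observer_listening is a hypothetical helper, not an OceanBase tool.
observer_listening() {
    awk '
        # Field 4 is the local address, field 6 the state, field 7 PID/program.
        $6 == "LISTEN" && $7 ~ /observer/ {
            n = split($4, parts, ":")   # port is the last ":"-separated piece
            seen[parts[n]] = 1
        }
        END { exit !(("2881" in seen) && ("2882" in seen)) }
    '
}
```

On each node this would be run as netstat -ntpl | observer_listening && echo ok. It only proves that the local process is listening; it does not prove that the other nodes can reach those ports through any firewall in between.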

By redeploying, do you mean reinstalling the rpm package, or running kill -9 pidof observer, then rm -rf ~/oceanbase/store/obdemo// and then rerunning those cd ~/oceanbase && bin/observer -i commands?

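Yes. Everything under the directory structure you created has to be cleared, or you can simply delete and recreate the manually created directories. Then start observer again and continue with the remaining steps.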
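The "log dir is not empty" and "observer is not empty(ret=-4015)" lines in the log suggest that bootstrap refuses to run when files from an earlier attempt are still present. A quick pre-check before retrying could look like the sketch below. It is a conservative approximation, not the server's own check: dirs_all_empty is a hypothetical helper that fails if any listed directory is missing or still contains a regular file at any depth.

```shell
# Sketch only: verify that data/log directories hold no leftover files before
# retrying bootstrap. dirs_all_empty is a hypothetical helper.
dirs_all_empty() {
    for d in "$@"; do
        # A missing directory is reported too, since observer expects it to exist.
        [ -d "$d" ] || { echo "missing: $d" >&2; return 1; }
        # find does not follow symlinks here, so pass the real directories.
        if [ -n "$(find "$d" -type f | head -n 1)" ]; then
            echo "not empty: $d" >&2
            return 1
        fi
    done
    return 0
}
```

For the layout in this thread, the call might be dirs_all_empty ~/oceanbase/store/obcluster /data/obcluster /redo/obcluster, run on every node after the observer processes are stopped. Those three paths are taken from the start command and the replies above; adjust them to wherever the data and redo directories actually live.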

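I stopped all the processes, deleted the directories under /data and /redo along with the log files and the files under store, and started over. This time it actually succeeded.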


Thank you very much for the pointer.