Bootstrap fails after manually installing OceanBase Community Edition V3.1.5_CE_HF2

尚雷 · February 5, 2024, 19:42

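[Environment] Test environment
[OB]
The cluster cannot be bootstrapped.

[Version]
V3.1.5_CE_HF2
[Problem description]
To prepare for the OBCP V3 exam, I manually deployed a cluster in a test environment.

I started the observer service on each of the three servers as follows: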
cd ~/oceanbase && bin/observer -i p5p1 -p 2881 -P 2882 -z zone1 -d ~/oceanbase/store/obcluster -r '10.110.7.40:2882:2881;10.110.7.41:2882:2881;10.110.7.42:2882:2881' -c 20240205 -n obcluster -o "memory_limit=8G,cache_wash_threshold=1G,__min_full_resource_pool_memory=268435456,system_memory=3G,memory_chunk_cache_size=128M,cpu_count=12,net_thread_count=4,datafile_size=50G,stack_size=1536K,config_additional_dir=/data/obcluster/etc3;/redo/obcluster/etc2"

cd ~/oceanbase && bin/observer -i p5p1 -p 2881 -P 2882 -z zone2 -d ~/oceanbase/store/obcluster -r '10.110.7.40:2882:2881;10.110.7.41:2882:2881;10.110.7.42:2882:2881' -c 20240205 -n obcluster -o "memory_limit=8G,cache_wash_threshold=1G,__min_full_resource_pool_memory=268435456,system_memory=3G,memory_chunk_cache_size=128M,cpu_count=12,net_thread_count=4,datafile_size=50G,stack_size=1536K,config_additional_dir=/data/obcluster/etc3;/redo/obcluster/etc2"

cd ~/oceanbase && bin/observer -i p5p1 -p 2881 -P 2882 -z zone3 -d ~/oceanbase/store/obcluster -r '10.110.7.40:2882:2881;10.110.7.41:2882:2881;10.110.7.42:2882:2881' -c 20240205 -n obcluster -o "memory_limit=8G,cache_wash_threshold=1G,__min_full_resource_pool_memory=268435456,system_memory=3G,memory_chunk_cache_size=128M,cpu_count=12,net_thread_count=4,datafile_size=50G,stack_size=1536K,config_additional_dir=/data/obcluster/etc3;/redo/obcluster/etc2"
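
The three commands above differ only in the -z value, and the -r value contains semicolons that the shell treats as command separators unless the value is quoted. One way to avoid retyping it is to build the list once and pass it as a quoted variable. This is only a minimal sketch, assuming every node uses RPC port 2882 and SQL port 2881 as above; build_rs_list is a hypothetical helper name, not something shipped with OceanBase.

```shell
# Build the value for observer's -r option from a list of node IPs.
# Assumption: all nodes share the same RPC and SQL ports.
# The function body runs in a subshell, so its variables do not leak.
build_rs_list() (
    rpc_port=$1
    sql_port=$2
    shift 2
    list=""
    for ip in "$@"; do
        # Append "ip:rpc:sql", separated by ';' from the previous entry.
        list="${list:+$list;}${ip}:${rpc_port}:${sql_port}"
    done
    printf '%s\n' "$list"
)

rs_list=$(build_rs_list 2882 2881 10.110.7.40 10.110.7.41 10.110.7.42)
echo "$rs_list"
```

The result can then be passed as -r "$rs_list"; the double quotes keep the semicolons inside a single argument.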

I then logged in with mysql -h 10.110.7.41 -u root -P 2881 -p -c -A to bootstrap the cluster.
It failed with the following error:
[admin@xsky-node2 ~]$ mysql -h 10.110.7.40 -u root -P 2881 -p -c -A
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 3221225473
Server version: 5.7.25 OceanBase 3.1.5 (r100020022023091114-8a9dc4b356d043b494015503d6d91f876486fbed) (Built Sep 11 2023 14:38:53)

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MySQL [(none)] > set session ob_query_timeout=1000000000;alter system bootstrap ZONE 'zone1' SERVER '10.110.7.40:2882',ZONE 'zone2' SERVER '10.110.7.41:2882', ZONE 'zone3' SERVER '10.110.7.42:2882';
Query OK, 0 rows affected (0.00 sec)

ERROR 4015 (HY000): System error
I don't know what is causing this or how to fix it.
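
Statements copied from a web page can pick up typographic quotes, which are not SQL string delimiters, so it may be worth generating the bootstrap statement with plain single quotes before pasting it into the client. This is a rough sketch and does not explain the 4015 error by itself; build_bootstrap_sql is a hypothetical helper, and the zone=ip pairs are the ones from this post.

```shell
# Build an ALTER SYSTEM BOOTSTRAP statement from zone=ip pairs.
# Assumption: every server uses the same RPC port.
build_bootstrap_sql() (
    rpc_port=$1
    shift
    clause=""
    for pair in "$@"; do
        zone=${pair%%=*}   # text before the first '='
        ip=${pair#*=}      # text after the first '='
        clause="${clause:+$clause, }ZONE '${zone}' SERVER '${ip}:${rpc_port}'"
    done
    printf 'ALTER SYSTEM BOOTSTRAP %s;\n' "$clause"
)

build_bootstrap_sql 2882 zone1=10.110.7.40 zone2=10.110.7.41 zone3=10.110.7.42
```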

Also, which obproxy version is the best match for OceanBase V3.1.5_CE_HF2?
If you have any questions, please contact: 18701685580
[Steps to reproduce]
[Attachments and logs]


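王利博 · February 5, 2024, 19:44

You can refer to this post: [手动部署ob集群，初始化时报错ERROR 4015 (HY000): System error] ("ERROR 4015 (HY000): System error when bootstrapping a manually deployed OB cluster"), in the OceanBase community Q&A.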

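尚雷 · February 6, 2024, 09:39

Hello,
I read the post you mentioned and adjusted the parameters as it suggests, but the bootstrap still fails.

I. First, my server configuration
1) Memory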
[admin@xsky-node2 oceanbase]$ free -g
total used free shared buff/cache available
Mem: 62 3 55 3 3 50
Swap: 63 0 63

[root@xsky-node2 tmp]# cat /proc/cpuinfo| grep "processor"| wc -l
16

At startup I raised the cpu_count parameter to 16:
[admin@xsky-node2 oceanbase]$ ps -ef|grep observer
admin 6275 1 98 09:31 ? 00:01:05 bin/observer -i p5p1 -p 2881 -P 2882 -z zone1 -d /home/admin/oceanbase/store/obcluster -r 10.110.7.40:2882:2881;10.110.7.41:2882:2881;10.110.7.42:2882:2881 -c 20240205 -n obcluster -o memory_limit=8G,cache_wash_threshold=1G,__min_full_resource_pool_memory=268435456,system_memory=3G,memory_chunk_cache_size=128M,cpu_count=16,net_thread_count=4,datafile_size=30G,stack_size=1536K,config_additional_dir=/data/obcluster/etc3;/redo/obcluster/etc2

I started the observer service on each of the three servers.
Then I ran alter system bootstrap ZONE 'zone1' SERVER '10.110.7.40:2882',ZONE 'zone2' SERVER '10.110.7.41:2882',ZONE 'zone3' SERVER '10.110.7.42:2882' ; and it still failed.

Also, Docker is not running on this server.

序风 · February 6, 2024, 09:41

You can look directly at observer.log to see what the error is.
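
One way to narrow down observer.log is to find the lines that report the failing return code, take their trace ID (the bracketed Y...-... field), and then print every WARN/ERROR line carrying the same trace ID. This is a rough sketch: lines_for_ret is a hypothetical helper, and the log path in the usage comment assumes observer was started from ~/oceanbase and writes to log/observer.log there, which should be verified on the actual install.

```shell
# Print all WARN/ERROR lines that share a trace ID with any line
# reporting the given return code (for example -4015).
lines_for_ret() (
    log=$1
    ret=$2
    grep -F -e "ret=${ret}" "$log" \
        | grep -oE '\[Y[0-9A-Z]+-[0-9A-Z]+\]' \
        | sort -u \
        | while read -r trace; do
              # Same request, all threads: keep only WARN/ERROR severity.
              grep -F -e "$trace" "$log" | grep -E '\] (WARN|ERROR) '
          done
)

# Usage (path is an assumption):
#   lines_for_ret ~/oceanbase/log/observer.log -4015
```

Filtering by trace ID rather than by keyword also surfaces related lines that do not contain the return code themselves.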

尚雷 · February 6, 2024, 10:24

This time I killed all the processes, deleted the directories, recreated them, and ran:
cd ~/oceanbase && bin/observer -i p5p1 -p 2881 -P 2882 -z zone3 -d /home/admin/oceanbase/store/obcluster -r '10.110.7.40:2882:2881;10.110.7.41:2882:2881;10.110.7.42:2882:2881' -c 20240206 -n obcluster -o "memory_limit=30G,cache_wash_threshold=3G,__min_full_resource_pool_memory=268435456,system_memory=6G,memory_chunk_cache_size=128M,cpu_count=12,net_thread_count=4,datafile_size=30G,stack_size=1536K,config_additional_dir=/data/obcluster/etc3;/redo/obcluster/etc2,max_syslog_file_count=30"

Then I ran set session ob_query_timeout=1000000000; alter system bootstrap ZONE 'zone1' SERVER '10.110.7.40:2882', ZONE 'zone2' SERVER '10.110.7.41:2882', ZONE 'zone3' SERVER '10.110.7.42:2882' ; and it still failed.

序风 · February 6, 2024, 11:05

Is there a firewall or something similar between the nodes?

[2024-02-06 10:13:51.837273] INFO easy_socket.c:358 [31343][0][Y0-0000000000000000] [lt=5] [dc=0] Failed to write socket, fd(1934), conn(0x2b54dae77290), errno(111), strerror(Connection refused). No listener on destination IP/PORT, or connect request rejected by firewall/iptables. Use 'iptalbe -L -n' or 'netstat -ntpl' to check it.

尚雷 · February 6, 2024, 11:12

和顺 · February 6, 2024, 11:14

[2024-02-06 10:17:08.487317] INFO [SQL.ENG] ob_alter_system_executor.cpp:1198 [31060][1172][YB420A6E0728-000610AD20DAA5FD] [lt=8] [dc=0] bootstrap timeout(rpc_timeout=999999862)
[2024-02-06 10:17:08.487527] INFO [SERVER] ob_service.cpp:2734 [31061][1174][YB420A6E0728-000610AD20DAA5FD] [lt=29] [dc=0] bootstrap timeout(timeout=600000000, worker_timeout_ts=1707186828487313)
[2024-02-06 10:17:08.487573] WARN [SERVER] check_server_empty (ob_service.cpp:2943) [31061][1174][YB420A6E0728-000610AD20DAA5FD] [lt=12] [dc=0] log dir is not empty
[2024-02-06 10:17:08.487581] WARN [BOOTSTRAP] bootstrap (ob_service.cpp:2758) [31061][1174][YB420A6E0728-000610AD20DAA5FD] [lt=7] [dc=0] observer is not empty(ret=-4015)
[2024-02-06 10:17:08.487693] WARN log_user_error_and_warn (ob_rpc_proxy.cpp:300) [31060][1172][YB420A6E0728-000610AD20DAA5FD] [lt=13] [dc=0]
[2024-02-06 10:17:08.487708] WARN [SQL.ENG] execute (ob_alter_system_executor.cpp:1206) [31060][1172][YB420A6E0728-000610AD20DAA5FD] [lt=8] [dc=0] rpc proxy bootstrap failed(ret=-4015, rpc_timeout=999999862)

The bootstrap operation can only be run once. If it fails, you have to wipe the environment and redeploy.
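
Since the log above reports "log dir is not empty", it may help to confirm that the data directories really are empty on every node before starting observer and bootstrapping again. This is a minimal sketch, assuming the data directory passed to -d contains clog, ilog, slog and sstable subdirectories (adjust the list to the actual layout); leftover_dirs is a hypothetical helper.

```shell
# Print each expected subdirectory of the data directory that still
# contains entries. Prints nothing when all of them are empty or absent.
# Assumption: clog, ilog, slog and sstable are the directories to check.
leftover_dirs() (
    base=$1
    for sub in clog ilog slog sstable; do
        dir="$base/$sub"
        # The trailing slash makes ls list the target of a symlinked directory.
        if [ -d "$dir" ] && [ -n "$(ls -A "$dir/")" ]; then
            printf '%s\n' "$dir"
        fi
    done
)

# Usage (path taken from the commands in this thread):
#   leftover_dirs /home/admin/oceanbase/store/obcluster
```

Any path it prints still holds files from an earlier attempt and needs to be cleaned on that node before the next bootstrap.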

尚雷 · February 6, 2024, 11:15

[root@xsky-node2 ~]# iptables -L -n
Chain INPUT (policy ACCEPT)
target prot opt source destination

Chain FORWARD (policy DROP)
target prot opt source destination
DOCKER-USER all -- 0.0.0.0/0 0.0.0.0/0
DOCKER-ISOLATION-STAGE-1 all -- 0.0.0.0/0 0.0.0.0/0
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0 ctstate RELATED,ESTABLISHED
DOCKER all -- 0.0.0.0/0 0.0.0.0/0
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0 ctstate RELATED,ESTABLISHED
DOCKER all -- 0.0.0.0/0 0.0.0.0/0
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0

Chain OUTPUT (policy ACCEPT)
target prot opt source destination

Chain DOCKER (2 references)
target prot opt source destination

Chain DOCKER-ISOLATION-STAGE-1 (1 references)
target prot opt source destination
DOCKER-ISOLATION-STAGE-2 all -- 0.0.0.0/0 0.0.0.0/0
DOCKER-ISOLATION-STAGE-2 all -- 0.0.0.0/0 0.0.0.0/0
RETURN all -- 0.0.0.0/0 0.0.0.0/0

Chain DOCKER-ISOLATION-STAGE-2 (2 references)
target prot opt source destination
DROP all -- 0.0.0.0/0 0.0.0.0/0
DROP all -- 0.0.0.0/0 0.0.0.0/0
RETURN all -- 0.0.0.0/0 0.0.0.0/0

Chain DOCKER-USER (1 references)
target prot opt source destination
RETURN all -- 0.0.0.0/0 0.0.0.0/0
[root@xsky-node2 ~]#
[root@xsky-node2 ~]# netstat -ntpl
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address     Foreign Address  State   PID/Program name
tcp   0      0      0.0.0.0:3306      0.0.0.0:*        LISTEN  3253/mysqld
tcp   0      0      10.110.7.40:8300  0.0.0.0:*        LISTEN  1519/consul
tcp   0      0      10.110.7.40:8301  0.0.0.0:*        LISTEN  1519/consul
tcp   0      0      0.0.0.0:5902      0.0.0.0:*        LISTEN  2104/Xvnc
tcp   0      0      10.110.7.40:8302  0.0.0.0:*        LISTEN  1519/consul
tcp   0      0      0.0.0.0:9998      0.0.0.0:*        LISTEN  1395/./SFTMonitor
tcp   0      0      0.0.0.0:9999      0.0.0.0:*        LISTEN  1395/./SFTMonitor
tcp   0      0      0.0.0.0:111       0.0.0.0:*        LISTEN  1012/rpcbind
tcp   0      0      0.0.0.0:16688     0.0.0.0:*        LISTEN  1396/./SFTServer
tcp   0      0      0.0.0.0:16689     0.0.0.0:*        LISTEN  1396/./SFTServer
tcp   0      0      0.0.0.0:6002      0.0.0.0:*        LISTEN  2104/Xvnc
tcp   0      0      127.0.0.1:8500    0.0.0.0:*        LISTEN  1519/consul
tcp   0      0      127.0.0.1:53      0.0.0.0:*        LISTEN  1519/consul
tcp   0      0      0.0.0.0:22        0.0.0.0:*        LISTEN  1461/sshd
tcp   0      0      0.0.0.0:2881      0.0.0.0:*        LISTEN  30407/bin/observer
tcp   0      0      0.0.0.0:2882      0.0.0.0:*        LISTEN  30407/bin/observer
tcp   0      0      0.0.0.0:10050     0.0.0.0:*        LISTEN  2260/zabbix_agentd
tcp6  0      0      :::5902           :::*             LISTEN  2104/Xvnc
tcp6  0      0      :::111            :::*             LISTEN  1012/rpcbind
tcp6  0      0      :::6002           :::*             LISTEN  2104/Xvnc
tcp6  0      0      :::22             :::*             LISTEN  1461/sshd
tcp6  0      0      :::18882          :::*             LISTEN  2121/rea-agentd
tcp6  0      0      :::18883          :::*             LISTEN  2142/rea-monitor
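
To repeat this check on all three nodes without reading the whole listing, the netstat output can be filtered down to the ports a given program listens on. This is a small sketch that only parses text in the `netstat -ntpl` column layout shown above; ports_of is a hypothetical helper name.

```shell
# Read `netstat -ntpl` output on stdin and print, one per line, the ports
# on which the named program is in LISTEN state.
ports_of() {
    awk -v prog="$1" '
        $6 == "LISTEN" && $7 ~ ("/" prog "$") {
            n = split($4, parts, ":")   # port is the last ":"-separated piece
            print parts[n]
        }' | sort -n
}

# Usage on each node (needs enough privileges to see program names):
#   netstat -ntpl 2>/dev/null | ports_of observer
```

For the setup in this thread, each node would be expected to list both 2881 and 2882 for observer; a missing port on any node would be consistent with the "Connection refused" message quoted earlier.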