OceanBase service fails to start after RPM installation

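[Environment] Test environment
[OB or other component] OB
[Version] oceanbase-ce-4.3.2.1
[Problem description]
I am installing OceanBase Community Edition on a local CentOS 7.9 x64 host with 8 CPUs and 8 GB of memory.
I installed OB with yum. The installation succeeded, but the service fails to start.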

[root@ocenbase ~]# systemctl status oceanbase
● oceanbase.service - oceanbase
   Loaded: loaded (/etc/systemd/system/oceanbase.service; disabled; vendor preset: disabled)
   Active: activating (auto-restart) (Result: exit-code) since 六 2024-10-12 16:33:36 CST; 1s ago
  Process: 32525 ExecStart=/bin/bash /home/admin/oceanbase/profile/oceanbase-service.sh start (code=exited, status=1/FAILURE)
   CGroup: /system.slice/oceanbase.service
           ├─2279 /home/admin/oceanbase/bin/obshell daemon --ip 0.0.0.0 --port 2886
           └─2306 /home/admin/oceanbase/bin/obshell server --ip 0.0.0.0 --port 2886

10月 12 16:33:36 ocenbase systemd[1]: Failed to start oceanbase.
10月 12 16:33:36 ocenbase systemd[1]: Unit oceanbase.service entered failed state.
10月 12 16:33:36 ocenbase systemd[1]: oceanbase.service failed.

[root@ocenbase profile]# /bin/bash /home/admin/oceanbase/profile/oceanbase-service.sh start
oceanbase service started at 2024-10-12 16:35:44
change etc owner to root
daemon process with PID 2279 is running.
The agent service is exist
The observer has been installed before
observer PID file not found.

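Which operating system is this?
Did you download the OB RPM package and run rpm -i, or did you use yum install?

It worked after I reinstalled.

The operating system is CentOS 7.9.
I installed with: yum install oceanbase-ce oceanbase-ce-libs obclient
Right after installation, systemctl start oceanbase started the service successfully.

After rebooting the server, the service no longer starts: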
10月 14 16:31:31 ocenbase systemd[1]: Starting oceanbase…
10月 14 16:31:31 ocenbase bash[5056]: oceanbase service started at 2024-10-14 16:31:31
10月 14 16:31:33 ocenbase bash[5056]: change etc owner to root
10月 14 16:31:33 ocenbase bash[5056]: daemon process with PID 2180 is running.
10月 14 16:31:33 ocenbase bash[5056]: The agent service is exist
10月 14 16:31:33 ocenbase bash[5056]: The observer is already bootstrap, please start it immediately
10月 14 16:31:33 ocenbase bash[5056]: the start observer trace id is 201684326770288613
10月 14 16:31:33 ocenbase bash[5056]: the response state is READY
10月 14 16:31:33 ocenbase bash[5056]: wait 6s and the retry
10月 14 16:31:39 ocenbase bash[5056]: the response state is FAILED
10月 14 16:31:39 ocenbase bash[5056]: start observer request failed
10月 14 16:31:39 ocenbase systemd[1]: oceanbase.service: control process exited, code=exited status=1
10月 14 16:31:39 ocenbase systemd[1]: Failed to start oceanbase.
10月 14 16:31:39 ocenbase systemd[1]: Unit oceanbase.service entered failed state.
10月 14 16:31:39 ocenbase systemd[1]: oceanbase.service failed.
10月 14 16:31:49 ocenbase systemd[1]: oceanbase.service holdoff time over, scheduling restart.

[root@ocenbase log]# lsblk
NAME            MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda               8:0    0 931.5G  0 disk
├─sda1            8:1    0   512M  0 part /boot/efi
├─sda2            8:2    0     1G  0 part /boot
└─sda3            8:3    0   930G  0 part
  ├─centos-root 253:0    0   922G  0 lvm  /
  └─centos-swap 253:1    0     8G  0 lvm  [SWAP]

[root@ocenbase log]# cat /etc/oceanbase.cnf

# The IP can only be configured during the initial startup.
# By default, the IP is set to the output of hostname -i unless specified otherwise.
# To modify the observer IP after startup, you must uninstall OceanBase and remove the data directory.
# Afterward, you can reinstall and launch OceanBase with the new IP address.
# For instance, to set the observer IP to 127.0.0.1, you would use the line:

ip=127.0.0.1

mysql_port=2881
rpc_port=2882
obshell_port=2886
root_pwd=""
redo_dir=/var/lib/oceanbase/redo
data_dir=/var/lib/oceanbase/data
datafile_size=2G
datafile_next=2G
datafile_maxsize=20G
cpu_count=16
memory_limit=6G
system_memory=1G
log_disk_size=13G
enable_syslog_wf=false
enable_syslog_recycle=true
max_syslog_file_count=4
__min_full_resource_pool_memory=1073741824
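
One thing that can be checked mechanically is whether the values in /etc/oceanbase.cnf fit the host described at the top of the thread (8 CPUs, 8 GB of memory). The following is only a rough sketch in Python: the function names are made up for illustration, and the thread does not establish that any of these settings caused the failure. It simply flags values that exceed the host figures you pass in.

```python
import re

UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30, "T": 1 << 40}


def parse_size(text):
    """Convert a size such as '6G' or '1073741824' to a byte count."""
    match = re.fullmatch(r"(\d+)([KMGT]?)", text.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(match.group(1)) * UNITS[match.group(2)]


def parse_cnf(text):
    """Collect key=value pairs; blank lines, '#' comments and lines without '=' are skipped."""
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        conf[key.strip()] = value.strip().strip('"')
    return conf


def resource_warnings(conf, host_cpus, host_mem_bytes):
    """Return notes for settings that exceed the host figures (hypothetical helper)."""
    notes = []
    if "cpu_count" in conf and int(conf["cpu_count"]) > host_cpus:
        notes.append(f"cpu_count={conf['cpu_count']} exceeds {host_cpus} host CPUs")
    if "memory_limit" in conf and parse_size(conf["memory_limit"]) > host_mem_bytes:
        notes.append(f"memory_limit={conf['memory_limit']} exceeds host memory")
    return notes


# Values copied from the cnf file posted in this thread.
sample = """
cpu_count=16
memory_limit=6G
system_memory=1G
log_disk_size=13G
"""
conf = parse_cnf(sample)
print(resource_warnings(conf, host_cpus=8, host_mem_bytes=8 * (1 << 30)))
```

With the posted values, only cpu_count is flagged. Treat that as something to look at, not as a diagnosis.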

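Most likely the observer itself failed to start, for example because it crashed with a core dump.

From the code, at this point the script has already entered the stage of launching the observer binary.

The first attempt clearly failed and the observer exited. On each later restart the binary comes up briefly and then exits again.

Please post the observer log so we can do a first-pass analysis.

After the cluster fails to restart, send an observer log and we will help analyze it.

The observer did not write anything to its log.

I started the service as root. Does the service have to be started as the admin user?

If the cluster was installed as admin, use admin.

I installed it as root.

The frustrating part is that nothing gets written to observer.log, so I can't figure out what is going on.

Are there any logs under /home/admin/oceanbase/log?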

Please also post the logs under /home/admin/oceanbase/log_obshell.
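
To tell quickly whether anything is still being written to these two directories, and which user owns the files (relevant to the root vs. admin question above), something like the following Python sketch could be used. The helper name recent_files is made up for illustration. The two paths are the ones mentioned in this thread.

```python
import os
import pwd
import time


def recent_files(directory, limit=5):
    """Return (name, owner, mtime) tuples for the most recently modified files."""
    rows = []
    for entry in os.scandir(directory):
        if not entry.is_file():
            continue
        info = entry.stat()
        try:
            owner = pwd.getpwuid(info.st_uid).pw_name
        except KeyError:
            owner = str(info.st_uid)  # uid without a passwd entry
        rows.append((info.st_mtime, entry.name, owner))
    rows.sort(reverse=True)  # newest first
    return [
        (name, owner, time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(mtime)))
        for mtime, name, owner in rows[:limit]
    ]


if __name__ == "__main__":
    for path in ("/home/admin/oceanbase/log", "/home/admin/oceanbase/log_obshell"):
        if os.path.isdir(path):
            for row in recent_files(path):
                print(path, *row)
```

If the newest file under /home/admin/oceanbase/log predates the reboot, that would be consistent with the observer exiting before it writes any log.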

2024-10-15T08:45:13.908 INFO [11261] [F000000000000000] [executor/executor.go:283] try to execute task 17044
2024-10-15T08:45:13.908 INFO [11261] [F000000000000000] [executor/executor.go:318] execute task 17044, execute
2024-10-15T08:45:13.908 INFO [11261] [F000000000000000] [task/task.go:303] task 17044 Inform all agents to start observer execute log: Inform 10.10.20.37:2886 to create the task
2024-10-15T08:45:13.908 INFO [11261] [F000000000000000] [ob/sub_dag.go:95] main dag id is 20168432677028862843
2024-10-15T08:45:13.908 INFO [11261] [F000000000000000] [ob/stop_obsvr.go:138] check if in maintenance
2024-10-15T08:45:13.908 ERROR [11261] [F000000000000000] [ob/stop_obsvr.go:146] get last maintenance dag id failed fields:, error="param mainDagId not set"
2024-10-15T08:45:13.908 ERROR [11261] [F000000000000000] [task/task.go:301] task 17044 Inform all agents to start observer execute log: ERROR: OcsAgentError: code = 1010, message = Known error: agent is under maintenance
2024-10-15T08:45:13.913 INFO [11261] [F000000000000000] [executor/executor.go:263] finishing local task 17044
2024-10-15T08:45:13.928 INFO [11261] [F000000000000000] [executor/executor.go:278] finish task 17044 end
2024-10-15T08:45:14.262 INFO [11261] [F000000000000000] [oceanbase/builder.go:71] try connect oceanbase: 1
2024-10-15T08:45:14.262 INFO [11261] [F000000000000000] [oceanbase/loader.go:163] open oceanbase failed fields:, error="dial tcp 127.0.0.1:2881: connect: connection refused"
2024-10-15T08:45:14.262 INFO [11261] [F000000000000000] [oceanbase/loader.go:79] loadOceanbaseInstanceWithoutDBNameUntilSucc last error is initialize oceanbase failed: observer process not exist
2024-10-15T08:45:14.262 INFO [11261] [F000000000000000] [oceanbase/loader.go:85] init oceanbase instance without db name failed
2024-10-15T08:45:14.976 INFO [11261] [LS00000000000000] [scheduler/dag_handler.go:35] advance dag 2843
2024-10-15T08:45:14.976 INFO [11261] [LS00000000000000] [scheduler/node_handler.go:25] advance node 17044
2024-10-15T08:45:14.976 INFO [11261] [LS00000000000000] [scheduler/task_handler.go:39] advanceTask: node 17044 operator 1
2024-10-15T08:45:14.976 INFO [11261] [LS00000000000000] [scheduler/task_handler.go:79] sub task 17044 state 5
2024-10-15T08:45:14.976 INFO [11261] [LS00000000000000] [scheduler/task_handler.go:52] ready Task num 0, isFinished true, isSucceed true
2024-10-15T08:45:15.263 INFO [11261] [F000000000000000] [oceanbase/builder.go:71] try connect oceanbase: 1
2024-10-15T08:45:15.263 INFO [11261] [F000000000000000] [oceanbase/loader.go:163] open oceanbase failed fields:, error="dial tcp 127.0.0.1:2881: connect: connection refused"
2024-10-15T08:45:15.263 INFO [11261] [F000000000000000] [oceanbase/loader.go:79] loadOceanbaseInstanceWithoutDBNameUntilSucc last error is initialize oceanbase failed: observer process not exist
2024-10-15T08:45:15.263 INFO [11261] [F000000000000000] [oceanbase/loader.go:85] init oceanbase instance without db name failed
2024-10-15T08:45:16.054 INFO [11261] [LS00000000000000] [scheduler/dag_handler.go:35] advance dag 2843
2024-10-15T08:45:16.054 INFO [11261] [LS00000000000000] [scheduler/node_handler.go:25] advance node 17045
2024-10-15T08:45:16.062 INFO [11261] [LS00000000000000] [scheduler/task_handler.go:39] advanceTask: node 17045 operator 1
2024-10-15T08:45:16.062 INFO [11261] [LS00000000000000] [scheduler/task_handler.go:79] sub task 17045 state 1
2024-10-15T08:45:16.066 INFO [11261] [LS00000000000000] [scheduler/task_handler.go:52] ready Task num 1, isFinished false, isSucceed true
2024-10-15T08:45:16.066 INFO [11261] [LS00000000000000] [scheduler/task_handler.go:55] ready sub task 17045 operator 1
2024-10-15T08:45:16.066 INFO [11261] [F000000000000000] [executor/pool.go:58] add task 17045 to ExecutorPool
2024-10-15T08:45:16.066 INFO [11261] [F000000000000000] [executor/executor.go:129] try to start task 17045
2024-10-15T08:45:16.070 INFO [11261] [F000000000000000] [executor/executor.go:135] start to task 17045 execute
2024-10-15T08:45:16.070 INFO [11261] [F000000000000000] [executor/executor.go:283] try to execute task 17045
2024-10-15T08:45:16.070 INFO [11261] [F000000000000000] [executor/executor.go:318] execute task 17045, execute
2024-10-15T08:45:16.070 ERROR [11261] [F000000000000000] [task/task.go:301] task 17045 Make sure all agents are ready execute log: ERROR: check sub dag created failed: Not all tasks created. main dag failed
2024-10-15T08:45:16.070 WARN [11261] [F000000000000000] [executor/executor.go:137] task 17045 execute error fields: error="check sub dag created failed: Not all tasks created. main dag failed"
2024-10-15T08:45:16.070 INFO [11261] [F000000000000000] [executor/executor.go:263] finishing local task 17045
2024-10-15T08:45:16.074 INFO [11261] [F000000000000000] [executor/executor.go:278] finish task 17045 end
2024-10-15T08:45:16.264 INFO [11261] [F000000000000000] [oceanbase/builder.go:71] try connect oceanbase: 1
2024-10-15T08:45:16.265 INFO [11261] [F000000000000000] [oceanbase/loader.go:163] open oceanbase failed fields: error="dial tcp 127.0.0.1:2881: connect: connection refused"
2024-10-15T08:45:16.265 INFO [11261] [F000000000000000] [oceanbase/loader.go:79] loadOceanbaseInstanceWithoutDBNameUntilSucc last error is initialize oceanbase failed: observer process not exist
2024-10-15T08:45:16.265 INFO [11261] [F000000000000000] [oceanbase/loader.go:85] init oceanbase instance without db name failed
2024-10-15T08:45:17.145 INFO [11261] [LS00000000000000] [scheduler/dag_handler.go:35] advance dag 2843
2024-10-15T08:45:17.145 INFO [11261] [LS00000000000000] [scheduler/node_handler.go:25] advance node 17045
2024-10-15T08:45:17.145 INFO [11261] [LS00000000000000] [scheduler/task_handler.go:39] advanceTask: node 17045 operator 1
2024-10-15T08:45:17.145 INFO [11261] [LS00000000000000] [scheduler/task_handler.go:79] sub task 17045 state 4
2024-10-15T08:45:17.145 INFO [11261] [LS00000000000000] [scheduler/task_handler.go:52] ready Task num 0, isFinished true, isSucceed false
2024-10-15T08:45:17.265 INFO [11261] [F000000000000000] [oceanbase/builder.go:71] try connect oceanbase: 1
2024-10-15T08:45:17.266 INFO [11261] [F000000000000000] [oceanbase/loader.go:163] open oceanbase failed fields:, error="dial tcp 127.0.0.1:2881: connect: connection refused"
2024-10-15T08:45:17.266 INFO [11261] [F000000000000000] [oceanbase/loader.go:79] loadOceanbaseInstanceWithoutDBNameUntilSucc last error is initialize oceanbase failed: observer process not exist
2024-10-15T08:45:17.266 INFO [11261] [F000000000000000] [oceanbase/loader.go:85] init oceanbase instance without db name failed
2024-10-15T08:45:18.267 INFO [11261] [F000000000000000] [oceanbase/builder.go:71] try connect oceanbase: 1
2024-10-15T08:45:18.267 INFO [11261] [F000000000000000] [oceanbase/loader.go:163] open oceanbase failed fields:, error="dial tcp 127.0.0.1:2881: connect: connection refused"
2024-10-15T08:45:18.267 INFO [11261] [F000000000000000] [oceanbase/loader.go:79] loadOceanbaseInstanceWithoutDBNameUntilSucc last error is initialize oceanbase failed: observer process not exist
2024-10-15T08:45:18.267 INFO [11261] [F000000000000000] [oceanbase/loader.go:85] init oceanbase instance without db name failed
2024-10-15T08:45:19.053 INFO [11261] [c2150814f097d546] [common/middleware.go:168] API request: [GET /api/v1/task/dag/20168432677028862843, client=, traceId=c2150814f097d546]
2024-10-15T08:45:19.055 INFO [11261] [c2150814f097d546] [common/middleware.go:242] API response OK: [GET /api/v1/task/dag/20168432677028862843, client=, traceId=c2150814f097d546, duration=2, status=200, data=&{GenericDTO:0xc000ac53d0 DagDetail:0xc000a90be0}]
2024-10-15T08:45:19.290 INFO [11261] [F000000000000000] [oceanbase/builder.go:71] try connect oceanbase: 1
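
Most of the obshell log above is INFO-level retry noise, so it helps to pull out just the ERROR and WARN lines. Below is a small Python sketch for that. The regular expression is inferred from the lines pasted in this thread (timestamp, level, pid, trace id, source location, message) and is an assumption, not a documented obshell log format. The helper name problem_lines is made up.

```python
import re

# Field layout inferred from the log excerpt in this thread.
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<level>[A-Z]+)\s+\[(?P<pid>\d+)\]\s+"
    r"\[(?P<trace>[^\]]+)\]\s+\[(?P<source>[^\]]+)\]\s+(?P<message>.*)$"
)


def problem_lines(lines, levels=("ERROR", "WARN")):
    """Return (timestamp, level, source, message) for lines at the given levels."""
    found = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match.group("level") in levels:
            found.append(
                (
                    match.group("ts"),
                    match.group("level"),
                    match.group("source"),
                    match.group("message"),
                )
            )
    return found


# Three lines copied from the excerpt above.
sample = [
    "2024-10-15T08:45:13.908 INFO [11261] [F000000000000000] [ob/stop_obsvr.go:138] check if in maintenance",
    "2024-10-15T08:45:13.908 ERROR [11261] [F000000000000000] [task/task.go:301] task 17044 Inform all agents to start observer execute log: ERROR: OcsAgentError: code = 1010, message = Known error: agent is under maintenance",
    "2024-10-15T08:45:14.262 INFO [11261] [F000000000000000] [oceanbase/builder.go:71] try connect oceanbase: 1",
]
for ts, level, source, message in problem_lines(sample):
    print(ts, level, source, message)
```

Applied to the full excerpt, this leaves the "param mainDagId not set" and "agent is under maintenance" errors for task 17044, followed by the "Not all tasks created. main dag failed" error and warning for task 17045.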

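No new log entries are written under /home/admin/oceanbase/log.

Deploy OceanBase Database with systemd (V4.3.3, OceanBase Database documentation)

Following this installation guide reproduces the problem: after a successful installation, the oceanbase service starts normally.
Then, without changing anything else, reboot the server. Once the system is back up, the oceanbase service can no longer start.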

2024-10-15T08:45:13.908 INFO [11261] [F000000000000000] [task/task.go:303] task 17044 Inform all agents to start observer execute log: Inform 10.10.20.37:2886 to create the task
127.0.0.1 was the address selected, yet this step suddenly uses the NIC IP. Was a cluster installed on this machine earlier and not fully cleaned up?

It was cleaned up completely; all directories were removed. Looking at the script, the hostname command does return the NIC IP.
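
To check where the 10.10.20.37 address in the obshell log could come from, one can compare it with the configured ip and with what the hostname resolves to. This is a rough Python sketch: hostname_ips only approximates what `hostname -i` prints, and both function names are made up for illustration. It does not reproduce what oceanbase-service.sh actually does.

```python
import socket


def hostname_ips(hostname=None):
    """IPv4 addresses the hostname resolves to (an approximation of `hostname -i`)."""
    name = hostname or socket.gethostname()
    try:
        infos = socket.getaddrinfo(name, None, socket.AF_INET)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


def explain_agent_ip(agent_ip, configured_ip, host_ips):
    """Say which source, if any, the agent address seen in the log matches."""
    if configured_ip and agent_ip == configured_ip:
        return "matches the ip configured in the cnf file"
    if agent_ip in host_ips:
        return "matches an address the hostname resolves to"
    return "matches neither the configured ip nor the hostname"


if __name__ == "__main__":
    # 10.10.20.37 is the agent address from the obshell log in this thread;
    # 127.0.0.1 is the ip shown in the posted cnf file.
    print(explain_agent_ip("10.10.20.37", "127.0.0.1", hostname_ips()))
```

If the result is that the agent address matches the hostname and not the configured ip, that fits the observation that the script takes the NIC IP from the hostname command.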