OceanBase fails to start on openEuler 24.03 (LTS)

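[Environment] Test environment (to be moved to production)
Virtual machine: 8 CPUs, 16 GB memory, 100 GB disk
[OB or other components]
1) Version: oceanbase-ce-4.2.1.4
2) Installation method: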
yum install hostname findutils -y
yum install oceanbase-ce -y
Downloading Packages:
(1/4): oniguruma-6.9.9-1.oe2403.x86_64.rpm 593 kB/s | 164 kB 00:00
(2/4): jq-1.6-4.oe2403.x86_64.rpm 569 kB/s | 165 kB 00:00
(3/4): oceanbase-ce-libs-4.2.1.4-104000052024022918.oe2403.x86_64.rpm 1.0 MB/s | 158 kB 00:00
(4/4): oceanbase-ce-4.2.1.4-104000052024022918.oe2403.x86_64.rpm

3) Installation path
/home/admin/oceanbase
4) Data path
/var/lib/oceanbase
5) Disk usage
[root@localhost oceanbase]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/openeuler-root 60G 20G 38G 35% /
devtmpfs 4.0M 0 4.0M 0% /dev
tmpfs 7.6G 0 7.6G 0% /dev/shm
tmpfs 4.0M 0 4.0M 0% /sys/fs/cgroup
tmpfs 3.1G 9.0M 3.1G 1% /run
tmpfs 7.6G 0 7.6G 0% /tmp
/dev/sda2 974M 175M 732M 20% /boot
/dev/mapper/openeuler-home 30G 2.0G 26G 8% /home
6) Database directory sizes
[root@localhost oceanbase]# du -sh *
2.1G data
14G redo

[Version in use] oceanbase-ce-4.2.1.4
[Problem description]
After the system boots, I start the service with systemctl start oceanbase
and it reports the following:
[root@localhost oceanbase]# systemctl start oceanbase
Job for oceanbase.service failed because the control process exited with error code.
See "systemctl status oceanbase.service" and "journalctl -xeu oceanbase.service" for details.
[root@localhost oceanbase]# journalctl -xeu oceanbase.service
░░
░░ The job identifier is 1273 and the job result is failed.
Oct 25 16:08:17 localhost.localdomain systemd[1]: oceanbase.service: Scheduled restart job, restart counter is at 1.
░░ Subject: Automatic restarting of a unit has been scheduled
░░ Defined-By: systemd
░░ Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
░░
░░ Automatic restarting of the unit oceanbase.service has been scheduled, as the result for
░░ the configured Restart= setting for the unit.
Oct 25 16:08:17 localhost.localdomain systemd[1]: oceanbase.service: Found left-over process 3145 (obshell) in control group while starting unit. Ignoring.
Oct 25 16:08:17 localhost.localdomain systemd[1]: oceanbase.service: This usually indicates unclean termination of a previous run, or service implementatio>
Oct 25 16:08:17 localhost.localdomain systemd[1]: oceanbase.service: Found left-over process 3172 (obshell) in control group while starting unit. Ignoring.
Oct 25 16:08:17 localhost.localdomain systemd[1]: oceanbase.service: This usually indicates unclean termination of a previous run, or service implementatio>
Oct 25 16:08:17 localhost.localdomain systemd[1]: Starting oceanbase...
░░ Subject: A start job for unit oceanbase.service has begun execution
░░ Defined-By: systemd
░░ Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
░░
░░ A start job for unit oceanbase.service has begun execution.
░░
░░ The job identifier is 1368.
Oct 25 16:08:17 localhost.localdomain bash[3220]: oceanbase service started at 2024-10-25 16:08:17
Oct 25 16:08:18 localhost.localdomain bash[3220]: change etc owner to root
Oct 25 16:08:18 localhost.localdomain bash[3220]: daemon process with PID 3145 is running.
Oct 25 16:08:18 localhost.localdomain bash[3220]: The agent service is exist
Oct 25 16:08:18 localhost.localdomain bash[3220]: The observer is already bootstrap, please start it immediately
Oct 25 16:08:18 localhost.localdomain bash[3220]: the start observer trace id is 22886731484028867856
Oct 25 16:08:18 localhost.localdomain bash[3220]: the response state is READY
Oct 25 16:08:18 localhost.localdomain bash[3220]: wait 6s and the retry
Processes
[root@localhost oceanbase]# ps -aux|grep obs
root 3145 0.0 0.1 1340460 24416 ? Sl 16:07 0:00 /home/admin/oceanbase/bin/obshell daemon --ip 172.16.6.220 --port 2886
root 3172 0.4 0.2 1274116 32424 ? Sl 16:07 0:00 /home/admin/oceanbase/bin/obshell server --ip 172.16.6.220 --port 2886
root 4856 0.0 0.0 21964 2172 pts/0 S+ 16:08 0:00 grep --color=auto obs
Status (it seems to keep restarting over and over)
[root@localhost oceanbase]# systemctl status oceanbase
● oceanbase.service - oceanbase
Loaded: loaded (/etc/systemd/system/oceanbase.service; disabled; preset: disabled)
Active: activating (auto-restart) (Result: exit-code) since Fri 2024-10-25 16:12:08 CST; 8s ago
Process: 10323 ExecStart=/bin/bash /home/admin/oceanbase/profile/oceanbase-service.sh start (code=exited, status=1/FAILURE)
Tasks: 19 (limit: 98657)
Memory: 149.8M ()
CGroup: /system.slice/oceanbase.service
├─3145 /home/admin/oceanbase/bin/obshell daemon --ip 172.16.6.220 --port 2886
└─3172 /home/admin/oceanbase/bin/obshell server --ip 172.16.6.220 --port 2886
[root@localhost oceanbase]# systemctl status oceanbase
● oceanbase.service - oceanbase
Loaded: loaded (/etc/systemd/system/oceanbase.service; disabled; preset: disabled)
Active: activating (start) since Fri 2024-10-25 16:12:08 CST; 11s ago
Cntrl PID: 10871 (bash)
Tasks: 21 (limit: 98657)
Memory: 151.0M ()
CGroup: /system.slice/oceanbase.service
├─ 3145 /home/admin/oceanbase/bin/obshell daemon --ip 172.16.6.220 --port 2886
├─ 3172 /home/admin/oceanbase/bin/obshell server --ip 172.16.6.220 --port 2886
├─10871 /bin/bash /home/admin/oceanbase/profile/oceanbase-service.sh start
└─11411 sleep 6

Oct 25 16:12:18 localhost.localdomain systemd[1]: oceanbase.service: This usually indicates unclean termination of a previous run, or service implementatio>
Oct 25 16:12:18 localhost.localdomain systemd[1]: Starting oceanbase...
Oct 25 16:12:18 localhost.localdomain bash[10871]: oceanbase service started at 2024-10-25 16:12:18
Oct 25 16:12:19 localhost.localdomain bash[10871]: change etc owner to root
Oct 25 16:12:19 localhost.localdomain bash[10871]: daemon process with PID 3145 is running.
Oct 25 16:12:19 localhost.localdomain bash[10871]: The agent service is exist
Oct 25 16:12:19 localhost.localdomain bash[10871]: The observer is already bootstrap, please start it immediately
Oct 25 16:12:19 localhost.localdomain bash[10871]: the start observer trace id is 22886731484028867870
Oct 25 16:12:19 localhost.localdomain bash[10871]: the response state is READY
Oct 25 16:12:19 localhost.localdomain bash[10871]: wait 6s and the retry
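
The process listing and the status output above show only obshell processes while the start script keeps retrying. A minimal sketch for summarizing such a listing follows; the function name summarize_ob_procs is made up for illustration, and it assumes the database server binary is named observer.

```shell
#!/bin/sh
# Count obshell and observer processes in a listing of command lines.
# Reads one command line per row on stdin, e.g. from `ps -eo args`.
# Assumption: the database server binary is named "observer".
summarize_ob_procs() {
    awk '
        /\/obshell (daemon|server)( |$)/ { obshell++ }
        /\/observer( |$)/                { observer++ }
        END { printf "obshell=%d observer=%d\n", obshell, observer }
    '
}
```

It could be used as `ps -eo args | summarize_ob_procs`. A result with observer=0 would suggest that the wrapper processes are alive but the database server itself never stays up, which is when the observer log becomes the next place to look.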

[Steps to reproduce]
Repeatedly ran systemctl start/stop oceanbase
kill /home/admin/oceanbase/bin/obshell daemon
kill /home/admin/oceanbase/bin/obshell server



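Based on the information you provided, OceanBase fails to start on openEuler 24.03 (LTS): the systemctl start oceanbase command fails, and the journalctl -xeu oceanbase.service output shows left-over processes (such as obshell) that were not cleaned up. This usually indicates that a previous run did not terminate cleanly. [2]

Resolution steps

  1. Clean up left-over processes

    • First, make sure all OceanBase-related processes are stopped. You can find and kill them with:
      pkill -f obshell
      
    • Confirm that all related processes are gone:
      ps -aux | grep obs
      
  2. Check the log files

    • Look at the OceanBase log files, usually under /var/log/oceanbase or /home/admin/oceanbase/log, for more detailed error information.
      tail -f /var/log/oceanbase/observer.log
      
  3. Restart the service

    • After cleaning up the left-over processes, try starting the OceanBase service again:
      systemctl start oceanbase
      
    • Check the service status:
      systemctl status oceanbase
      
  4. Check the configuration file

    • Make sure the configuration file /home/admin/oceanbase/conf/observer.config.ini contains no wrong settings. In particular, check that the data and log directory paths are correct and that those directories have the right permissions.
      ls -ld /var/lib/oceanbase
      ls -ld /var/log/oceanbase
      
  5. Check disk space

    • Make sure there is enough disk space, especially on the partitions that hold the data and log directories.
      df -h
      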
  6. Collect diagnostic information with obdiag

    • If the steps above do not solve the problem, use the obdiag tool to collect diagnostic information for further analysis.
      obdiag gather all
      

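Reference documentation

  • Document excerpt 2: Fixed an issue where OBServer nodes failed to start or used excessive CPU on the UOS and openEuler operating systems. [2]

If you still have problems after following the steps above, consider contacting OceanBase technical support for further help. I hope these steps help you resolve the issue.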

For details, see:

  1. 4000 ~ 4499
  2. OceanBase Database Community Edition V4.0.0
  3. ob_error
  4. 4500 ~ 4999
  5. OceanBase Database Community Edition V3.1.4

https://www.oceanbase.com/docs/obd-cn
openEuler 24.03 supports deployment with obd. I suggest trying an obd deployment.


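The project environment cannot connect to the internet, so everything has to be installed offline.
In fact, right after installation it worked normally for a few days, with normal daily shutdowns and startups. Then I found that
the logs in /home/admin/oceanbase/log are no longer updated: whatever I do, observer.log does not change, and almost everything stopped at October 21.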
[root@localhost log]# ll
total 1413300
-rw-r--r--. 1 root root 53348379 Oct 21 18:39 election.log
-rw-r--r--. 1 root root 0 Oct 17 18:31 election.log.wf
-rw-r--r--. 1 root root 53078747 Oct 21 18:39 observer.log
-rw-r--r--. 1 root root 268435517 Oct 21 15:00 observer.log.20241021150049599
-rw-r--r--. 1 root root 268435593 Oct 21 16:09 observer.log.20241021160928017
-rw-r--r--. 1 root root 268436067 Oct 21 17:17 observer.log.20241021171755781
-rw-r--r--. 1 root root 268435588 Oct 21 18:26 observer.log.20241021182601423
-rw-r--r--. 1 root root 5741 Oct 21 18:26 observer.log.wf
-rw-r--r--. 1 root root 16061 Oct 18 13:33 observer.log.wf.20241018133527686
-rw-r--r--. 1 root root 2621 Oct 18 13:58 observer.log.wf.20241018140120611
-rw-r--r--. 1 root root 6461 Oct 18 15:03 observer.log.wf.20241018150542498
-rw-r--r--. 1 root root 20381 Oct 18 18:25 observer.log.wf.20241018182829236
-rw-r--r--. 1 root root 120597513 Oct 21 18:39 rootservice.log
-rw-r--r--. 1 root root 240 Oct 21 10:36 rootservice.log.wf
-rw-r--r--. 1 root root 146335181 Oct 18 19:06 trace.log
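
The listing above shows observer.log last modified on Oct 21. A minimal sketch for spotting log files that have stopped updating follows; it assumes a find that supports -maxdepth (GNU or BSD), and the function name stale_logs is made up for illustration.

```shell
#!/bin/sh
# List *.log files directly under a directory whose last modification is
# more than N whole days old (find rounds file age down to full days).
stale_logs() {
    log_dir=$1
    min_days=$2
    find "$log_dir" -maxdepth 1 -type f -name '*.log' -mtime +"$min_days" | sort
}
```

For example, `stale_logs /home/admin/oceanbase/log 1` would print the active log files that have not been written for more than a day. If every active log is stale, the server may be writing somewhere else or not running at all.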


[root@localhost log]# du -sh *
51M election.log
0 election.log.wf
51M observer.log
257M observer.log.20241021150049599
257M observer.log.20241021160928017
257M observer.log.20241021171755781
257M observer.log.20241021182601423
8.0K observer.log.wf
16K observer.log.wf.20241018133527686
4.0K observer.log.wf.20241018140120611
8.0K observer.log.wf.20241018150542498
24K observer.log.wf.20241018182829236
116M rootservice.log
4.0K rootservice.log.wf
140M trace.log


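Based on the information you provided, observer.log has not been updated since October 21, which may mean the OceanBase service has not been running or logging normally since that date. Some possible causes and resolution steps follow:

Possible causes

  1. Service not started: the OceanBase service may not have started normally.
  2. Log configuration problem: the log configuration may be wrong, so logs are not written correctly.
  3. Permission problem: the permissions on the log files may have changed, so OceanBase cannot write to them.
  4. Disk space problem: the disk may be full, so new log entries cannot be written.
  5. System time problem: the system time may be incorrect, causing problems with log records.

Resolution steps

  1. Check the service status

    • Confirm whether the OceanBase service is running:
      systemctl status oceanbase
      
    • If the service is not running, try starting it:
      systemctl start oceanbase
      
  2. Check the log configuration

    • Confirm that the log path and settings in the configuration file /home/admin/oceanbase/conf/observer.config.ini are correct.
      log_dir = /home/admin/oceanbase/log
      
  3. Check log file permissions

    • Confirm that the permissions on the log files and directory are correct:
      ls -ld /home/admin/oceanbase/log
      ls -l /home/admin/oceanbase/log
      
    • If the permissions are wrong, you can try changing them:
      chown -R admin:admin /home/admin/oceanbase/log
      chmod -R 755 /home/admin/oceanbase/log
      
  4. Check disk space

    • Confirm that there is enough disk space:
      df -h
      
  5. Check the system time

    • Confirm that the system time is correct:
      date
      
    • If the system time is wrong, set it manually or synchronize it with NTP:
      timedatectl set-time "YYYY-MM-DD HH:MM:SS"
      timedatectl set-ntp true
      
  6. Look at the other log files

    • Check rootservice.log and election.log for other error messages:
      tail -f /home/admin/oceanbase/log/rootservice.log
      tail -f /home/admin/oceanbase/log/election.log
      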
  7. Collect diagnostic information with obdiag

    • If the steps above do not solve the problem, use the obdiag tool to collect diagnostic information for further analysis:
      obdiag gather all
      

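Reference documentation

  • Document excerpt 1: Describes the log file types, paths, log rotation mechanism, and log archiving feature of the OceanBase Database log module. [1]

If you still have problems after following the steps above, consider contacting OceanBase technical support for further help. I hope these steps help you resolve the issue.

For details, see:

  1. Log overview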



[root@localhost log]# cat /etc/oceanbase.cnf

# The IP can only be configured during the initial startup.
# By default, the IP is set to the output of hostname -i unless specified otherwise.
# To modify the observer IP after startup, you must uninstall OceanBase and remove the data directory.
# Afterward, you can reinstall and launch OceanBase with the new IP address.
# For instance, to set the observer IP to 127.0.0.1, you would use the line:
# ip=127.0.0.1

mysqlPort=2881
rpcPort=2882
obshellPort=2886
rootPwd=""
redoDir=/var/lib/oceanbase/redo
dataDir=/var/lib/oceanbase/data
datafile_size=2G
datafile_next=2G
datafile_maxsize=20G
cpu_count=16
memory_limit=6G
system_memory=2G
log_disk_size=13G
enable_syslog_wf=false
enable_syslog_recycle=true
max_syslog_file_count=4
__min_full_resource_pool_memory=1073741824
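
The file above is a flat key=value list, so its directory settings can be checked mechanically. A minimal sketch follows, assuming that format; cnf_get and check_cnf_dirs are hypothetical helper names, not part of OceanBase.

```shell
#!/bin/sh
# Print the value of KEY from a key=value file, skipping comment lines.
cnf_get() {
    awk -F= -v key="$1" '
        /^[[:space:]]*#/ { next }
        $1 == key { sub(/^[^=]*=/, ""); print; exit }
    ' "$2"
}

# Report redoDir/dataDir entries that do not point at an existing directory.
check_cnf_dirs() {
    for cnf_key in redoDir dataDir; do
        cnf_dir=$(cnf_get "$cnf_key" "$1")
        [ -d "$cnf_dir" ] || echo "$cnf_key: directory not found: $cnf_dir"
    done
}
```

Running `check_cnf_dirs /etc/oceanbase.cnf` prints nothing when both directories exist, which helps rule out a wrong data or redo path before looking elsewhere.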


observer log
/home/admin/oceanbase/bin/log
[2024-10-26 14:21:31.171028] INFO [SHARE] load_config (ob_config_manager.cpp:129) [2173][observer][T0][Y0-0000000000000000-0-0] [lt=3] Config file doesn't exist, read from command line(path="etc/observer.config.bin", ret=-4027)
This says the configuration file cannot be found. Isn't the configuration file /etc/oceanbase.cnf??
Or is it the one under /home/admin/oceanbase/bin/etc? That directory is empty.
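
The path in the log line is relative, so it is presumably resolved against the working directory of the observer process (an assumption; the log does not say so). A minimal sketch for checking several candidate directories at once; report_observer_config is a made-up name.

```shell
#!/bin/sh
# For each candidate directory, report whether etc/observer.config.bin
# exists there and is non-empty.
report_observer_config() {
    for ob_dir in "$@"; do
        if [ -s "$ob_dir/etc/observer.config.bin" ]; then
            echo "$ob_dir: found"
        else
            echo "$ob_dir: missing or empty"
        fi
    done
}
```

A call such as `report_observer_config /home/admin/oceanbase /home/admin/oceanbase/bin` covers both directories discussed in this post.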

Another excerpt:
WDIAG [SHARE] strict_check_special (ob_server_config.cpp:147) [2173][observer][T0][Y0-0000000000000000-0-0] [lt=4][errcode=-4147] invalid cluster id(ret=-4147, cluster_id.str()="0")
[2024-10-26 14:21:31.171626] ERROR issue_dba_error (ob_log.cpp:1875) [2173][observer][T0][Y0-0000000000000000-0-0] [lt=3][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4147, file="ob_server.cpp", line_no=1887, info="some config setting is not valid")
[2024-10-26 14:21:31.171982] EDIAG [SERVER] init_config (ob_server.cpp:1887) [2173][observer][T0][Y0-0000000000000000-0-0] [lt=356][errcode=-4147] some config setting is not valid(ret=-4147, ret="OB_INVALID_CONFIG") BACKTRACE:0x11351aed 0x6ae7d82 0x6ae7844 0x6ae74d2 0x6ac6a5b 0x9d18eb6 0x9cf032b 0x9ce5405 0x6ab966f 0x7fb20424b9d0 0x7fb20424ba89 0x4e76385
[2024-10-26 14:21:31.172067] ERROR issue_dba_error (ob_log.cpp:1875) [2173][observer][T0][Y0-0000000000000000-0-0] [lt=81][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4147, file="ob_server.cpp", line_no=264, info="init config failed")
[2024-10-26 14:21:31.172072] EDIAG [SERVER] init (ob_server.cpp:264) [2173][observer][T0][Y0-0000000000000000-0-0] [lt=5][errcode=-4147] init config failed(ret=-4147, ret="OB_INVALID_CONFIG") BACKTRACE:0x11351aed 0x6ae7d82 0x6ae7844 0x6ae74d2 0x6ac6a5b 0x9cf1b3a 0x9ce792d 0x6ab966f 0x7fb20424b9d0 0x7fb20424ba89 0x4e76385

ERROR issue_dba_error (ob_log.cpp:1875) [2173][observer][T0][Y0-0000000000000000-0-0] [lt=3][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4147, file="ob_server.cpp", line_no=509, info="[OBSERVER_NOTICE] fail to init observer")
[2024-10-26 14:21:31.186412] EDIAG [SERVER] init (ob_server.cpp:509) [2173][observer][T0][Y0-0000000000000000-0-0] [lt=6][errcode=-4147] [OBSERVER_NOTICE] fail to init observer(ret=-4147, ret="OB_INVALID_CONFIG") BACKTRACE:0x11351aed 0x6ae7d82 0x6ae7844 0x6ae74d2 0x6ac6a5b 0x9cf855a 0x9ce9fe3 0x6ab966f 0x7fb20424b9d0 0x7fb20424ba89 0x4e76385
[2024-10-26 14:21:31.186428] ERROR init (ob_server.cpp:510) [2173][observer][T0][Y0-0000000000000000-0-0] [lt=16][errcode=-4393] observer start process failure(msg="observer init() has failure", ret=-4147, ret="OB_INVALID_CONFIG")
[2024-10-26 14:21:31.186435] ERROR issue_dba_error (ob_log.cpp:1875) [2173][observer][T0][Y0-0000000000000000-0-0] [lt=6][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4147, file="main.cpp", line_no=585, info="observer init fail")
[2024-10-26 14:21:31.186438] EDIAG [SERVER] main (main.cpp:585) [2173][observer][T0][Y0-0000000000000000-0-0] [lt=3][errcode=-4147] observer init fail(ret=-4147) BACKTRACE:0x11351aed 0x6ac0433 0x4d64834 0x6ac0174 0x6ab8161 0x6aba8c9 0x6ab99bc 0x7fb20424b9d0 0x7fb20424ba89 0x4e76385
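
Most of these lines repeat the same few error codes. A minimal sketch for reducing a log to its distinct errcode values, assuming the errcode=-NNNN pattern seen in the excerpt; distinct_errcodes is a made-up name.

```shell
#!/bin/sh
# Print the distinct errcode values found in a log, one per line.
# Reads log text from stdin.
distinct_errcodes() {
    grep -o 'errcode=-[0-9][0-9]*' | sort -u
}
```

It could be run as `distinct_errcodes < observer.log`. In the excerpt above, -4147 is the code the log itself labels OB_INVALID_CONFIG, while -4388 and -4393 appear on lines that wrap or report that same failure.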


observer.log (1.5 MB)


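The problem is solved: files were missing.
All the files under /home/admin/oceanbase/etc had been lost. I installed a fresh machine, compared the directories one by one, and copied the missing files over to repair it.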
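
The directory comparison described here can be sketched as follows, assuming a copy of the fresh installation tree is reachable on the same machine (for example copied over or mounted); missing_files is a made-up name.

```shell
#!/bin/sh
# Print files that exist under the reference tree but not under the
# target tree, as paths relative to the tree root.
missing_files() {
    ref=$1
    target=$2
    (cd "$ref" && find . -type f | sort) | while IFS= read -r rel; do
        [ -e "$target/$rel" ] || printf '%s\n' "${rel#./}"
    done
}
```

With a hypothetical reference copy at /tmp/ob-reference, `missing_files /tmp/ob-reference /home/admin/oceanbase` would list what is absent from the broken installation. It compares only by file name, not by content.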
