Startup fails with ERROR 8001 (08004): Server is initializing

observer.rar (6.7 MB)
rootservice.rar (3.7 MB)

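[Environment] Test environment
[OB or other component] OB
[Version] 4.3.0.1
[Problem description] I suddenly found I could not log in to the OB database, and connections to the observer were timing out. I restarted with obd cluster stop demo followed by obd cluster start demo, but the cluster would not start and reported the following.
[Steps to reproduce]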
Get local repositories and plugins ok
Load cluster param plugin ok
Open ssh connection ok
Cluster status check ok
Search plugins ok
Load cluster param plugin ok
Check before start observer ok
Check before start obproxy ok
Check before start obagent ok
Check before start prometheus ok
Check before start grafana ok
Start observer ok
observer program health check ok
obshell program health check ok
Connect to observer x
[ERROR] OBD-1006: Failed to connect to oceanbase-ce
[ERROR] OBD-1005: Some of the servers in the cluster have been stopped
See OceanBase分布式数据库-海量数据 笔笔算数 .
Trace ID: ff5fc01c-4af6-11ef-9001-0050568c4d06
If you want to view detailed obd logs, please run: obd display-trace ff5fc01c-4af6-11ef-9001-0050568c4d06

Details:
[2024-07-26 10:35:26.856] [DEBUG] -- connect 127.0.0.1 -P2881 -uroot -pE0QIJt7RE2z0sxvtvyJZ
[2024-07-26 10:35:26.860] [ERROR] Traceback (most recent call last):
[2024-07-26 10:35:26.861] [ERROR] File "core.py", line 2532, in reload_cluster
[2024-07-26 10:35:26.861] [ERROR] File "core.py", line 2572, in _reload_cluster
[2024-07-26 10:35:26.861] [ERROR] File "core.py", line 2158, in _start_cluster
[2024-07-26 10:35:26.861] [ERROR] File "core.py", line 197, in call_plugin
[2024-07-26 10:35:26.861] [ERROR] File "_plugin.py", line 347, in call
[2024-07-26 10:35:26.861] [ERROR] File "_plugin.py", line 305, in _new_func
[2024-07-26 10:35:26.861] [ERROR] File "/root/.obd/plugins/oceanbase-ce/4.2.2.0/connect.py", line 625, in connect
[2024-07-26 10:35:26.862] [ERROR] cursor = Cursor(ip=server.ip, port=server_config['mysql_port'], tenant='', password=password if password is not None else '', stdio=stdio)
[2024-07-26 10:35:26.862] [ERROR] File "_stdio.py", line 908, in wrapper
[2024-07-26 10:35:26.862] [ERROR] File "/root/.obd/plugins/oceanbase-ce/4.2.2.0/connect.py", line 517, in __init__
[2024-07-26 10:35:26.862] [ERROR] self._connect()
[2024-07-26 10:35:26.862] [ERROR] File "/root/.obd/plugins/oceanbase-ce/4.2.2.0/connect.py", line 547, in _connect
[2024-07-26 10:35:26.862] [ERROR] self.db = mysql.connect(host=self.ip, user=self.user, port=int(self.port), password=str(self.password),
[2024-07-26 10:35:26.862] [ERROR] File "pymysql/connections.py", line 353, in __init__
[2024-07-26 10:35:26.862] [ERROR] File "pymysql/connections.py", line 633, in connect
[2024-07-26 10:35:26.863] [ERROR] File "pymysql/connections.py", line 907, in _request_authentication
[2024-07-26 10:35:26.863] [ERROR] File "pymysql/connections.py", line 725, in _read_packet
[2024-07-26 10:35:26.863] [ERROR] File "pymysql/protocol.py", line 221, in raise_for_error
[2024-07-26 10:35:26.863] [ERROR] File "pymysql/err.py", line 143, in raise_mysql_exception
[2024-07-26 10:35:26.864] [ERROR] pymysql.err.OperationalError: (8001, 'Server is initializing')
[2024-07-26 10:35:26.864] [ERROR]
[2024-07-26 10:35:29.958] [ERROR] OBD-1006: Failed to connect to oceanbase-ce
[2024-07-26 10:35:29.959] [DEBUG] -- sub connect ref count to 0
[2024-07-26 10:35:29.959] [DEBUG] -- export connect
[2024-07-26 10:35:29.959] [ERROR] OBD-1005: Some of the servers in the cluster have been stopped
[2024-07-26 10:35:29.966] [INFO] See OceanBase分布式数据库-海量数据 笔笔算数 .
[2024-07-26 10:35:29.966] [INFO] Trace ID: ff5fc01c-4af6-11ef-9001-0050568c4d06
[2024-07-26 10:35:29.967] [INFO] If you want to view detailed obd logs, please run: obd display-trace ff5fc01c-4af6-11ef-9001-0050568c4d06

I restarted several more times and got the same error, yet ps -ef|grep observer shows the observer process is running:
[root@localhost log]# ps -ef|grep observer
root 6549 1 99 10:30 ? 00:21:12 /data/data/oceanbase-ce/bin/observer -p 2881
root 8388 3433 0 10:49 pts/1 00:00:00 grep observer
[root@localhost log]# obclient -uroot -p -h172.17.9.29 -P2881
Enter password:
ERROR 8001 (08004): Server is initializing
[root@localhost log]# obclient -uroot -p'E0QIJt7RE2z0sxvtvyJZ' -h172.17.9.29 -P2881
ERROR 8001 (08004): Server is initializing
[root@localhost log]#
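Logs are attached.
[Attachments and logs] We recommend collecting diagnostic information with obdiag, the OceanBase agile diagnostic tool. For details, see the link (right-click to open):

[SOP Series 22] Troubleshooting step one (self-service diagnosis and collecting diagnostic information)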

Could you upload observer.log?

Try rebooting the operating system as well.

Kill the observer process, then restart the cluster or reboot the server.

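Please provide observer.log and the obd log, and analyze the logs with obdiag.
-- Analyze the last hour of logs online. When run, this command pulls the last hour of logs from the remote hosts and analyzes them to report the errors that occurred.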
obdiag analyze log --since 1h
https://www.oceanbase.com/docs/common-obdiag-cn-1000000001102497

Already tried that; the result is the same.

Tried it; it didn't work.

The logs are attached; could someone please help analyze them?

ERROR try_recycle_blocks (palf_env_impl.cpp:784) [6930][T1_PalfGC][T1][Y0-0000000000000000-0-0] [lt=31][errcode=-4264] Log out of disk space(msg="log disk space is almost full", ret=-4264, total_size(MB)=2048, used_size(MB)=1971, used_percent(%)=96, warn_size(MB)=1638, warn_percent(%)=80, limit_size(MB)=1945, limit_percent(%)=95, total_unrecyclable_size_byte(MB)=1907, maximum_used_size(MB)=1971, maximum_log_stream=1, oldest_log_stream=1, oldest_scn={val:1721891993914063442, v:0}, in_shrinking=false)
The clog space is full. Check your storage space; the log disk was given only 2 GB, which is too small.
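To make the numbers in that message easier to read, here is a minimal Python sketch that pulls the size fields out of the log text and compares usage against the warn and limit sizes. The field values are copied from the message above; the helper names (`parse_fields`, `clog_status`) and the status labels are made up for illustration and are not part of OceanBase or obdiag.

```python
import re

# Size fields copied from the try_recycle_blocks message quoted above.
LINE = ("total_size(MB)=2048, used_size(MB)=1971, used_percent(%)=96, "
        "warn_size(MB)=1638, warn_percent(%)=80, "
        "limit_size(MB)=1945, limit_percent(%)=95")


def parse_fields(text):
    """Return {name: int} for every 'name(MB)=N' or 'name(%)=N' pair."""
    pairs = re.findall(r"(\w+)\((?:MB|%)\)=(\d+)", text)
    return {name: int(value) for name, value in pairs}


def clog_status(fields):
    """Label usage relative to the warn/limit sizes (labels are hypothetical)."""
    used = fields["used_size"]
    if used >= fields["limit_size"]:
        return "over limit"
    if used >= fields["warn_size"]:
        return "warning"
    return "ok"


fields = parse_fields(LINE)
used_percent = fields["used_size"] * 100 // fields["total_size"]
print(used_percent, clog_status(fields))
```

With these values, used space (1971 MB) is already above the reported limit size (1945 MB) out of a 2048 MB total, which matches the "almost full" message.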

How do I adjust the clog size? Which parameter here is the one for clog?

Check with df -h.



There is only the sys tenant.

Check the parameter values in the config file:
strings /<install-path>/etc/observer.config.bin

observer_id=1
local_ip=127.0.0.1
all_server_list=127.0.0.1:2882
_lcl_op_interval=0ms
__min_full_resource_pool_memory=1073741824
log_disk_size=50360M
min_observer_version=4.3.0.1
memstore_limit_percentage=60
enable_syslog_recycle=True
enable_syslog_wf=False
max_syslog_file_count=4
syslog_level=WDIAG
cluster_id=1715159113
rootservice_list=127.0.0.1:2882:2881
cpu_count=16
system_memory=2124M
memory_limit=41440M
zone=zone1
devname=lo
mysql_port=2881
rpc_port=2882
datafile_maxsize=4192000M
datafile_next=20480M
datafile_size=20480M
data_dir=/data/data/oceanbase-ce/store
major_compact_trigger=10
minor_compact_trigger=2
compaction_mid_thread_score=10
compatible=4.3.0.1
writing_throttling_trigger_percentage=80
freeze_trigger_percentage=10
cpu_quota_concurrency=10
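A quick way to read such a dump is to parse the key=value lines and convert the size values to MB so they can be compared with the figure in the error message. The sketch below does that for a few of the lines above. It assumes the `M`/`G`/`T` suffixes are binary multiples of a megabyte, and `parse_dump` and `size_mb` are hypothetical helpers, not OceanBase tooling.

```python
# Assumption: size suffixes are binary multiples of one megabyte.
UNIT_MB = {"M": 1, "G": 1024, "T": 1024 * 1024}

# A few lines copied from the strings output above.
DUMP = """\
log_disk_size=50360M
system_memory=2124M
memory_limit=41440M
datafile_size=20480M
"""


def parse_dump(text):
    """Turn 'key=value' lines into a dict, skipping lines without '='."""
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            result[key.strip()] = value.strip()
    return result


def size_mb(value):
    """Convert a value such as '50360M' or '180G' to an integer number of MB."""
    return int(value[:-1]) * UNIT_MB[value[-1].upper()]


params = parse_dump(DUMP)
configured_mb = size_mb(params["log_disk_size"])
reported_mb = 2048  # total_size(MB) from the try_recycle_blocks message
print(configured_mb, reported_mb)
```

The server-level log_disk_size here is 50360 MB, while the error message reports a total of 2048 MB, so the 2 GB figure apparently does not come from this parameter directly. That reading is an inference from the two numbers, not something the dump itself states.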

The disk space looks fine. Try the method in this document: How to resolve the Log out of disk space issue (OceanBase Knowledge Base).

Still not working; it now reports ob clog disk hang.

[root@localhost oceanbase-ce]# ./bin/observer -o "log_disk_size=180G,log_disk_utilization_limit_threshold=98"
./bin/observer -o log_disk_size=180G,log_disk_utilization_limit_threshold=98
optstr: log_disk_size=180G,log_disk_utilization_limit_threshold=98
[root@localhost oceanbase-ce]# ps -ef|grep observer
root 8960 1 99 14:24 ? 00:01:05 ./bin/observer -o log_disk_size=180G,log_disk_utilization_limit_threshold=98
root 9707 7252 0 14:25 pts/2 00:00:00 grep observer
[root@localhost oceanbase-ce]# mysql -uroot -p'Chin@i23$5oKM' -P2881 -h172.17.9.29
ERROR 6325 (HY000): ob clog disk hang
[root@localhost oceanbase-ce]#

observer.log still shows the disk-full message:
[2024-07-26 14:25:07.694598] ERROR issue_dba_error (ob_log.cpp:1891) [9307][T1_DDL_KV_MERGE][T1][YB427F000001-00061E2092CD6D9B-0-0] [lt=31][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4016, file="ob_index_block_builder.cpp", line_no=1064, info="fail to close sstable index builder")
[2024-07-26 14:25:08.593076] ERROR try_recycle_blocks (palf_env_impl.cpp:784) [9267][T1_PalfGC][T1][Y0-0000000000000000-0-0] [lt=55][errcode=-4264] Log out of disk space(msg="log disk space is almost full", ret=-4264, total_size(MB)=2048, used_size(MB)=1971, used_percent(%)=96, warn_size(MB)=1638, warn_percent(%)=80, limit_size(MB)=2007, limit_percent(%)=98, total_unrecyclable_size_byte(MB)=1907, maximum_used_size(MB)=1971, maximum_log_stream=1, oldest_log_stream=1, oldest_scn={val:1721891993914063442, v:0}, in_shrinking=false)
Is there any way to raise this default 2 GB setting?
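It can help to compare the message from before the restart with the one after it, field by field. The following sketch does that with values copied from the two try_recycle_blocks lines in this thread; `parse_fields` and `changed` are illustrative helper names only.

```python
import re

# Fields copied from the message before and after restarting with -o.
BEFORE = ("total_size(MB)=2048, used_size(MB)=1971, "
          "limit_size(MB)=1945, limit_percent(%)=95")
AFTER = ("total_size(MB)=2048, used_size(MB)=1971, "
         "limit_size(MB)=2007, limit_percent(%)=98")


def parse_fields(text):
    """Return {name: int} for every 'name(MB)=N' or 'name(%)=N' pair."""
    pairs = re.findall(r"(\w+)\((?:MB|%)\)=(\d+)", text)
    return {name: int(value) for name, value in pairs}


def changed(before, after):
    """Return {name: (old, new)} for fields whose value differs."""
    return {name: (before[name], after[name])
            for name in before
            if name in after and before[name] != after[name]}


diff = changed(parse_fields(BEFORE), parse_fields(AFTER))
print(diff)
```

Only the limit fields differ (95% to 98%, 1945 MB to 2007 MB), so the threshold option was picked up, while total_size stayed at 2048 MB even though log_disk_size=180G was passed. Usage at 1971 MB is now just under the new limit, yet still above the 1638 MB warn size.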

What storage are you using? Check whether there is any problem with the storage or the file system.

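Kylin OS on ordinary storage. The OS messages log shows no relevant errors either (see below). Can this default clog size of 2 GB be increased? With a large transaction on a slow disk, wouldn't it fill up while flushing to disk?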
Jul 26 14:31:56 localhost sshd[10155]: mm_audit_run_command entering command /usr/libexec/openssh/sftp-server -l INFO -f AUTH
Jul 26 14:31:56 localhost sftp-server[10174]: session opened for local user appadmin from [10.100.4.39]
Jul 26 14:32:08 localhost systemd[1]: kylin-kms-activation.service: Service RestartSec=30s expired, scheduling restart.
Jul 26 14:32:08 localhost systemd[1]: kylin-kms-activation.service: Scheduled restart job, restart counter is at 64.
Jul 26 14:32:08 localhost systemd[1]: Stopped run kylin_kms_daemon at boot time.
Jul 26 14:32:08 localhost systemd[1]: Started run kylin_kms_daemon at boot time.
Jul 26 14:32:27 localhost systemd[1]: systemd-hostnamed.service: Succeeded.
Jul 26 14:33:08 localhost systemd[1]: kylin-kms-activation.service: Main process exited, code=exited, status=255/EXCEPTION
Jul 26 14:33:08 localhost systemd[1]: kylin-kms-activation.service: Failed with result 'exit-code'.
Jul 26 14:33:39 localhost systemd[1]: kylin-kms-activation.service: Service RestartSec=30s expired, scheduling restart.
Jul 26 14:33:39 localhost systemd[1]: kylin-kms-activation.service: Scheduled restart job, restart counter is at 65.
Jul 26 14:33:39 localhost systemd[1]: Stopped run kylin_kms_daemon at boot time.
Jul 26 14:33:39 localhost systemd[1]: Started run kylin_kms_daemon at boot time.
Jul 26 14:34:39 localhost systemd[1]: kylin-kms-activation.service: Main process exited, code=exited, status=255/EXCEPTION
Jul 26 14:34:39 localhost systemd[1]: kylin-kms-activation.service: Failed with result 'exit-code'.
Jul 26 14:35:09 localhost systemd[1]: kylin-kms-activation.service: Service RestartSec=30s expired, scheduling restart.
Jul 26 14:35:09 localhost systemd[1]: kylin-kms-activation.service: Scheduled restart job, restart counter is at 66.
Jul 26 14:35:09 localhost systemd[1]: Stopped run kylin_kms_daemon at boot time.

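SSD storage is preferable. Tenant resources can only be adjusted online, while the cluster is running, and you can't even get into the cluster.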