The observer process is gone: how do I find out why?

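AntTech_ZGW8AE · August 12, 2025, 14:39

Today I installed observer on three Alibaba Cloud ECS instances to form a cluster. I just noticed that nodes 1 and 2 are both offline, and when I logged in to the ECS instances the observer process no longer existed. How can I find out what caused this?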

辞霜 · August 12, 2025, 14:41

Go to ~/oceanbase/log
and take a look at the observer log. Please also upload a copy as an attachment.

AntTech_ZGW8AE · August 12, 2025, 14:43

There seem to be quite a few log files. Do you need all of them?

辞霜 · August 12, 2025, 14:46

If the process is gone, the logs should have stopped being written, so observer.log alone is enough.
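A minimal sketch of how one might skim that log before uploading it. The function names `last_errors` and `oom_hits` are hypothetical helpers, not OceanBase tools, and the sketch assumes each observer.log line carries its level as a space-delimited word such as `ERROR` after the timestamp. The OOM check is only one possible cause worth ruling out when a process vanishes without being stopped.

```shell
# Hypothetical helper: print the last N ERROR-level lines of a log file.
# Assumes the level appears as the word ERROR surrounded by spaces.
last_errors() {
  log_file="$1"
  count="${2:-20}"
  grep -E ' ERROR ' "$log_file" | tail -n "$count"
}

# Hypothetical helper: list kernel messages suggesting the OOM killer ran.
# dmesg may require root; errors are silenced so the function just prints nothing.
oom_hits() {
  dmesg -T 2>/dev/null | grep -i -E 'out of memory|killed process'
}
```

For example, `last_errors ~/oceanbase/log/observer.log 50` shows the last 50 error lines written before the process stopped, and `oom_hits` shows whether the kernel killed a process for memory.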

AntTech_ZGW8AE · August 12, 2025, 14:52

oserver.log.zip (1.3 MB)
Please see the attachment.

辞霜 · August 12, 2025, 14:59

The log shows RPC problems. Check whether the clock difference between the three nodes is large.

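AntTech_ZGW8AE · August 12, 2025, 15:05

Checking with the date command, the time on the three nodes is about the same. The OS is CentOS 7.9, so it should have a time synchronization mechanism by default, right?
Is there any other way to check?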

小鸟 · August 12, 2025, 15:06

Following along to learn.

蕾子 · August 12, 2025, 15:38

Learned something.

AntTech_ZGW8AE · August 12, 2025, 15:51

Node 3 just went down as well. Its log is attached, please take a look:
observer3.log.zip (2.6 MB)

辞霜 · August 12, 2025, 15:56

Check the clock source on the three machines. OB requires that the time difference between nodes be no more than 2s.
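One rough way to compare the nodes more precisely than reading `date` by eye is sketched below. The helper names `collect_times`, `clock_spread`, and `within_limit` are hypothetical, the host names in the usage note are placeholders, and the 2s limit is taken from the reply above. Because the timestamps are collected over ssh one host at a time, connection latency is included in the result, so treat the number as an upper bound rather than an exact offset.

```shell
# Hypothetical helper: print "host epoch_seconds" for each host given as an argument.
# Assumes passwordless ssh and GNU date on the remote side.
collect_times() {
  for host in "$@"; do
    printf '%s %s\n' "$host" "$(ssh "$host" date +%s.%N)"
  done
}

# Read "host epoch_seconds" lines on stdin and print the max-min spread in seconds.
clock_spread() {
  awk 'NR == 1 { min = $2 + 0; max = $2 + 0 }
       { v = $2 + 0; if (v < min) min = v; if (v > max) max = v }
       END { if (NR > 0) printf "%.3f\n", max - min }'
}

# Succeed if the spread (first argument) does not exceed the limit (second argument).
within_limit() {
  awk -v s="$1" -v l="$2" 'BEGIN { exit !(s + 0 <= l + 0) }'
}
```

Usage might look like `spread=$(collect_times node1 node2 node3 | clock_spread)` followed by `within_limit "$spread" 2 || echo "clock spread too large: ${spread}s"`.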

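AntTech_ZGW8AE · August 12, 2025, 16:30

I checked the three ECS instances, and the time synchronization daemon is running:
chrony 642 1 0 Aug11 ? 00:00:00 /usr/sbin/chronyd

It is configured with Alibaba Cloud's sources:
[root@iZuf6fiwi2hpo62d8johdxZ log]# cat /etc/chrony.conf

# Use Alibaba NTP server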

server ntp.cloud.aliyuncs.com minpoll 4 maxpoll 10 iburst
server ntp.aliyun.com minpoll 4 maxpoll 10 iburst
server ntp1.aliyun.com minpoll 4 maxpoll 10 iburst
server ntp10.cloud.aliyuncs.com minpoll 4 maxpoll 10 iburst
server ntp11.cloud.aliyuncs.com minpoll 4 maxpoll 10 iburst
server ntp12.cloud.aliyuncs.com minpoll 4 maxpoll 10 iburst
server ntp2.aliyun.com minpoll 4 maxpoll 10 iburst
server ntp3.aliyun.com minpoll 4 maxpoll 10 iburst
server ntp4.aliyun.com minpoll 4 maxpoll 10 iburst
server ntp5.aliyun.com minpoll 4 maxpoll 10 iburst
server ntp6.aliyun.com minpoll 4 maxpoll 10 iburst
server ntp7.cloud.aliyuncs.com minpoll 4 maxpoll 10 iburst
server ntp8.cloud.aliyuncs.com minpoll 4 maxpoll 10 iburst
server ntp9.cloud.aliyuncs.com minpoll 4 maxpoll 10 iburst

stratumweight 0.05

driftfile /var/lib/chrony/drift

rtcsync

makestep 10 3

#allow 192.168/16

bindcmdaddress 127.0.0.1
bindcmdaddress ::1

noclientlog

logchange 0.5

logdir /var/log/chrony
#log measurements statistics tracking

So the time should be fine.
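A running chronyd and a valid chrony.conf do not by themselves show that the clock is actually synchronized, so it may be worth reading what chronyd reports. The sketch below parses the output of `chronyc tracking`; the helper names `chrony_offset` and `chrony_synced` are hypothetical, and the parsing assumes the usual `Field name : value` layout of that command.

```shell
# Print the "System time" offset (in seconds) from `chronyc tracking` output on stdin.
chrony_offset() {
  awk -F':' '/^System time/ { split($2, f, " "); print f[1] }'
}

# Succeed only if `chronyc tracking` output on stdin reports a Normal leap status,
# which chronyd shows once it is synchronized to a source.
chrony_synced() {
  status=$(awk -F':' '/^Leap status/ { gsub(/^[ \t]+|[ \t]+$/, "", $2); print $2 }')
  [ "$status" = "Normal" ]
}
```

On each node, `chronyc tracking | chrony_offset` prints the current offset from NTP time, `chronyc tracking | chrony_synced || echo "not synchronized"` flags an unsynchronized daemon, and `chronyc sources -v` lists which of the configured servers are reachable and selected.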

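Sunshining · August 12, 2025, 16:33

Learned something! Many problems are probably caused by inconsistent clocks across the cluster. Time consistency really is a major issue for any clustered application, and operations staff need to use whatever tools are necessary to guarantee it.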

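AntTech_ZGW8AE · August 12, 2025, 16:51

One of the times a node went down, clicking restart in OCP reported that the operation could not be completed, and observer only came back up on its own after I rebooted that ECS instance. What is the command to start observer manually? Thanks.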

辞霜 · August 12, 2025, 16:54

cd /home/admin/oceanbase && /home/admin/oceanbase/bin/observer
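The command above can be wrapped with a couple of safety checks, as in the sketch below. The function name `start_observer` is hypothetical, the default path is the one from the reply, and the sketch assumes it is run as the OS user that owns the installation. It keeps the `cd` into the install directory, on the assumption that observer resolves its configuration and log directories relative to the working directory.

```shell
# Hypothetical wrapper around the manual start command.
start_observer() {
  ob_home="${1:-/home/admin/oceanbase}"

  # Refuse to continue if the binary is not where we expect it.
  if [ ! -x "$ob_home/bin/observer" ]; then
    echo "observer binary not found under $ob_home" >&2
    return 1
  fi

  # Skip the start if an observer process already exists (when pgrep is available).
  if command -v pgrep >/dev/null 2>&1 && pgrep -x observer >/dev/null; then
    echo "observer is already running"
    return 0
  fi

  # Start from the install directory, as in the command given in the thread.
  (cd "$ob_home" && ./bin/observer)
}
```

Calling `start_observer` with no argument uses /home/admin/oceanbase; afterwards, `ps -ef | grep observer` and the tail of log/observer.log show whether the process stayed up.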

David-武生 · August 17, 2025, 11:15

Check the observer log. Was your NTP service fully configured before you built the cluster?