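V4.4.2: Scaling out OBServer fails

[OB or other component] Adding new observer nodes to the current cluster through OCP
[Version used] OB V4.4.2.1, OCP V4.4.2
[Problem description] During deployment, obshell fails to start and the task aborts.

Goal: scale out by three observers (Rocky Linux 9)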

subtask_12004632.log (226.6 KB)

Which OCP version is this? Please post the ocp-server.log so we can take a look.

OCP V4.4.2

I tried starting obshell manually:

export OB_ROOT_PASSWORD="xxxxx*";./bin/obshell admin start --ip x.x.x.x --port 2886



[ERROR] Code: Agent.Rebuild.VersionNotSame, Message: Take over or rebuild failed: Agent version is not the same, agent version in all_agents: 4.2.5.1-12025010214, agent version now: 4.4.0.0-32026010814
[ERROR] Code: Agent.Daemon.StartFailed, Message: Daemon start failed: obshell server exited with code 22, please check obshell.log for more details

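I compared the obshell version on the node being added (the one where the task failed) with the version on a healthy, running observer node.

Node being added: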
./bin/obshell -V
OBShell 4.4.0.0 (for OceanBase_CE)

REVISION: 32026010814-b8061fb339fe52a21d1b104261df83190a2ea958
BUILD_BRANCH: HEAD
BUILD_TIME: Jan 08 2026 14:24:01 UTC
BUILD_FLAGS: release
BUILD_INFO: 
Running node:
./bin/obshell -V
OBShell 4.4.1.1 (for OceanBase_CE)

REVISION: 32026031914-031aba36bc352cb3ea84e2d202df06044c3e20c9
BUILD_BRANCH: HEAD
BUILD_TIME: Mar 19 2026 15:00:06 UTC
BUILD_FLAGS: release
BUILD_INFO: 
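
The two version dumps above can be compared mechanically rather than by eye. The following is a minimal sketch, not an official tool: the function names `obshell_version`, `obshell_build`, and `compare_versions` are hypothetical helpers, and the parsing assumes the `obshell -V` output layout shown in this thread. Joining the two extracted values with a hyphen gives the same `version-build` form that appears in the error message (for example `4.4.0.0-32026010814`).

```shell
# Hypothetical helpers; they assume the `obshell -V` layout shown in this thread.

# Print the version number from the "OBShell <version> (...)" line on stdin.
obshell_version() {
  awk '/^OBShell / { print $2; exit }'
}

# Print the build number, i.e. the part of REVISION before the first hyphen.
obshell_build() {
  awk '/^REVISION: / { split($2, parts, "-"); print parts[1]; exit }'
}

# Report whether two version strings are identical.
compare_versions() {
  if [ "$1" = "$2" ]; then
    echo "match: $1"
  else
    echo "mismatch: $1 vs $2"
  fi
}

# Example use on a node (run from the observer home directory):
#   ./bin/obshell -V | obshell_version
```

Running the first function on each node and passing the two results to `compare_versions` makes the 4.4.0.0 versus 4.4.1.1 difference explicit.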

SOS

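The problem I see is at "sync all tenant information", which suggests that tenant information is synchronized when you scale out the zone. Could there be so much tenant data that the synchronization did not finish within two and a half hours and then failed?
Possible fixes:

  1. Do not synchronize tenant information during the scale-out; scale out the tenants after the node scale-out completes.
  2. Check whether a timeout is configured for tenant synchronization and, if so, raise it so that the task does not time out.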

Could you provide the ocp-server.log? It looks like obshell failed to start, so could you post the obshell log as well?

ocp-server.zip (929.7 KB)

I just retried the task and collected the OCP logs.

Where are the obshell logs?

The observer's log directory contains a log_obshell directory; the obshell logs are in there.

Excerpt from the node where the scale-out failed:

2026-05-28T14:17:47.088 INFO  [407862] [F000000000000000] [server/takeover.go:41] start to take over or rebuild
2026-05-28T14:17:47.122 INFO  [407862] [F000000000000000] [oceanbase/builder.go:126] create database ocs succeed
2026-05-28T14:17:50.348 INFO  [407862] [F000000000000000] [oceanbase/builder.go:196] auto migrate ob tables succeed
2026-05-28T14:17:50.393 ERROR [407862] [F000000000000000] [ob/rebuild.go:39] agent version is not the same, agent version in all_agents: 4.2.5.1-12025010214, agent version now: 4.4.0.0-32026010814
2026-05-28T14:17:50.393 ERROR [407862] [F000000000000000] [server/takeover.go:33] rebuild failed fields: error="Agent version is not the same, agent version in all_agents: 4.2.5.1-12025010214, agent version now: 4.4.0.0-32026010814"
2026-05-28T14:17:50.393 ERROR [407862] [F000000000000000] [runtime/asm_amd64.s:1700] take over or rebuild failed fields:, error="Agent version is not the same, agent version in all_agents: 4.2.5.1-12025010214, agent version now: 4.4.0.0-32026010814"
2026-05-28T14:17:50.394 INFO  [407862] [F000000000000000] [process/exit.go:53] exit with code 22: [ERROR] Code: Agent.Rebuild.VersionNotSame, Message: Take over or rebuild failed: Agent version is not the same, agent version in all_agents: 4.2.5.1-12025010214, agent version now: 4.4.0.0-32026010814
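
To pull the two conflicting versions out of a long obshell log, a small filter helps. This is a minimal sketch under stated assumptions: `agent_version_mismatch` is a hypothetical helper name, and the pattern relies on the exact message wording in the excerpt above ("agent version in all_agents: ..., agent version now: ...").

```shell
# Hypothetical helper: print "<version recorded in all_agents> <version of the running binary>"
# for the first version-mismatch line found on stdin.
agent_version_mismatch() {
  sed -n 's/.*agent version in all_agents: \([^,]*\), agent version now: \([^ "]*\).*/\1 \2/p' | head -n 1
}

# Example use (the log path is a placeholder; adjust it to your deployment):
#   agent_version_mismatch < /path/to/log_obshell/obshell.log
```

For the excerpt above, the first field is the version stored in the cluster metadata and the second is the version of the binary on the failing node.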

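Now that you have retried, please post a screenshot from OCP showing where it fails. In the previous task, it was obshell that failed to start.
The error in ocp-server points to a problem with the clock check.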

2026-05-28T14:17:50.393 ERROR [407862] [F000000000000000] [ob/rebuild.go:39] agent version is not the same, agent version in all_agents: 4.2.5.1-12025010214, agent version now: 4.4.0.0-32026010814
2026-05-28T14:17:50.393 ERROR [407862] [F000000000000000] [server/takeover.go:33] rebuild failed fields: error="Agent version is not the same, agent version in all_agents: 4.2.5.1-12025010214, agent version now: 4.4.0.0-32026010814"
2026-05-28T14:17:50.393 ERROR [407862] [F000000000000000] [runtime/asm_amd64.s:1700] take over or rebuild failed fields:, error="Agent version is not the same, agent version in all_agents: 4.2.5.1-12025010214, agent version now: 4.4.0.0-32026010814"
The error says the agent versions do not match. What you posted above is the obshell log, right? Please post the complete obshell log.

Also check whether the clocks are in sync.

obshell.log (150.1 KB)

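The clocks are in sync. My cluster is running normally, and every node has a scheduled job that synchronizes time with ntpdate or chronyc -a makestep. It is only clockdiff that always fails.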

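The startup failure is caused by the obshell version not matching the cluster metadata.

The local obshell binary is already 4.4.0.0. Change the local obshell back to 4.2.5.1 (matching all_agents) and then run obshell agent start.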

-- sys tenant

SELECT * FROM ocs.all_agents;

You can also check whether the observer process has started.

[Image]

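The version reported by obshell -V does not match what the query returns. Why is the obshell version recorded for the node I am adding so old? This IP was previously removed from the cluster in a scale-in. The current machine is brand new and is being added under the old IP. Could leftover historical records be causing this, and is that a bug?

How do I replace obshell? Should I copy obshell from a healthy running node to the node where the scale-out failed?

Healthy node: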

# ./obshell -V
OBShell 4.4.1.1 (for OceanBase_CE)

REVISION: 32026031914-031aba36bc352cb3ea84e2d202df06044c3e20c9
BUILD_BRANCH: HEAD
BUILD_TIME: Mar 19 2026 15:00:06 UTC
BUILD_FLAGS: release
BUILD_INFO: 

Copyright (c) 2011-present OceanBase Inc.

Why does the SQL query return 4.3.1.0?

666

Try upgrading the agent from the home directory (fill in the version number according to the actual package) and see whether this upgrades it:
${home_path}/bin/obshell agent upgrade -d <upgrade_pkg_dir> -V 4.4.0.0-32026010814

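upgrade_pkg_dir is the path to the upgrade package.

Where would I get an upgrade package? I upgraded through the OCP web console. I copied obshell directly from a healthy running node, and retrying the task still fails.

Download an obshell package. Check which obshell version the other nodes are running and upgrade to that version. Isn't the problem right now that the versions do not match?

Do not just swap the binary by hand. Go through the upgrade procedure so that all nodes run the same obshell version and the metadata is updated.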
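
A quick way to confirm that every node reports the same obshell version is to collect one "host version" pair per node and check that only one distinct version appears. This is a minimal sketch: `all_versions_match` is a hypothetical helper, and the host names in the usage comment are placeholders. It only checks the binaries' reported versions, not the cluster metadata.

```shell
# Hypothetical helper: read "host version" pairs on stdin and succeed (exit 0)
# only if exactly one distinct version is present.
all_versions_match() {
  awk 'NF >= 2 { seen[$2] = 1 }
       END { n = 0; for (v in seen) n++; exit (n == 1 ? 0 : 1) }'
}

# Example use with placeholder host names:
#   printf 'node1 4.4.1.1\nnode2 4.4.1.1\nnode3 4.4.0.0\n' | all_versions_match \
#     || echo "obshell versions differ across nodes"
```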

Forgive me, but I cannot find a download link for obshell.
https://www.oceanbase.com/softwarecenter

It is not in ob-all-in-one 4.4.2 bp1 either.

[Image]