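[OB or other component] Adding observer nodes to the current cluster through OCP
[Versions] OB V4.4.2.1, OCP V4.4.2
[Problem description] During deployment, obshell failed to start and the task was aborted.
Goal: scale out by three observers (Rocky Linux 9)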
subtask_12004632.log (226.6 KB)
Which OCP version is this? Please post ocp-server.log so we can take a look.
OCP V4.4.2
I tried starting obshell manually:
export OB_ROOT_PASSWORD="xxxxx*";./bin/obshell admin start --ip x.x.x.x --port 2886
[ERROR] Code: Agent.Rebuild.VersionNotSame, Message: Take over or rebuild failed: Agent version is not the same, agent version in all_agents: 4.2.5.1-12025010214, agent version now: 4.4.0.0-32026010814
[ERROR] Code: Agent.Daemon.StartFailed, Message: Daemon start failed: obshell server exited with code 22, please check obshell.log for more details
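I compared the obshell version on the node being added (the one where the task failed) with the version on a healthy, running observer node.
Node being added: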
./bin/obshell -V
OBShell 4.4.0.0 (for OceanBase_CE)
REVISION: 32026010814-b8061fb339fe52a21d1b104261df83190a2ea958
BUILD_BRANCH: HEAD
BUILD_TIME: Jan 08 2026 14:24:01 UTC
BUILD_FLAGS: release
BUILD_INFO:
Running node:
./bin/obshell -V
OBShell 4.4.1.1 (for OceanBase_CE)
REVISION: 32026031914-031aba36bc352cb3ea84e2d202df06044c3e20c9
BUILD_BRANCH: HEAD
BUILD_TIME: Mar 19 2026 15:00:06 UTC
BUILD_FLAGS: release
BUILD_INFO:
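The two banners can be compared in the same form the error message uses. Judging only from the output in this thread, the string recorded in all_agents (for example 4.4.0.0-32026010814) looks like the version followed by the part of REVISION before the first dash. That reading is an inference from these logs, not documented behavior. A minimal sketch under that assumption, with a hypothetical function name:

```shell
# Build "<version>-<revision prefix>" from `obshell -V` output read on stdin.
# Assumption: this matches the format stored in all_agents, as the logs in
# this thread suggest. The function name is hypothetical.
agent_version_from_banner() {
  awk '
    /^OBShell /   { version = $2 }                         # e.g. 4.4.0.0
    /^REVISION: / { split($2, rev, "-"); revision = rev[1] } # e.g. 32026010814
    END {
      if (version == "" || revision == "") exit 1          # banner not recognized
      print version "-" revision
    }
  '
}

# Usage on a node: ./bin/obshell -V | agent_version_from_banner
```

Running it on each node and comparing the printed strings shows at a glance whether the binaries differ.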
SOS
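The problem I see is at "sync all tenant information", which means tenant information is synchronized when you scale out the zone. Could it be that there is a lot of tenant data, and the task failed because the sync had not finished after two and a half hours?
Solution
Could you provide ocp-server.log? It looks like obshell failed to start; could you post the obshell log as well?
The observer log directory contains a log_obshell directory; the obshell logs are in there.
Excerpt from the node where the scale-out failed: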
2026-05-28T14:17:47.088 INFO [407862] [F000000000000000] [server/takeover.go:41] start to take over or rebuild
2026-05-28T14:17:47.122 INFO [407862] [F000000000000000] [oceanbase/builder.go:126] create database ocs succeed
2026-05-28T14:17:50.348 INFO [407862] [F000000000000000] [oceanbase/builder.go:196] auto migrate ob tables succeed
2026-05-28T14:17:50.393 ERROR [407862] [F000000000000000] [ob/rebuild.go:39] agent version is not the same, agent version in all_agents: 4.2.5.1-12025010214, agent version now: 4.4.0.0-32026010814
2026-05-28T14:17:50.393 ERROR [407862] [F000000000000000] [server/takeover.go:33] rebuild failed fields: error="Agent version is not the same, agent version in all_agents: 4.2.5.1-12025010214, agent version now: 4.4.0.0-32026010814"
2026-05-28T14:17:50.393 ERROR [407862] [F000000000000000] [runtime/asm_amd64.s:1700] take over or rebuild failed fields:, error="Agent version is not the same, agent version in all_agents: 4.2.5.1-12025010214, agent version now: 4.4.0.0-32026010814"
2026-05-28T14:17:50.394 INFO [407862] [F000000000000000] [process/exit.go:53] exit with code 22: [ERROR] Code: Agent.Rebuild.VersionNotSame, Message: Take over or rebuild failed: Agent version is not the same, agent version in all_agents: 4.2.5.1-12025010214, agent version now: 4.4.0.0-32026010814
The error says the agent versions do not match. What you posted above is the obshell log, right? Please post the complete obshell log.
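The two versions named in that error can be pulled out of obshell.log directly. The following is a minimal sketch that relies only on the message wording quoted in this thread; the function name is hypothetical, and the pattern would need adjusting if another obshell release words the message differently.

```shell
# Print "<version in all_agents> <version of local binary>" for the first
# version-mismatch message found. Reads the files given as arguments, or
# stdin when none are given. The function name is hypothetical.
parse_version_mismatch() {
  sed -n '/agent version in all_agents: /{
    s/.*agent version in all_agents: \([^,]*\), agent version now: \([^ "]*\).*/\1 \2/p
    q
  }' "$@"
}

# Usage: parse_version_mismatch obshell.log
```

On the excerpt above this prints `4.2.5.1-12025010214 4.4.0.0-32026010814`: the first field is what the cluster metadata records, the second is what the local binary reports.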
Also check whether the clocks are in sync.
obshell.log (150.1 KB)
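The clocks are in sync. My cluster is currently running normally, and every node has a scheduled job that syncs with ntpdate or chronyc -a makestep; it is just that clockdiff always fails.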
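On nodes that run chrony, the local offset can be read without clockdiff. The sketch below parses the "System time" line of `chronyc tracking`. Note that this is the offset against the node's NTP source, not a direct node-to-node difference. The function names are hypothetical, and the limit passed to `offset_within` is a placeholder to be replaced with whatever your deployment requires.

```shell
# Read `chronyc tracking` output on stdin and print the absolute offset of the
# system clock from NTP time, in milliseconds. Function names are hypothetical.
chrony_offset_ms() {
  awk -F': *' '/^System time/ { split($2, f, " "); printf "%.3f\n", f[1] * 1000 }'
}

# Succeed when offset $1 (ms) does not exceed limit $2 (ms).
# The limit is an assumption supplied by the caller, not a documented value.
offset_within() {
  awk -v o="$1" -v l="$2" 'BEGIN { exit !(o + 0 <= l + 0) }'
}

# Usage: offset_within "$(chronyc tracking | chrony_offset_ms)" 50 || echo "clock drift too large"
```

Running the same check on every node gives a rough picture of whether any of them has drifted from the shared time source.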
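The startup failure is caused by the obshell version not matching the cluster metadata.
The local obshell binary is already 4.4.0.0. Change the local obshell back to 4.2.5.1 (matching all_agents), then run obshell agent start again.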
-- sys tenant
SELECT * FROM ocs.all_agents;
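Once the rows from all_agents are in hand, the stale entries can be listed mechanically. The sketch below assumes the result has been reduced to one `<ip> <version>` pair per line; that export format, the function name, and the addresses in the usage line are all hypothetical, since the table's column layout is not shown in this thread.

```shell
# $1    = the version every agent is expected to run
# stdin = one "<ip> <version>" pair per line (hypothetical export format)
# Prints each agent whose recorded version differs; exits non-zero if any do.
report_version_drift() {
  awk -v want="$1" '
    NF >= 2 && $2 != want { print $1 " has " $2 ", expected " want; bad = 1 }
    END { exit (bad ? 1 : 0) }
  '
}

# Usage (addresses are placeholders):
#   printf '10.0.0.1 4.4.1.1-32026031914\n10.0.0.2 4.2.5.1-12025010214\n' |
#     report_version_drift 4.4.1.1-32026031914
```

A row left over from a node that was removed earlier would show up here with the old version string, which is the situation the error message describes.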
You can also check whether the observer process has started.

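The version I see with obshell -V does not match what the query returns. Why is the obshell version for the node I am adding so old? This IP was previously removed from the cluster in a scale-in. The current machine is brand new and is being added under the old IP. Could this be a bug caused by leftover historical records?
How do I replace obshell? Should I copy obshell from a healthy node to the node where the scale-out failed?
Healthy node: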
# ./obshell -V
OBShell 4.4.1.1 (for OceanBase_CE)
REVISION: 32026031914-031aba36bc352cb3ea84e2d202df06044c3e20c9
BUILD_BRANCH: HEAD
BUILD_TIME: Mar 19 2026 15:00:06 UTC
BUILD_FLAGS: release
BUILD_INFO:
Copyright (c) 2011-present OceanBase Inc.
Why does the SQL query return 4.3.1.0?
666
Try upgrading the agent from the home directory (fill in the version number to match your actual package) and see whether the upgrade works:
${home_path}/bin/obshell agent upgrade -d <upgrade_pkg_dir> -V 4.4.0.0-32026010814
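upgrade_pkg_dir is the path to the upgrade package.
Where would an upgrade package come from? I upgraded through the OCP web console. I copied obshell directly from a healthy node, and retrying the task still fails.
Download an obshell package. Check which obshell version the other nodes are running and upgrade to that version. Isn't the problem right now a version mismatch?
Do not just swap the binary by hand. Go through the upgrade procedure so that obshell is the same version on all nodes and the metadata is updated: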