Cluster stuck in upgrading state after a failed upgrade

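【Environment】Test environment
【OB or other components】obd
【Version】Community Edition 4.2.1.10
【Problem description】Upgrading Community Edition 4.2.1.10 to 4.3.5.1 failed, and the cluster is now stuck in the upgrading state. The cluster cannot be restarted and new tables cannot be created. I would like to either clear the upgrading state or find a way to complete the upgrade.
【Steps to reproduce】Operations performed before and after the problem occurred
【Attachments and logs】The OceanBase agile diagnostic tool obdiag is recommended for collecting diagnostic information.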

Upgrade error

[root@iZm9w018wdhmz6ihxhpjhgZ ~]# obd cluster upgrade luoyang -c oceanbase-ce -V 4.3.5.1 --usable=8826bc816ae660198f9ca5fd7e96d93c1ce4fc84
Get local repositories and plugins ok
Open ssh connection ok
Get deployment connections ok
Get standbys info ok
cluster scenario: None
Start observer ok
observer program health check ok
Connect to observer 99.99.128.97:2881 ok
cluster scenario: None
[ERROR] 99.99.128.97 obshell failed

See https://www.oceanbase.com/product/ob-deployer/error-codes .
Trace ID: 2f1b894c-0ead-11f0-affa-00163e000084
If you want to view detailed obd logs, please run: obd display-trace 2f1b894c-0ead-11f0-affa-00163e000084

obshell log


2025/04/01 11:55:57 /home/jenkins/agent/workspace/ob_artifacte_local_artifact/ob_source_code_dir/agent/repository/db/oceanbase/builder.go:75
[error] failed to initialize database, got error dial tcp 127.0.0.1:2881: connect: connection refused

2025/04/01 11:55:59 /home/jenkins/agent/workspace/ob_artifacte_local_artifact/ob_source_code_dir/agent/repository/db/oceanbase/builder.go:75
[error] failed to initialize database, got error Error 8001 (08004): Server is initializing

[... the same "Error 8001 (08004): Server is initializing" entry repeats every 2 seconds, from 11:56:01 through 11:56:47 ...]

2025/04/01 11:56:49 /home/jenkins/agent/workspace/ob_artifacte_local_artifact/ob_source_code_dir/agent/repository/db/oceanbase/builder.go:75
[error] failed to initialize database, got error Error 8001 (08004): Server is initializing

2025/04/01 11:56:51 /home/jenkins/agent/workspace/ob_artifacte_local_artifact/ob_source_code_dir/agent/repository/db/oceanbase/builder.go:75
[error] failed to initialize database, got error Error 1049 (42000): Unknown database 'ocs'

2025/04/01 11:56:52 /home/jenkins/agent/workspace/ob_artifacte_local_artifact/ob_source_code_dir/agent/repository/db/oceanbase/builder.go:116 Error 4179 (HY000): Operation not allowed now
[0.973ms] [rows:0] CREATE DATABASE IF NOT EXISTS ocs READ WRITE

[SOP Series 22]: The first step of troubleshooting (self-service diagnosis and diagnostic information collection)


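Log collection scenario: cluster unreachable
Log collection command: obdiag gather scene run --scene=observer.cluster_down

In addition, to pin down the problem, please provide the following information:

  1. Was there a specific error message or log output when the upgrade failed? If so, please provide the relevant log content.
  2. Were any other operations performed on the cluster during the upgrade (for example scaling out, scaling in, or changing configuration)?
  3. Can the current cluster state be viewed with the obd cluster status command? If so, please provide the output.
  4. Have you tried restarting the cluster through OCP or manually? If so, please describe the exact steps and the result.
  5. Was the cluster configuration file (for example config.yaml) changed before or after the upgrade? If so, please describe the changes.

See also the usage guide for the agile diagnostic tool obdiag.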


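Learned something!

What did you learn?

At present, OceanBase 4.2.x does not support upgrading to 4.3.x.

Then how do I clear the upgrading state? While the cluster is stuck in it, many operations are unavailable.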

cd ~/.obd/cluster/xxx/
vi .data
Change status to STATUS_RUNNING
config_status: UNCHNAGE
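
For anyone who would rather script the edit above than do it in vi, here is a minimal sketch. It assumes the .data file is plain text with top-level `status:` and `config_status:` lines, which is an assumption about obd's file layout and not something confirmed in this thread. The function name `reset_deploy_state` is made up for illustration, and the value `UNCHNAGE` is copied verbatim from the reply above. The script takes a backup first so the change can be reverted.

```python
import re
import shutil
from pathlib import Path


def reset_deploy_state(data_file, status="STATUS_RUNNING", config_status="UNCHNAGE"):
    """Back up data_file, then rewrite its top-level status fields.

    Assumes plain-text `status: ...` and `config_status: ...` lines
    (an assumption about the .data layout). Returns the backup path.
    """
    path = Path(data_file).expanduser()
    backup = path.with_name(path.name + ".bak")
    shutil.copy2(path, backup)  # keep the original so the edit can be undone

    wanted = {"status": status, "config_status": config_status}
    seen = set()
    lines = []
    for line in path.read_text().splitlines():
        match = re.match(r"^(status|config_status):", line)
        if match:
            key = match.group(1)
            line = f"{key}: {wanted[key]}"  # overwrite the existing value
            seen.add(key)
        lines.append(line)
    # Append any field that was not present in the file at all.
    for key in wanted:
        if key not in seen:
            lines.append(f"{key}: {wanted[key]}")
    path.write_text("\n".join(lines) + "\n")
    return backup


# Example call (xxx stands for the deploy name, as in the reply above):
# reset_deploy_state("~/.obd/cluster/xxx/.data")
```

If the result is not what you expected, copy the `.data.bak` file back over `.data` to restore the previous state.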

After making this change, the cluster fails to come up on restart.

[2025-04-01 14:35:12.789] [DEBUG] -- exited code 0
[2025-04-01 14:35:12.796] [DEBUG] - sub general_check ref count to 0
[2025-04-01 14:35:12.796] [DEBUG] - export general_check
[2025-04-01 14:35:12.796] [DEBUG] - plugin obagent-py_script_general_check-1.3.0 result: True
[2025-04-01 14:35:12.796] [DEBUG] - share lock `/root/.obd/lock/mirror_and_repo`, count 8
[2025-04-01 14:35:12.798] [DEBUG] - Searching parameter_pre plugin for components ...
[2025-04-01 14:35:12.799] [DEBUG] - Searching parameter_pre plugin for ocp-express-4.2.0-100000042023073111.el7-ccec08112a29067633797d20685b6e6d70e890d9
[2025-04-01 14:35:12.799] [DEBUG] - Found for ocp-express-py_script_parameter_pre-1.0 for ocp-express-4.2.0
[2025-04-01 14:35:12.799] [DEBUG] - Call plugin ocp-express-py_script_parameter_pre-1.0 for ocp-express-4.2.0-100000042023073111.el7-ccec08112a29067633797d20685b6e6d70e890d9
[2025-04-01 14:35:12.799] [DEBUG] - import parameter_pre
[2025-04-01 14:35:13.375] [DEBUG] - add parameter_pre ref count to 1
[2025-04-01 14:35:13.380] [ERROR] ocp-express-py_script_parameter_pre-1.0 RuntimeError: 'ocp_meta_tenant'
[2025-04-01 14:35:13.380] [ERROR] Traceback (most recent call last):
[2025-04-01 14:35:13.380] [ERROR]   File "core.py", line 2089, in start_cluster
[2025-04-01 14:35:13.380] [ERROR]   File "core.py", line 2135, in _start_cluster
[2025-04-01 14:35:13.380] [ERROR]   File "core.py", line 228, in run_workflow
[2025-04-01 14:35:13.380] [ERROR]   File "core.py", line 270, in run_plugin_template
[2025-04-01 14:35:13.380] [ERROR]   File "core.py", line 315, in call_plugin
[2025-04-01 14:35:13.380] [ERROR]   File "_plugin.py", line 347, in __call__
[2025-04-01 14:35:13.380] [ERROR]   File "_plugin.py", line 304, in _new_func
[2025-04-01 14:35:13.380] [ERROR]   File "/root/.obd/plugins/ocp-express/1.0/parameter_pre.py", line 227, in parameter_pre
[2025-04-01 14:35:13.380] [ERROR]     server_config[prefix + 'tenant'][key.replace(prefix, '', 1)] = server_config[key]
[2025-04-01 14:35:13.380] [ERROR] KeyError: 'ocp_meta_tenant'
[2025-04-01 14:35:13.380] [ERROR] 
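
To make the traceback above easier to read, here is a small reproduction of this kind of KeyError. Only the one assignment statement is taken from the traceback. The helper name `fold_prefixed_keys`, the prefix `ocp_meta_` (inferred from the error message) and the sample key `ocp_meta_username` are assumptions for illustration, not the plugin's actual code or configuration.

```python
def fold_prefixed_keys(server_config, prefix):
    """Move `<prefix>xxx` entries under server_config[prefix + 'tenant'].

    Hypothetical helper built around the single statement quoted in the
    traceback; the rest of the plugin's logic is not reproduced here.
    """
    for key in list(server_config):
        if key.startswith(prefix) and key != prefix + "tenant":
            # Same statement as in the traceback: looking up
            # server_config[prefix + 'tenant'] raises KeyError if it is absent.
            server_config[prefix + "tenant"][key.replace(prefix, "", 1)] = server_config[key]
    return server_config


# Hypothetical config that lacks the parent 'ocp_meta_tenant' entry.
broken = {"ocp_meta_username": "meta"}
try:
    fold_prefixed_keys(broken, "ocp_meta_")
    error = None
except KeyError as exc:
    error = exc.args[0]  # name of the missing key

# With the parent entry present, the same call goes through.
ok = fold_prefixed_keys({"ocp_meta_tenant": {}, "ocp_meta_username": "meta"}, "ocp_meta_")
```

In other words, the start fails inside the ocp-express plugin's parameter handling, before the plugin does anything with the observer itself.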

Start only the oceanbase-ce component:
obd cluster start xxxx -c oceanbase-ce
Stop using ocp-express; it is no longer maintained.