OB upgrade failure: the servers_port variable in obshell_port_check.py is None

Upgrade command:

obd cluster upgrade cluster_name -c oceanbase-ce -V 4.3.5.0 --usable=3bf011e10e446a947e15649f193dd75a766fc8a6c28034279d49d478faf54d2e

Start and destination versions of the upgrade:

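+--------------+---------+------------------------+--------+------------------------------------------+-------+
| name         | version | release                | arch   | md5                                      | mark  |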
+--------------+---------+------------------------+--------+------------------------------------------+-------+
| oceanbase-ce | 4.3.4.1 | 101000032024121814.el7 | x86_64 | fa56bff3614d8bcbb2a0c5f4c03f99979cc25d6a | start |
| oceanbase-ce | 4.3.5.0 | 100000202024123117.el7 | x86_64 | 48b61655aaa13e9b01b722928d1979c76b41937e | dest  |
+--------------+---------+------------------------+--------+------------------------------------------+-------+

Error log:

cluster scenario: olap
Start observer ok
observer program health check ok
Connect to observer 172.25.133.51:2881 ok
Exec upgrade_checker.py x   <-- error
See https://www.oceanbase.com/product/ob-deployer/error-codes .
Trace ID: 7c37a610-fa86-11ef-8642-e8611f37b696

Detailed log:

[2025-03-06 20:37:48.685] [ERROR] Traceback (most recent call last):
[2025-03-06 20:37:48.685] [ERROR]   File "core.py", line 3223, in upgrade_cluster
[2025-03-06 20:37:48.685] [ERROR]   File "core.py", line 233, in run_workflow
[2025-03-06 20:37:48.685] [ERROR]   File "core.py", line 275, in run_plugin_template
[2025-03-06 20:37:48.685] [ERROR]   File "core.py", line 320, in call_plugin
[2025-03-06 20:37:48.685] [ERROR]   File "_plugin.py", line 352, in __call__
[2025-03-06 20:37:48.685] [ERROR]   File "_plugin.py", line 309, in _new_func
[2025-03-06 20:37:48.685] [ERROR]   File "/root/.obd/plugins/oceanbase-ce/4.2.1.4/upgrade.py", line 565, in upgrade
[2025-03-06 20:37:48.685] [ERROR]     if upgrader.run():
[2025-03-06 20:37:48.685] [ERROR]   File "/root/.obd/plugins/oceanbase-ce/4.2.1.4/upgrade.py", line 194, in run
[2025-03-06 20:37:48.685] [ERROR]     if not self.process[self.process_index]():
[2025-03-06 20:37:48.685] [ERROR]   File "/root/.obd/plugins/oceanbase-ce/4.2.1.4/upgrade.py", line 534, in take_over
[2025-03-06 20:37:48.685] [ERROR]     if not self.run_workflow(obshell_port_check_workflows, repository, self.cluster_config, **{repository.name: {"upgrade_check": True}}):
[2025-03-06 20:37:48.685] [ERROR]   File "/root/.obd/plugins/oceanbase-ce/4.2.1.4/upgrade.py", line 131, in run_workflow
[2025-03-06 20:37:48.685] [ERROR]     return self._run_workflow(workflows, repositories=[repository], **kwargs)
[2025-03-06 20:37:48.685] [ERROR]   File "core.py", line 233, in run_workflow
[2025-03-06 20:37:48.685] [ERROR]   File "core.py", line 275, in run_plugin_template
[2025-03-06 20:37:48.685] [ERROR]   File "core.py", line 320, in call_plugin
[2025-03-06 20:37:48.685] [ERROR]   File "_plugin.py", line 352, in __call__
[2025-03-06 20:37:48.685] [ERROR]   File "_plugin.py", line 309, in _new_func
[2025-03-06 20:37:48.685] [ERROR]   File "/root/.obd/plugins/oceanbase-ce/4.2.1.4/obshell_port_check.py", line 40, in obshell_port_check
[2025-03-06 20:37:48.685] [ERROR]     ports = servers_port[ip]
[2025-03-06 20:37:48.685] [ERROR] TypeError: 'NoneType' object is not subscriptable

Code that raises the error:

def obshell_port_check(plugin_context, upgrade_check=False, *args, **kwargs):
    cluster_config = plugin_context.cluster_config
    clients = plugin_context.clients
    stdio = plugin_context.stdio
    critical = plugin_context.get_variable('critical')
    servers_port = plugin_context.get_variable('servers_port')
    port_check = upgrade_check or plugin_context.get_variable('port_check')
    if not port_check:
        return plugin_context.return_true()
    
    stdio.info("plugin_context={}".format(plugin_context.__dict__))   # manually added log output
    stdio.info("servers_port={}".format(servers_port))  # manually added log output

    for server in cluster_config.servers:
        ip = server.ip
        
        stdio.info("Current IP={}".format(ip))   # manually added log output
        
        client = clients[server]
        server_config = cluster_config.get_server_conf_with_default(server)
        ports = servers_port[ip]  # <-- servers_port is None here
        if port_check:
            stdio.verbose('%s port check' % server)
            port = int(server_config.get('obshell_port'))
            if port in ports:
                critical(
                    server,
                    'port',
                    err.EC_CONFIG_CONFLICT_PORT.format(server1=server, port=port, server2=ports[port]['server'], key=ports[port]['key']),
                    [err.SUG_PORT_CONFLICTS.format()]
                )
                continue
            if get_port_socket_inode(client, port):
                critical(server, 'port', err.EC_CONFLICT_PORT.format(server=ip, port=port), [err.SUG_USE_OTHER_PORT.format()])

    return plugin_context.return_true()

Log output:

[2025-03-06 22:04:33.478] [INFO] plugin_context={'namespace': <_plugin.PluginContextNamespace object at 0x7f0caa766400>, 'namespaces': {'oceanbase-ce': <_plugin.PluginContextNamespace object at 0x7f0caa766400>}, 'deploy_name': 'cluster_name', 'deploy_status': <DeployStatus.STATUS_UPRADEING: 'upgrading'>, 'repositories': [<_repository.Repository object at 0x7f0caa8c53a0>], 'plugin_name': 'obshell_port_check', 'components': dict_keys(['oceanbase-ce', 'obagent', 'ocp-express']), 'clients': {<172.*.*.*>: <_plugin.ScriptPlugin.ClientForScriptPlugin object at 0x7f0caa7f49a0>}, 'cluster_config': <_deploy.ClusterConfig object at 0x7f0cab1b5ca0>, 'cmds': ['nrlj'], 'options': <Values at 0x7f0cab1f3e50: {'component': 'oceanbase-ce', 'version': '4.3.5.0', 'skip_check': None, 'usable': '3bf011e10e446a947e15649f193dd75a766fc8a6c28034279d49d478faf54d2e', 'disable': '', 'executer_path': '/usr/obd/lib/executer', 'script_query_timeout': '', 'ignore_standby': None, 'without_parameter': True}>, 'dev_mode': False, 'stdio': <_plugin.SubIO object at 0x7f0caa7f48e0>, 'concurrent_executor': <ssh.ConcurrentExecutor object at 0x7f0caa7f4c40>, '_return': <_plugin.PluginReturn object at 0x7f0caa7f4910>}
[2025-03-06 21:02:57.626] [INFO] servers_port=None
[2025-03-06 21:02:57.626] [INFO] Current IP=172.*.*.*
[2025-03-06 21:02:57.628] [ERROR] oceanbase-ce-py_script_obshell_port_check-4.2.1.4 RuntimeError: 'NoneType' object is not subscriptable

This shows the failure is caused by servers_port being None.
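
The failure mode can be reproduced in isolation. The following is a minimal sketch of how the lookup at the failing line could be guarded, assuming `servers_port` is meant to be a dict keyed by IP whose values map port numbers to their owners (as the quoted code suggests). `lookup_ports` is a hypothetical helper, not part of obd, and this is not necessarily how obd 3.1.2 addresses the problem:

```python
def lookup_ports(servers_port, ip):
    """Return the port map for `ip`, tolerating a missing `servers_port`.

    Hypothetical helper for illustration only; not part of obd.
    """
    if servers_port is None:
        # In the log above, get_variable('servers_port') returned None,
        # so subscripting it raised TypeError. Fall back to an empty map.
        servers_port = {}
    return servers_port.get(ip, {})


# None no longer raises; a populated map is returned unchanged.
print(lookup_ports(None, "172.25.133.51"))
print(lookup_ports({"172.25.133.51": {2881: {"server": "s1", "key": "mysql_port"}}},
                   "172.25.133.51"))
```

With an empty port map the `port in ports` conflict check is simply skipped, so only the socket check would remain in effect.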

Also, when I went back to OCP Express, the current version was already shown as 4.3.5.0. Does that mean the upgrade succeeded?

I'd appreciate it if someone could help troubleshoot this. Many thanks.

Please check the OB version from the command line ("black screen"):
select @@version;
You can also try running the upgrade again; it should resume from where it stopped.

Thank you for your reply. I'm not sure whether "black screen" refers to obclient, so I tried the following commands to check the version:

obclient(root@nrlj)[nrlj]> select @version;
+----------+
| @version |
+----------+
| NULL     |
+----------+
1 row in set (0.002 sec)

obclient(root@nrlj)[nrlj]> show variables like '%version%';
+--------------------------------------------------+------------------------------------------------------------------------------------------------------------------+
| Variable_name                                    | Value                                                                                                            |
+--------------------------------------------------+------------------------------------------------------------------------------------------------------------------+
| group_replication_allow_local_lower_version_join | OFF                                                                                                              |
| innodb_version                                   | 5.7.38                                                                                                           |
| ndbinfo_version                                  |                                                                                                                  |
| ndb_version                                      |                                                                                                                  |
| ndb_version_string                               |                                                                                                                  |
| ob_compatibility_version                         | 4.3.4.1                                                                                                          |
| ob_last_schema_version                           | 0                                                                                                                |
| ob_security_version                              | 4.3.4.1                                                                                                          |
| protocol_version                                 | 10                                                                                                               |
| slave_type_conversions                           | ALL_LOSSY                                                                                                        |
| tls_version                                      |                                                                                                                  |
| version                                          | 5.7.25-OceanBase_CE-v4.3.5.0                                                                                     |
| version_comment                                  | OceanBase_CE 4.3.5.0 (r100000202024123117-5d6cb5cbc3f7c1ab6eb22e40abec8e160a8764d5) (Built Dec 31 2024 17:35:01) |
| version_compile_machine                          |                                                                                                                  |
| version_compile_os                               |                                                                                                                  |
| version_tokens_session                           |                                                                                                                  |
+--------------------------------------------------+------------------------------------------------------------------------------------------------------------------+
16 rows in set (0.006 sec)
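
Two details in this output are worth noting. `select @version` reads a user-defined session variable, which is why it returns NULL; the system variable is `@@version`. In addition, `version` reports 4.3.5.0 while `ob_compatibility_version` still reports 4.3.4.1. One possible reading, not confirmed in this thread, is that the binaries were replaced but the upgrade had not finished. A small sketch for comparing the two values follows; the regular expression is an assumption based on the single sample string shown above:

```python
import re


def ob_version_from(version_string):
    """Extract the dotted OceanBase version from a `version` variable value."""
    match = re.search(r"OceanBase(?:_CE)?-v(\d+(?:\.\d+){3})", version_string)
    return match.group(1) if match else None


binary_version = ob_version_from("5.7.25-OceanBase_CE-v4.3.5.0")
compat_version = "4.3.4.1"  # ob_compatibility_version from the output above
print(binary_version, compat_version, binary_version == compat_version)
# prints: 4.3.5.0 4.3.4.1 False
```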

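Regarding the obd upgrade timeout you mentioned: I tried several times last night and got an error every time. The server was rebooted today. After the reboot, starting the cluster produced the error below. I then ran the upgrade command again and the cluster started successfully, but the upgrade problem is still not resolved.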

(base) root@ubuntu109:~/nrlj# obd cluster start nrlj
[ERROR] Deploy "nrlj" is upgrading. You could not start an upgrading cluster.
See https://www.oceanbase.com/product/ob-deployer/error-codes .
Trace ID: b0281be8-fb36-11ef-922e-e8611f37b696

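Run obd display-trace 7c37a610-fa86-11ef-8642-e8611f37b696
and take a look. The "Exec upgrade_checker.py x" failure does not seem to come from the same place as the traceback below it. Could you post that log?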

Check the error message from the py script.

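Thank you for your reply and for pointing out the mistake in my question: the error log I posted was from an earlier attempt.
Below is the detailed log for Trace ID: 7c37a610-fa86-11ef-8642-e8611f37b696.
That error occurred because a tenant was in the middle of a merge.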

[2025-03-06 20:30:04.010] [INFO] Exec upgrade_checker.py
[2025-03-06 20:30:04.011] [DEBUG] -- exec oceanbase-ce-4.3.5.0-100000202024123117.el7-48b61655aaa13e9b01b722928d1979c76b41937e upgrade_checker.py
[2025-03-06 20:30:04.037] [DEBUG] -- exec oceanbase-ce-4.3.5.0-100000202024123117.el7-48b61655aaa13e9b01b722928d1979c76b41937e upgrade_checker.py
[2025-03-06 20:30:04.037] [DEBUG] -- local execute: /usr/obd/lib/executer/executer27/bin/executer /tmp/172.*.*.*:2882/48b61655aaa13e9b01b722928d1979c76b41937e/upgrade_checker.py -h 172.*.*.* -P 2881 -u root -p '******'
[2025-03-06 20:30:15.118] [DEBUG] -- exited code 255, error output:
[2025-03-06 20:30:15.118] [DEBUG] Character set '45' is not a compiled character set and is not specified in the '/usr/local/mysql/share/charsets/Index.xml' file
[2025-03-06 20:30:15.118] [DEBUG] Character set '45' is not a compiled character set and is not specified in the '/usr/local/mysql/share/charsets/Index.xml' file
[2025-03-06 20:30:15.118] [DEBUG] sh: /tmp/_MEIIscmU2/libtinfo.so.5: no version information available (required by sh)
[2025-03-06 20:30:15.119] [DEBUG] Traceback (most recent call last):
[2025-03-06 20:30:15.119] [DEBUG]   File "executer27.py", line 47, in <module>
[2025-03-06 20:30:15.119] [DEBUG]   File "/tmp/172.*.*.*:2882/48b61655aaa13e9b01b722928d1979c76b41937e/upgrade_checker.py", line 1059, in <module>
[2025-03-06 20:30:15.119] [DEBUG]     do_check(host, port, user, password, timeout, upgrade_params, cpu_arch)
[2025-03-06 20:30:15.119] [DEBUG]   File "/tmp/172.*.*.*:2882/48b61655aaa13e9b01b722928d1979c76b41937e/upgrade_checker.py", line 1023, in do_check
[2025-03-06 20:30:15.119] [DEBUG]     check_fail_list()
[2025-03-06 20:30:15.119] [DEBUG]   File "/tmp/172.*.*.*:2882/48b61655aaa13e9b01b722928d1979c76b41937e/upgrade_checker.py", line 750, in check_fail_list
[2025-03-06 20:30:15.119] [DEBUG]     raise MyError(error_msg)
[2025-03-06 20:30:15.119] [DEBUG] __main__.MyError: 'upgrade checker failed with 2 reasons: [1 tenant is merging, please check] , [944 tablet is merging, please check] '
[2025-03-06 20:30:15.119] [DEBUG] [22643] Failed to execute script executer27
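
The reasons reported by the checker are embedded in a single exception message. A minimal sketch for pulling them out of a saved trace is shown below; `checker_reasons` is a hypothetical helper, and the bracketed format is assumed from the one message in this log:

```python
import re


def checker_reasons(message):
    """Return the bracketed reasons from an upgrade_checker.py failure message."""
    return [reason.strip() for reason in re.findall(r"\[([^\]]+)\]", message)]


msg = ("upgrade checker failed with 2 reasons: "
       "[1 tenant is merging, please check] , [944 tablet is merging, please check] ")
for reason in checker_reasons(msg):
    print(reason)
```

Both reasons here point at an in-progress merge, which matches the observation that a later retry no longer hit this check.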

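So I retried the upgrade later. That error no longer appears. Instead I get the error below, which is the one I actually want to ask about: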

(base) root@ubuntu109:~# obd cluster upgrade nrlj -c oceanbase-ce -V 4.3.5.0 --usable=3bf011e10e446a947e15649f193dd75a766fc8a6c28034279d49d478faf54d2e
Get local repositories and plugins ok
Open ssh connection ok
Get deployment connections ok
Get standbys info ok
cluster scenario: olap
Start observer ok
observer program health check ok
Connect to observer 172.*.*.*:2881 ok
[ERROR] oceanbase-ce-py_script_obshell_port_check-4.2.1.4 RuntimeError: 'NoneType' object is not subscriptable
See https://www.oceanbase.com/product/ob-deployer/error-codes .
Trace ID: 85862528-fd62-11ef-a14f-e8611f37b696

Thank you for your help.

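Hello, thank you for looking at my question. I have already posted the detailed error log for obshell_port_check.py in the question. If you need any other information, please let me know.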

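Sorry, I misread earlier.
Which obd version are you running? Try upgrading obd to 3.1.2 first and then run the cluster upgrade. Earlier versions had similar "object is not subscriptable" bugs.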

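Got your reply. I upgraded by following the official tutorial, "Upgrade OceanBase Database with obd" (OceanBase Database V4.3.5 documentation), and I remember running obd update.
I just updated again as you suggested, but the version number did not change afterwards.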

The version shows 3.0.1:

(base) root@ubuntu109:~# obd --version
OceanBase Deploy: 3.0.1
REVISION: 3ad60f74e2a69c06d4a2d5db31bae78ea8430d89
BUILD_BRANCH: HEAD
BUILD_TIME: Dec 18 2024 16:33:13OURCE

Updating obd to 3.1.2:

(base) root@ubuntu109:~# obd update
[WARN] Use centos 7 remote mirror repository for ubuntu 18.04
[WARN] Use centos 7 remote mirror repository for ubuntu 18.04
Found a higher version package for OBD
name: ob-deploy
version: 3.1.2
release:1.el7
arch: x86_64
md5: 75a9be0ab6bd723754febe30970a27fd52c5cdd7
size: 180450861
Upgrade successful.
Current version : 3.1.2
Trace ID: f8eb6042-fd8e-11ef-8cd5-e8611f37b696
If you want to view detailed obd logs, please run: obd display-trace f8eb6042-fd8e-11ef-8cd5-e8611f37b696

The update reports success, but the version is still 3.0.1:

(base) root@ubuntu109:~# obd --version
OceanBase Deploy: 3.0.1
REVISION: 3ad60f74e2a69c06d4a2d5db31bae78ea8430d89
BUILD_BRANCH: HEAD
BUILD_TIME: Dec 18 2024 16:33:13OURCE
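
The mismatch between what `obd update` reports and what `obd --version` prints can be checked mechanically. A minimal sketch, using the two lines from the output above as sample input; `parse_version` is a hypothetical helper and the minimum of 3.1.2 comes from the suggestion earlier in this thread:

```python
import re


def parse_version(text):
    """Return the first dotted version in `text` as a tuple of ints, or None."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    return tuple(int(part) for part in match.group(1).split(".")) if match else None


reported = parse_version("Current version : 3.1.2")    # printed by `obd update`
installed = parse_version("OceanBase Deploy: 3.0.1")   # printed by `obd --version`
print(installed >= (3, 1, 2), reported == installed)
# prints: False False
```

Comparing tuples of integers avoids the pitfalls of comparing version strings lexically.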

Contents of the ~/.obd/version file:

(base) root@ubuntu109:~/.obd# cat ~/.obd/version
3.0.1

I'm not sure whether the cause is that obd itself failed to upgrade. Running obd cluster upgrade still fails with the same error.

What does which obd show? The upgrade should not fail like this. If obd update did not update the version correctly, run wget https://obbusiness-private.oss-cn-shanghai.aliyuncs.com/download-center/opensource/obdeploy/3.1.2/ob-deploy-3.1.2-1.el7.x86_64.rpm
then obd mirror clone ob-deploy-3.1.2-1.el7.x86_64.rpm
then obd update
If that does not work, I suggest rpm -e ob-deploy && rpm -ivh ob-deploy-3.1.2-1.el7.x86_64.rpm

Option 1. which obd:

(base) root@ubuntu109:~/.obd# which obd
/usr/bin/obd

Option 2. wget https://obbusiness-private.oss-cn-shanghai.aliyuncs.com/download-center/opensource/obdeploy/3.1.2/ob-deploy-3.1.2-1.el7.x86_64.rpm
obd mirror clone ob-deploy-3.1.2-1.el7.x86_64.rpm
obd update
The log contains an error: /usr/bin/cp: No such file or directory
Partial log:

[WARNING] Use centos 7 remote mirror repository for ubuntu 18.04
[2025-03-10 18:32:17.907] [DEBUG] - MirrorRepositoryType.REMOTE mirror OceanBase-community-stable-el7 found pkg: ob-deploy-3.1.2-1.el7.x86_64.rpm
[2025-03-10 18:32:17.908] [DEBUG] - md5 is None
[2025-03-10 18:32:17.908] [DEBUG] - name is ob-deploy
[2025-03-10 18:32:17.908] [DEBUG] - arch is ['ia32e', 'x86_64', 'athlon', 'i686', 'i586', 'i486', 'i386', 'noarch']
[2025-03-10 18:32:17.908] [DEBUG] - release is None
[2025-03-10 18:32:17.908] [DEBUG] - version is None
[2025-03-10 18:32:17.908] [DEBUG] - min_version is None
[2025-03-10 18:32:17.908] [DEBUG] - max_version is None
[2025-03-10 18:32:17.908] [DEBUG] - only_download is False
[2025-03-10 18:32:17.908] [DEBUG] - load /root/.obd/mirror/remote/OceanBase-development-kit-el7/.db
[2025-03-10 18:32:17.910] [DEBUG] - MirrorRepositoryType.REMOTE mirror OceanBase-development-kit-el7 found pkg: None
[2025-03-10 18:32:17.910] [DEBUG] - get RPM package by ob-deploy-3.1.2-1.el7.x86_64.rpm
[2025-03-10 18:32:17.917] [DEBUG] - rm /usr/obd/workflows
[2025-03-10 18:32:17.921] [DEBUG] - rm /usr/obd/plugins
[2025-03-10 18:32:17.929] [DEBUG] - rm /usr/obd/config_parser
[2025-03-10 18:32:17.930] [DEBUG] - rm /usr/obd/optimize
[2025-03-10 18:32:17.931] [INFO] Found a higher version package for OBD
[2025-03-10 18:32:17.931] [INFO] name: ob-deploy
[2025-03-10 18:32:17.931] [INFO] version: 3.1.2
[2025-03-10 18:32:17.931] [INFO] release:1.el7
[2025-03-10 18:32:17.931] [INFO] arch: x86_64
[2025-03-10 18:32:17.931] [INFO] md5: 75a9be0ab6bd723754febe30970a27fd52c5cdd7
[2025-03-10 18:32:17.931] [INFO] size: 180450861

The error appears here:
[2025-03-10 18:32:28.094] [DEBUG] - copy /root/.obd/repository/ob-deploy/3.1.2/75a9be0ab6bd723754febe30970a27fd52c5cdd7/usr/bin/obd /usr/bin/obd
[2025-03-10 18:32:28.095] [DEBUG] - local execute: /usr/bin/cp -f /root/.obd/repository/ob-deploy/3.1.2/75a9be0ab6bd723754febe30970a27fd52c5cdd7/usr/bin/obd /usr/bin/obd 
[2025-03-10 18:32:28.101] [DEBUG] - exited code 127, error output:
[2025-03-10 18:32:28.101] [DEBUG] shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
[2025-03-10 18:32:28.101] [DEBUG] /bin/sh: /usr/bin/cp: No such file or directory

which cp:

(base) root@ubuntu109:~/.obd/cluster# which cp
/bin/cp

Manual fix: copy /bin/cp to /usr/bin/cp, then run obd update again. Problem solved.
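
The log shows obd invoking `/usr/bin/cp` by absolute path, while on this Ubuntu 18.04 host `cp` lives at `/bin/cp`. A minimal sketch for checking such a path before running `obd update` follows; `resolve_tool` is a hypothetical helper, not part of obd:

```python
import os
import shutil


def resolve_tool(name, expected_path):
    """Report whether `expected_path` exists and where `name` is found on PATH."""
    return {
        "expected_path": expected_path,
        "expected_exists": os.path.exists(expected_path),
        "found_on_path": shutil.which(name),  # None if not on PATH
    }


# On the host in this thread this would show expected_exists=False
# and found_on_path='/bin/cp'.
print(resolve_tool("cp", "/usr/bin/cp"))
```

If the expected path is missing but the tool is found elsewhere, a symbolic link from the expected path to the real one should work as well as the copy used here.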

Option 3. rpm -e ob-deploy && rpm -ivh ob-deploy-3.1.2-1.el7.x86_64.rpm

(base) root@ubuntu109:~/.obd/cluster# rpm -e ob-deploy && rpm -ivh ob-deploy-3.1.2-1.el7.x86_64.rpm
rpm: RPM should not be used directly install RPM packages, use Alien instead!
rpm: However assuming you know what you are doing...
error: package ob-deploy is not installed

I installed OB on Ubuntu through obd_web.

In summary, Option 2 worked: fixing the missing cp command resolved the obd update failure.

(base) root@ubuntu109:~/.obd/cluster# obd --version
OceanBase Deploy: 3.1.2
REVISION: 739be300c448342b2850c71087bee01091dd7cb7
BUILD_BRANCH: HEAD
BUILD_TIME: Feb 14 2025 14:43:32OURCE

Resolution

After fixing the problem that kept obd from upgrading to 3.1.2, I ran the cluster upgrade command again and the upgrade succeeded.

(base) root@ubuntu109:~/.obd/cluster# obd cluster upgrade nrlj -c oceanbase-ce -V 4.3.5.0 --usable=3bf011e10e446a947e15649f193dd75a766fc8a6c28034279d49d478faf54d2e
Get local repositories and plugins ok
Open ssh connection ok
Get deployment connections ok
Get standbys info ok
cluster scenario: olap
Start observer ok
observer program health check ok
Connect to observer 172.*.*.*:2881 ok
Connect to observer 172.*.*.*:2881 ok
obshell start ok
obshell program health check ok
Wait for observer init ok
+-------------------------------------------------+
|                   oceanbase-ce                  |
+---------------+---------+------+-------+--------+
| ip            | version | port | zone  | status |
+---------------+---------+------+-------+--------+
| 172.**.***.** | 4.3.5.0 | 2881 | zone1 | ACTIVE |
+---------------+---------+------+-------+--------+