Upgrade from V4.3.3 to V4.3.3.1 failed; the cluster is stuck in the upgrading state and OBD cannot run other commands

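[Environment] Test environment
[OB or other component] OceanBase Community Edition
[Version] V4.3.3
[Problem description] While upgrading to V4.3.3.1, the upgrade checker detected an ongoing merge and aborted the upgrade, leaving the cluster in the upgrading state.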

Error reported when the upgrade failed:

[root@obproxy ~]# obd cluster upgrade obtest -c oceanbase-ce -V 4.3.3.1 --skip-check
Get local repositories and plugins ok
Open ssh connection ok
Get deployment connections ok
Get standbys info ok
cluster scenario: htap
Start observer ok
observer program health check ok
obshell program health check ok
Connect to observer x.x.x.x:2881 ok
Exec upgrade_checker.py x
_main__.MyError: 'upgrade checker failed with 2 reasons: [1 tenant is merging, please check] , [3 tablet is merging, please check] '

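The cluster is now in the upgrading state, and OBD cannot run any other commands against it.

The database itself is currently working normally.

Please post obd.log, along with the OBD YAML configuration file.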

I triggered a minor freeze and then re-ran the upgrade:
ALTER SYSTEM MINOR FREEZE TENANT = all_user;

Same result:

_main__.MyError: 'upgrade checker failed with 2 reasons: [1 tenant is merging, please check] , [3 tablet is merging, please check] '

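What is your OBD version? The Python scripts in OB's upgrade scripts were reworked, and OBD was adapted to that change in 2.10.0 and later. Update OBD to the latest version, 2.10.1, and then retry the upgrade. For an offline upgrade, disable the remote repository after updating OBD and then proceed.

1. I ran obd update before upgrading; it only goes as far as 2.10.0.

2. The remote repository is already disabled.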

Run these two SQL statements to check the current merge status:
select * from GV$OB_TABLET_COMPACTION_PROGRESS;
select * from GV$OB_COMPACTION_DIAGNOSE_INFO;

1. Running them as root@sys fails because no database is selected:

1046 - No database selected

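2. Running them in the tenant with the oceanbase database selected returns an empty result.

The views live in the oceanbase database. Please post a screenshot of the query.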
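One way to avoid the "No database selected" error is to qualify both views with the oceanbase schema. The sketch below only prints the two statements; the function name `compaction_queries` is made up for illustration. Note the single quotes: inside double quotes the shell would try to expand `$OB_TABLET_COMPACTION_PROGRESS` as a variable and mangle the view name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the two compaction queries from this thread,
# with each view qualified by the oceanbase schema so no "USE" is needed.
compaction_queries() {
  # Single quotes keep "$OB_..." from being treated as a shell variable.
  printf '%s\n' \
    'SELECT * FROM oceanbase.GV$OB_TABLET_COMPACTION_PROGRESS;' \
    'SELECT * FROM oceanbase.GV$OB_COMPACTION_DIAGNOSE_INFO;'
}

compaction_queries
```

The output can be piped into whatever MySQL-compatible client is already in use, for example `compaction_queries | obclient -h <host> -P 2881 -u root@sys -p`, where `<host>` is a placeholder for one of the observer addresses.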

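A merge error is what is blocking the cluster upgrade. Didn't you notice it earlier? It has been failing since the 11th.

I can't see that indicated anywhere.
The upgrade check only reported that some merges had not finished.
How should this situation be handled?

Grep the logs for the trace ID from error_trace and see whether it can still be found.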

[2024-10-30 15:45:15.786727] WDIAG [STORAGE] process (ob_tablet_merge_task.cpp:1154) [17930][T1001_MAJOR_MER][T1001][YB420A010134-000624170BB7D77A-0-0] [lt=9][errcode=-4016] failed to merge partition(ret=-4016)
[2024-10-30 15:45:15.786938] INFO  [STORAGE.COMPACTION] reset (ob_partition_rows_merger.cpp:898) [17930][T1001_MAJOR_MER][T1001][YB420A010134-000624170BB7D77A-0-0] [lt=18] partition merge iter row count(i=0, row_count=0, ghost_row_count=0, table_key={tablet_id:{id:455}, column_group_idx:0, table_type:"MINOR", scn_range:{start_scn:{val:1728583200251694000, v:0}, end_scn:{val:1730274171874567000, v:0}}})
[2024-10-30 15:45:15.787023] INFO  [STORAGE.COMPACTION] reset (ob_partition_rows_merger.cpp:898) [17930][T1001_MAJOR_MER][T1001][YB420A010134-000624170BB7D77A-0-0] [lt=46] partition merge iter row count(i=1, row_count=1, ghost_row_count=0, table_key={tablet_id:{id:455}, column_group_idx:0, table_type:"MINI", scn_range:{start_scn:{val:1728572405944791000, v:0}, end_scn:{val:1728583200251694000, v:0}}})
[2024-10-30 15:45:15.787058] INFO  [STORAGE.COMPACTION] reset (ob_partition_rows_merger.cpp:898) [17930][T1001_MAJOR_MER][T1001][YB420A010134-000624170BB7D77A-0-0] [lt=22] partition merge iter row count(i=2, row_count=3142, ghost_row_count=0, table_key={tablet_id:{id:455}, column_group_idx:0, table_type:"MINOR", scn_range:{start_scn:{val:1728496802179430000, v:0}, end_scn:{val:1728572405944791000, v:0}}})
[2024-10-30 15:45:15.787088] INFO  [STORAGE.COMPACTION] reset (ob_partition_rows_merger.cpp:898) [17930][T1001_MAJOR_MER][T1001][YB420A010134-000624170BB7D77A-0-0] [lt=20] partition merge iter row count(i=3, row_count=3259, ghost_row_count=0, table_key={tablet_id:{id:455}, column_group_idx:0, table_type:"MAJOR", scn_range:{start_scn:{val:0, v:0}, end_scn:{val:1728496802164332000, v:0}}})
[2024-10-30 15:45:15.787162] WDIAG [STORAGE] process (ob_tablet_merge_task.cpp:1172) [17930][T1001_MAJOR_MER][T1001][YB420A010134-000624170BB7D77A-0-0] [lt=28][errcode=-4016] failed to merge(ret=-4016, param={skip_get_tablet:false, merge_type:"MEDIUM_MERGE", merge_version:1728583200230015000, ls_id:{id:1}, tablet_id:{id:455}, exec_mode:"EXEC_MODE_LOCAL", need_swap_tablet_flag:false, is_reserve_mode:false, transfer_seq:0}, idx_=0)
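To see which tablets are affected without reading each line by hand, the tablet IDs can be pulled out of the "failed to merge(" lines. This is a rough sketch that assumes the log format shown above (a `tablet_id:{id:N}` field on the same line); the function name `failed_merge_tablets` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the distinct tablet IDs found on
# "failed to merge(" lines of an observer log file.
# $1: path to the log file
failed_merge_tablets() {
  grep -F 'failed to merge(' "$1" \
    | grep -oE 'tablet_id:[{]id:[0-9]+[}]' \
    | grep -oE '[0-9]+' \
    | sort -un
}
```

Called as `failed_merge_tablets observer.log`, it prints one tablet ID per line. The "failed to merge partition(" lines are skipped on purpose, since in the excerpt above they do not carry a tablet ID.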

Please use the tablet ID to look up the table name and basic information in CDB_OB_TABLE_LOCATIONS.
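A lookup along these lines might look like the following. It is a sketch: the helper name `tablet_lookup_sql` is made up, the view is assumed to expose a `TABLET_ID` column, and it is qualified with the oceanbase schema so it works without selecting a database first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a lookup statement for one tablet ID.
# $1: tablet ID (digits only), e.g. 455 from the log above
tablet_lookup_sql() {
  case "$1" in
    ''|*[!0-9]*)
      echo "tablet id must be numeric" >&2
      return 1
      ;;
  esac
  # Assumes CDB_OB_TABLE_LOCATIONS has a TABLET_ID column.
  printf 'SELECT * FROM oceanbase.CDB_OB_TABLE_LOCATIONS WHERE TABLET_ID = %s;\n' "$1"
}

tablet_lookup_sql 455
```

The printed statement can be pasted into a sys-tenant session or piped to the client already in use.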

Please also grep rootservice.log for the error.

Nothing found.

Check from the command line:
grep "replica not merged to target version or status not match" rootservice.log

I searched on every node; there is nothing.

cat rootservice.log | grep "replica not merged to target version or status not match"
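If the message has already rotated out of the current file, older log files are worth checking too. The sketch below counts matches per file; it assumes rotated files sit in the same directory and share the `rootservice.log` prefix, and the function name `search_rs_logs` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count occurrences of a fixed string in
# rootservice.log and any files sharing that prefix.
# $1: log directory, $2: string to search for (matched literally)
search_rs_logs() {
  # -F: literal match, -H: always print the file name, -c: count per file
  grep -F -H -c -- "$2" "$1"/rootservice.log* 2>/dev/null
}
```

For example, `search_rs_logs <log dir> "replica not merged to target version or status not match"` prints one `file:count` line per log file, so a row of zeros confirms the message is absent from the retained logs on that node. `<log dir>` is a placeholder for the observer's log directory.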