ERROR 4025 (HY000): Partial failed

[Environment] Test environment
[OB or other component] OB
[Version] 4.3.0
[Problem description] ERROR 4025 (HY000): Partial failed

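I ran into a strange phenomenon: a minor freeze on a specific zone, or a tenant minor freeze on a specific server, works fine, but as soon as I use tenant=all_user or name a specific tenant, the minor freeze fails.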


obclient [oceanbase]> select TENANT_ID,TENANT_NAME from dba_oB_tenants;
+-----------+-------------+
| TENANT_ID | TENANT_NAME |
+-----------+-------------+
|         1 | sys         |
|      1001 | META$1002   |
|      1002 | mm          |
|      1003 | META$1004   |
|      1004 | heioy       |
+-----------+-------------+
5 rows in set (0.011 sec)

obclient [oceanbase]> ALTER SYSTEM MINOR FREEZE TENANT = mm;
ERROR 4025 (HY000): Partial failed
obclient [oceanbase]> ALTER SYSTEM MINOR FREEZE TENANT = 'mm';
ERROR 4025 (HY000): Partial failed
obclient [oceanbase]> ALTER SYSTEM MINOR FREEZE TENANT = 'heioy';
ERROR 4025 (HY000): Partial failed

obclient [oceanbase]> alter system minor freeze;
Query OK, 0 rows affected (0.008 sec)

obclient [oceanbase]> alter system minor freeze tenant=all_user;
ERROR 4025 (HY000): Partial failed
obclient [oceanbase]> alter system minor freeze tenant='all_user';
ERROR 4025 (HY000): Partial failed
obclient [oceanbase]> alter system minor freeze tenant='all_user';
ERROR 4025 (HY000): Partial failed
obclient [oceanbase]> ALTER SYSTEM MINOR FREEZE TENANT = all_meta;
ERROR 4025 (HY000): Partial failed


obclient [oceanbase]> alter system minor freeze ;
Query OK, 0 rows affected (0.007 sec)
obclient [oceanbase]> ALTER SYSTEM MINOR FREEZE SERVER = ('172.16.130.11:2882');
Query OK, 0 rows affected (0.014 sec)

obclient [oceanbase]> ALTER SYSTEM MINOR FREEZE SERVER = ('172.16.130.11:3882');
Query OK, 0 rows affected (0.040 sec)

obclient [oceanbase]> ALTER SYSTEM MINOR FREEZE SERVER = ('172.16.130.11:4882');
Query OK, 0 rows affected (0.011 sec)

obclient [oceanbase]> ALTER SYSTEM MINOR FREEZE SERVER = ('172.16.130.81:2882');
Query OK, 0 rows affected (0.019 sec)

obclient [oceanbase]> ALTER SYSTEM MINOR FREEZE SERVER = ('172.16.130.81:3882');
Query OK, 0 rows affected (0.150 sec)

obclient [oceanbase]> ALTER SYSTEM MINOR FREEZE SERVER = ('172.16.130.81:4882');
Query OK, 0 rows affected (0.050 sec)

obclient [oceanbase]> ALTER SYSTEM MINOR FREEZE SERVER = ('172.16.130.93:2882');
Query OK, 0 rows affected (0.018 sec)

obclient [oceanbase]> ALTER SYSTEM MINOR FREEZE SERVER = ('172.16.130.93:3882');
Query OK, 0 rows affected (0.001 sec)

obclient [oceanbase]> ALTER SYSTEM MINOR FREEZE SERVER = ('172.16.130.93:4882');
Query OK, 0 rows affected (0.001 sec)

Does anyone have suggestions on where to start troubleshooting? I checked the logs and did not see any specific error message.

Please collect the trace log from the moment the error occurs.

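1. Enable the trace feature
SET ob_enable_show_trace=ON;
2. Run the SQL statement
3. Get the trace_id of the SQL statement
SELECT last_trace_id() FROM DUAL;
4. Log in to the corresponding OBServer node and go to the log directory
cd /home/admin/oceanbase/log
5. Fetch the log entries for that trace_id
grep xxxxxxx observer.log   # replace xxxxxxx with the trace_id obtained in step 3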
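
Steps 4 and 5 can be wrapped in a small shell helper. This is only a sketch: the function name `grep_trace` is hypothetical, the default directory is the one from step 4, and the `observer.log*` glob assumes that rotated log files sit next to `observer.log` with a suffix, which may not match every deployment.

```shell
# Hypothetical helper: print every log line that contains the given trace_id.
# Usage: grep_trace TRACE_ID [LOG_DIR]
grep_trace() {
    trace_id="$1"
    # Default directory taken from step 4; pass a second argument to override it.
    log_dir="${2:-/home/admin/oceanbase/log}"
    if [ -z "$trace_id" ]; then
        echo "usage: grep_trace TRACE_ID [LOG_DIR]" >&2
        return 2
    fi
    # -F: match the trace_id as a literal string; -h: omit file name prefixes.
    # The glob also picks up rotated files (assumed to be named observer.log.*).
    grep -Fh -- "$trace_id" "$log_dir"/observer.log* 2>/dev/null
}
```

The request may have been handled on a different node than the one the client connected to, so it may be necessary to run the helper on each OBServer and with that server's own log directory.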

[root@qobcom93 ~]# grep YB42AC10825D-000621C10A95DAE6-0-0 /home/admin/observer2/log/observer.log
[2024-09-11 11:22:01.594586] INFO  [SERVER] minor_freeze (ob_service.cpp:929) [35401][T1_L0_G0][T1][YB42AC10825D-000621C10A95DAE6-0-0] [lt=8] receive minor freeze request(arg={tenant_ids:[1002], ls_id:{id:-1}, tablet_id:{id:0}})
[2024-09-11 11:22:01.594616] WDIAG [SERVER.OMT] get_tenant_base_with_lock (ob_multi_tenant.cpp:225) [35401][T1_L0_G0][T1][YB42AC10825D-000621C10A95DAE6-0-0] [lt=27][errcode=-5150] get tenant from omt failed(ret=-5150, tenant_id=1002)
[2024-09-11 11:22:01.594632] WDIAG [SHARE] switch_to (ob_tenant_base.cpp:550) [35401][T1_L0_G0][T1][YB42AC10825D-000621C10A95DAE6-0-0] [lt=8][errcode=-5150] switch tenant fail(tenant_id=1002, ret=-5150, lbt()="0x654603c 0x13dfef4a 0x6171c9e 0xc1eb05b 0xc1ea2a9 0xbc9ee95 0x65997a5 0x5f41253 0x5f31048 0xb717dc4 0x17ea716e 0x7fd91be83f1b 0x7fd91bdb92e0")
[2024-09-11 11:22:01.594643] WDIAG [SERVER] tenant_freeze_ (ob_service.cpp:1074) [35401][T1_L0_G0][T1][YB42AC10825D-000621C10A95DAE6-0-0] [lt=10][errcode=-5150] fail to switch tenant(ret=-5150, tenant_id=1002)
[2024-09-11 11:22:01.594649] WDIAG [SERVER] handle_tenant_freeze_req_ (ob_service.cpp:987) [35401][T1_L0_G0][T1][YB42AC10825D-000621C10A95DAE6-0-0] [lt=6][errcode=0] fail to freeze tenant(tmp_ret=-5150, tenant_id=1002)
[2024-09-11 11:22:01.594657] INFO  [SERVER] minor_freeze (ob_service.cpp:947) [35401][T1_L0_G0][T1][YB42AC10825D-000621C10A95DAE6-0-0] [lt=8] finish minor freeze request(ret=-5150, arg={tenant_ids:[1002], ls_id:{id:-1}, tablet_id:{id:0}}, cost_ts=73)
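
To see at a glance which error code dominates a trace like the one above, the grep output can be piped through a small filter. A minimal sketch, assuming the `[errcode=N]` field format that appears in these log lines; `summarize_errcodes` is a hypothetical name.

```shell
# Hypothetical helper: read observer log lines on stdin and count how often
# each negative errcode appears in the "[errcode=N]" field.
summarize_errcodes() {
    # Keep only the numeric code; lines with errcode=0 or no errcode field are dropped.
    sed -n 's/.*\[errcode=\(-[0-9][0-9]*\)].*/\1/p' | sort | uniq -c
}
```

Fed the six lines above, it reports -5150 three times (the `errcode=0` line is skipped because only negative codes are matched). The messages next to that code say the server could not get tenant 1002 from omt and therefore failed to switch to that tenant before freezing it.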

obclient [oceanbase]> select TENANT_ID,TENANT_NAME,PRIMARY_ZONE,LOCALITY,STATUS from dba_ob_tenants;
+-----------+-------------+-------------------+---------------------------------------------+--------+
| TENANT_ID | TENANT_NAME | PRIMARY_ZONE      | LOCALITY                                    | STATUS |
+-----------+-------------+-------------------+---------------------------------------------+--------+
|         1 | sys         | RANDOM            | FULL{1}@zone1, FULL{1}@zone2, FULL{1}@zone3 | NORMAL |
|      1001 | META$1002   | zone1,zone2,zone3 | FULL{1}@zone1, FULL{1}@zone2, FULL{1}@zone3 | NORMAL |
|      1002 | mm          | zone1,zone2,zone3 | FULL{1}@zone1, FULL{1}@zone2, FULL{1}@zone3 | NORMAL |
+-----------+-------------+-------------------+---------------------------------------------+--------+
3 rows in set (0.011 sec)

It looks as if there might be a problem with the data disk. Could you help check whether that is the case? Thanks.

We will analyze it and get back to you as soon as there is progress.

sql.zip (1.7 KB)

Please run the attached SQL to check the tenant resource configuration.

-- Check the tenant distribution
select a.zone,a.svr_ip,b.tenant_name,b.tenant_type
from oceanbase.gv$ob_units a join oceanbase.dba_ob_tenants b on a.tenant_id=b.tenant_id order by b.tenant_name;

@意大利面拌42号混凝土 Please run this one.

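Sorry, I forgot to check my messages yesterday. We looked into it yesterday and there was indeed a problem on the disk. After repairing it and rebuilding, everything works. Thanks.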

What was the problem with the disk? Could you share the details?

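Our data disks sit on an in-house dual-replica file system. In a test where we pulled the disks of both replicas, some processes crashed and could not be restarted. After we applied a file system patch the cluster came back up, but the nodes whose disks had been pulled were still in an abnormal state and the data of some tenants was abnormal, so the minor freeze request failed for part of the data. After we rebuilt the cluster on the latest version of the file system, the problem went away.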

So the symptom was caused by the abnormal nodes. Thanks for sharing.