OCP alert: tenant merge (major compaction) abnormal


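It looks like this index table has been dropped. Run show index again to confirm whether it really is gone.
If it has been dropped, the data inconsistency on this index table no longer matters. Just run alter system clear merge error tenant = xxx to clear the checksum error state.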

Is CDB_OB_TABLE_LOCATIONS the view I should check?
How do I confirm whether a drop was actually executed?


I cleared the checksum error state, but the same error is still there and the merge cannot run.

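1. The simple way is to query __all_table to see which main table table_id 590292 belongs to, then check whether that main table's index tables still include the standardZs index. Also, don't you have any internal record of whether the index was dropped?
2. Where does the error show up after clearing? Does CDB_OB_MAJOR_COMPACTION go back to checksum error after a while? Query CDB_OB_MAJOR_COMPACTION after clearing and show me the result.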
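
The lookup in step 1 could be sketched as the two queries below. This is only a sketch: it assumes the queries are run in the user tenant's oceanbase database, and that an index table stores the table_id of its main table in data_table_id. The column names are the ones that appear in the __all_table output later in this thread; check them against your version.

```sql
-- Sketch only; run in the user tenant.
-- Which table does table_id 590292 belong to?
SELECT table_id, table_name, table_type, data_table_id, index_status
FROM oceanbase.__all_table
WHERE table_id = 590292;

-- Assumption: index tables carry their main table's id in data_table_id.
-- If 590292 turns out to be an index table itself, use its data_table_id here instead.
SELECT table_id, table_name, index_status
FROM oceanbase.__all_table
WHERE data_table_id = 590292;
```

If the second query returns no row whose name contains standardZs, the index table is presumably gone.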


We never dropped the index manually.

The clear operation ran without any error.



It still shows the merge from the 11th as in progress.

Then first query CDB_OB_TABLE_LOCATIONS where table_id = 590292 and let's take a look.

You probably queried __all_table under the sys tenant, which is why nothing came back just now.


It does show up under the user tenant:
MySQL [oceanbase]> select * from __all_table where table_id = 590292\G
*************************** 1. row ***************************
gmt_create: 2023-07-12 14:30:17.746333
gmt_modified: 2023-07-12 14:50:24.746335
tenant_id: 0
table_id: 590292
table_name: gj
database_id: 500005
table_type: 3
load_type: 0
def_type: 1
rowkey_column_num: 1
index_column_num: 0
max_used_column_id: 46
autoinc_column_id: 0
auto_increment: 1
read_only: 0
rowkey_split_pos: 0
compress_func_name: zstd_1.3.8
expire_condition:
is_use_bloomfilter: 0
comment:
block_size: 16384
collation_type: 45
data_table_id: 0
index_status: 1
tablegroup_id: -1
progressive_merge_num: 0
index_type: 0
part_level: 0
part_func_type: 0
part_func_expr:
part_num: 1
sub_part_func_type: 0
sub_part_func_expr:
sub_part_num: 0
schema_version: 1689144624747096
view_definition:
view_check_option: 0
view_is_updatable: 0
index_using_type: 0
parser_name: NULL
index_attributes_set: 0
tablet_size: 134217728
pctfree: 0
partition_status: 0
partition_schema_version: 0
session_id: 0
pk_comment:
sess_active_time: 0
row_store_type: encoding_row_store
store_format: DYNAMIC
duplicate_scope: 0
progressive_merge_round: 3
storage_format_version: 4
table_mode: 66048
encryption:
tablespace_id: -1
sub_part_template_flags: 0
dop: 1
character_set_client: 0
collation_connection: 0
auto_part_size: 0
auto_part: 0
association_table_id: -1
tablet_id: 287568
max_dependency_version: -1
define_user_id: 200001
transition_point: NULL
b_transition_point: NULL
interval_range: NULL
b_interval_range: NULL
object_status: 1
table_flags: 0
truncate_version: -1
1 row in set (0.014 sec)

They all point to the gj table, which holds a little over 20 million rows.

Run select * from __all_virtual_tablet_replica_checksum where tablet_id = 1152921504606849455; and let's see the result.

MySQL [oceanbase]> select * from __all_virtual_tablet_replica_checksum where tablet_id = 1152921504606849455\G
*************************** 1. row ***************************
tenant_id: 1002
tablet_id: 1152921504606849455
svr_ip: 133.197.204.10
svr_port: 2882
ls_id: 1004
gmt_create: 2023-07-12 14:48:52.158903
gmt_modified: 2024-02-11 02:02:37.365907
compaction_scn: 1707588000264588783
row_count: 22262888
data_checksum: 3614153947
column_checksums: magic:636865636B636F6C,compat:0,method:0,bytes:22,colcnt:4,0:26653660477700761,1:47810762693952733,2:42989356252253808,3:53182529754495464
b_column_checksums: 636865636B636F6C0000160499F5F2E5CEABAC2FDD89C1B6AAF4F654F09CA7E4F7D2AE4CE8F39792AAA7BC5E
*************************** 2. row ***************************
tenant_id: 1002
tablet_id: 1152921504606849455
svr_ip: 133.197.204.11
svr_port: 2882
ls_id: 1004
gmt_create: 2023-07-12 14:48:52.164791
gmt_modified: 2024-02-11 02:03:14.570252
compaction_scn: 1707588000264588783
row_count: 22262888
data_checksum: 3614153947
column_checksums: magic:636865636B636F6C,compat:0,method:0,bytes:22,colcnt:4,0:26653660477700761,1:47810762693952733,2:42989356252253808,3:53182529754495464
b_column_checksums: 636865636B636F6C0000160499F5F2E5CEABAC2FDD89C1B6AAF4F654F09CA7E4F7D2AE4CE8F39792AAA7BC5E
*************************** 3. row ***************************
tenant_id: 1002
tablet_id: 1152921504606849455
svr_ip: 133.197.204.12
svr_port: 2882
ls_id: 1004
gmt_create: 2023-07-12 14:48:52.162270
gmt_modified: 2024-02-11 02:01:29.308260
compaction_scn: 1707588000264588783
row_count: 22262888
data_checksum: 3614153947
column_checksums: magic:636865636B636F6C,compat:0,method:0,bytes:22,colcnt:4,0:26653660477700761,1:47810762693952733,2:42989356252253808,3:53182529754495464
b_column_checksums: 636865636B636F6C0000160499F5F2E5CEABAC2FDD89C1B6AAF4F654F09CA7E4F7D2AE4CE8F39792AAA7BC5E
*************************** 4. row ***************************
tenant_id: 1002
tablet_id: 1152921504606849455
svr_ip: 133.197.204.13
svr_port: 2882
ls_id: 1004
gmt_create: 2023-07-12 14:48:53.255191
gmt_modified: 2024-02-11 02:01:01.725211
compaction_scn: 1707588000264588783
row_count: 22262888
data_checksum: 3614153947
column_checksums: magic:636865636B636F6C,compat:0,method:0,bytes:22,colcnt:4,0:26653660477700761,1:47810762693952733,2:42989356252253808,3:53182529754495464
b_column_checksums: 636865636B636F6C0000160499F5F2E5CEABAC2FDD89C1B6AAF4F654F09CA7E4F7D2AE4CE8F39792AAA7BC5E
*************************** 5. row ***************************
tenant_id: 1002
tablet_id: 1152921504606849455
svr_ip: 133.197.204.3
svr_port: 2882
ls_id: 1004
gmt_create: 2023-11-27 14:51:39.365619
gmt_modified: 2024-02-11 02:01:24.319987
compaction_scn: 1707588000264588783
row_count: 22262937
data_checksum: 3539063395
column_checksums: magic:636865636B636F6C,compat:0,method:0,bytes:22,colcnt:4,0:26659666626161974,1:47810862462080152,2:42988599555908571,3:53182646807770761
b_column_checksums: 636865636B636F6C00001604B6DAD4B8B5DAAD2F98A9D48B9EF7F654DBC7AFEFF4BCAE4C89EDC499DEAABC5E
5 rows in set (6.761 sec)

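The replica on 133.197.204.3 has a data_checksum that differs from the other machines (3539063395 versus 3614153947), and its row_count differs as well (22262937 versus 22262888).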
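
Rather than comparing the rows by eye, the same check could be sketched as a grouped query. This is only a sketch: it uses nothing beyond the columns visible in the output above, and assumes GROUP_CONCAT is available in your MySQL-mode tenant.

```sql
-- Sketch only: group the replicas of one tablet by checksum to spot the outlier.
SELECT data_checksum,
       row_count,
       COUNT(*)             AS replica_count,
       GROUP_CONCAT(svr_ip) AS servers
FROM oceanbase.__all_virtual_tablet_replica_checksum
WHERE tablet_id = 1152921504606849455
GROUP BY data_checksum, row_count;
```

With the five rows shown above, this should produce two groups: one with four replicas sharing data_checksum 3614153947, and one containing only 133.197.204.3. More than one group for the same compaction_scn means the replicas disagree.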

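There are currently two ways to recover:
1. Drop the standardZs index table, then run alter system clear merge error to clear the checksum error state. Rebuild the index once the merge status has recovered.
2. Take the machine with the inconsistent checksum offline, then likewise run alter system clear merge error to clear the checksum error state.
I recommend option 1, since a little over 20 million rows is not much data. Of course, it also depends on which option is feasible for you.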
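
Option 1 could be sketched as the sequence below. This is a sketch, not a verified runbook: the table and index names (gj, standardZs) come from this thread, xxx is the tenant name placeholder used earlier, and which tenant each statement must run in is my assumption. The index's column list is not shown anywhere in the thread, so the rebuild step is left as a comment.

```sql
-- Sketch only, following option 1.

-- 1. In the user tenant: save the index definition so it can be rebuilt later.
SHOW CREATE TABLE gj;

-- 2. In the user tenant: drop the inconsistent index.
DROP INDEX standardZs ON gj;

-- 3. Assumed to run in the sys tenant, since it names a tenant:
--    clear the checksum error state (xxx = tenant name placeholder).
ALTER SYSTEM CLEAR MERGE ERROR TENANT = xxx;

-- 4. Once the merge has recovered, recreate the index using the
--    definition saved in step 1.
```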

So we need to drop the standardZs index on the gj table?
And then run clear merge error and wait for the automatic merge to run?

Yes. After running clear merge error, keep an eye on CDB_OB_MAJOR_COMPACTION.

Is it enough once the February 11 task in CDB_OB_MAJOR_COMPACTION shows as finished?

Yes.
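
For the follow-up check, a minimal query might look like the one below. It is only a sketch: it assumes it is run in the sys tenant, and that 1002 (the tenant_id seen in the checksum rows above) is the affected tenant. I believe the columns to watch are STATUS and IS_ERROR, but treat those names as an assumption and confirm them in the output.

```sql
-- Sketch only; run in the sys tenant.
-- 1002 is the tenant_id seen in the replica checksum rows above.
SELECT *
FROM oceanbase.CDB_OB_MAJOR_COMPACTION
WHERE tenant_id = 1002;
```

Re-running this every few minutes should show whether the February 11 round finishes or falls back into the checksum error state.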

OK, thank you very much for your help.

You're welcome.

Root cause analysis for stuck merges is supported as of obdiag 1.6.1. You can run:
obdiag rca run --scene=major_hold

Merge troubleshooting guide: OceanBase community

I've run into a similar problem.