OBServer merge fails

【Environment】Production
【OB】
【Version】4.3.4.0
【Problem description】OBServer merge fails: failed to merge partition errcode=-4016
【Reproduction steps】

【Root cause analysis】

Tenant 1007, tablet_id 519: have you performed any operations on it?

No, nothing. This database is OCP's metadata database, and we generally don't touch it.

Please connect to it and check what tablet_id 519 is. What data does it hold? Analyze the table structure as well.

Please provide the observer log.
Use DBA_OB_TABLE_LOCATIONS to look up the table name and other details.

observer.zip (10.0 MB)

It is the system table __all_scheduler_job_run_detail_v2.

Take a look at the table definition:
show create table __all_scheduler_job_run_detail_v2\G

Which tenant is tenant 1007? Please post the tenant name.

META$1008

Tenant name.

And please post the table structure.

[2025-03-07 14:48:25.675361] WDIAG [STORAGE] merge_partition (ob_partition_merger.cpp:568) [3675279][T1007_MAJOR_MER][T1007][Y3252C0A8110B-00062F8042122AC8-0-0] [lt=170][errcode=-4016] Failed to prepare merge partition(ret=-4016, ctx={ObBasicTabletMergeCtx:{static_param:{dag_param:{skip_get_tablet:false, merge_type:“MEDIUM_MERGE”, merge_version:1733940003644573000, ls_id:{id:1}, tablet_id:{id:519}, exec_mode:“EXEC_MODE_LOCAL”, need_swap_tablet_flag:false, is_reserve_mode:false, schedule_transfer_seq:0}, scn_range:{start_scn:{val:0, v:0}, end_scn:{val:0, v:0}}, version_range:{multi_version_start:1733940003644573000, base_version:0, snapshot_version:1733940003644573000}, is_full_merge:false, concurrent_cnt:1, merge_level:1, major_sstable_status:0, merge_reason:“TENANT_MAJOR”, co_major_merge_type:“INVALID_CO_MAJOR_MERGE_TYPE”, sstable_logic_seq:0, tables_handle:{tablet_id:{id:519}, table_count:9, [{i:0, table_key:{tablet_id:{id:519}, column_group_idx:0, table_type:“MAJOR”, scn_range:{start_scn:{val:0, v:0}, end_scn:{val:1733853601748767000, v:0}}}, ref:0}{i:1, table_key:{tablet_id:{id:519}, column_group_idx:0, table_type:“MINI”, scn_range:{start_scn:{val:1733853601769024000, v:0}, end_scn:{val:1733865751637927000, v:0}}}, ref:0}{i:2, table_key:{tablet_id:{id:519}, column_group_idx:0, table_type:“MINI”, scn_range:{start_scn:{val:1733865751637927000, v:0}, end_scn:{val:1733882321721823001, v:0}}}, ref:0}{i:3, table_key:{tablet_id:{id:519}, column_group_idx:0, table_type:“MINI”, scn_range:{start_scn:{val:1733882321721823001, v:0}, end_scn:{val:1733892066814610000, v:0}}}, ref:0}{i:4, table_key:{tablet_id:{id:519}, column_group_idx:0, table_type:“MINI”, scn_range:{start_scn:{val:1733892066814610000, v:0}, end_scn:{val:1733908699585006000, v:0}}}, ref:0}{i:5, table_key:{tablet_id:{id:519}, column_group_idx:0, table_type:“MINI”, scn_range:{start_scn:{val:1733908699585006000, v:0}, end_scn:{val:1733925601673170000, v:0}}}, ref:0}{i:6, table_key:{tablet_id:{id:519}, 
column_group_idx:0, table_type:“MINI”, scn_range:{start_scn:{val:1733925601673170000, v:0}, end_scn:{val:1733940003648856000, v:0}}}, ref:0}{i:7, table_key:{tablet_id:{id:519}, column_group_idx:0, table_type:“MINOR”, scn_range:{start_scn:{val:1733940003648856000, v:0}, end_scn:{val:1741253804384707000, v:0}}}, ref:0}{i:8, table_key:{tablet_id:{id:519}, column_group_idx:0, table_type:“MINI”, scn_range:{start_scn:{val:1741253804384707000, v:0}, end_scn:{val:1741298830316472000, v:0}}}, ref:0}]}, is_rebuild_column_store:false, is_schema_changed:false, is_tenant_major_merge:true, is_cs_replica:false, read_base_version:1733853601748767000, merge_scn:{val:0, v:0}, need_parallel_minor_merge:true, schema:0x40069220e060, multi_version_column_descs_cnt:31, ls_handle:{ls_map_:0x40018b1fc050, ls_:0x40019c404250, mod_:21}, snapshot_info:{type:“MAJOR_FREEZE_TS”, snapshot:1733940003644573000}, is_backfill:false, tablet_schema_guard:{is_inited:false, schema:null}, tablet_transfer_seq:0, co_base_snapshot_version:0}, static_desc:{ls_id:{id:1}, tablet_id:{id:519}, tablet_transfer_seq:0, merge_type:“MEDIUM_MERGE”, snapshot_version:1733940003644573000, end_scn:{val:1733940003644573000, v:0}, exec_mode:“EXEC_MODE_LOCAL”, is_ddl:false, compressor_type:1, macro_block_size:2097152, macro_store_size:1887436, micro_block_size_limit:2095976, schema_version:1733723390148096, encrypt_id:0, master_key_id:0, encrypt_key:“data_size:16, data:00000000000000000000000000000000”, major_working_cluster_version:17180066816, micro_index_clustered:false, progressive_merge_round:1, need_submit_io:true, encoding_granularity:65536}, parallel_merge_ctx:{concurrent_cnt:1, }, tablet_handle:{obj:0x400428b120a0, obj_pool:0x40018b1e2cf0, allocator:null, wash_priority:0, allow_copy_and_assign:true}, info_collector:{merge_progress:0x400692210060, time_guard:add_time:1741330105652178|total=434us, error_location:ob_partition_merger.cpp:133(prepare_merge)}, merge_dag:0x400278f84030}, merge_info:{is_inited:true, 
merge_history:{static_info:{ls_id:{id:1}, tablet_id:{id:519}, merge_type:“MAJOR_MERGE”, compaction_scn:1733940003644573000, is_full_merge:false, concurrent_cnt:1, merge_level:“MICRO_BLOCK_LEVEL”, exec_mode:“EXEC_MODE_LOCAL”, merge_reason:“TENANT_MAJOR”, base_major_status:“invalid_co_major_status”, co_major_merge_type:“INVALID_CO_MAJOR_MERGE_TYPE”, kept_snapshot_info:{type:“invalid_snapshot_type”, snapshot:0}, participant_table_info:{is_major_merge:false, table_cnt:0, snapshot_version:0, start_scn:0, end_scn:0}, progressive_merge_round:1, progressive_merge_num:0, is_fake:false}, running_info:{merge_start_time:1741330105662147, merge_finish_time:0, execute_time:0, dag_id:Y0-0000000000000000-0-0, start_cg_idx:0, end_cg_idx:0, io_percentage:0, parallel_merge_info:{{type:“scan_units”, info:{min:9223372036854775807, max:-9223372036854775808, sum:0, count:0}}, {type:“cost_time”, info:{min:9223372036854775807, max:-9223372036854775808, sum:0, count:0}}, {type:“use_old_macro_block_cnt”, info:{min:9223372036854775807, max:-9223372036854775808, sum:0, count:0}}, {type:“incremental_row_count”, info:{min:9223372036854775807, max:-9223372036854775808, sum:0, count:0}}, }}, block_info:{occupy_size:0, original_size:0, macro_block_count:0, multiplexed_macro_block_count:0, new_micro_count_in_new_macro:0, multiplexed_micro_count_in_new_macro:0, total_row_count:0, incremental_row_count:0, new_micro_info:{meta_micro_size:0, data_micro_size:0}, block_io_us:0}, diagnose_info:{dag_ret:0, retry_cnt:0, suspect_add_time:0, early_create_time:0, error_location:(null):-1((null))}}, sstable_builder:{data_store_desc:{desc:{static_desc:{ls_id:{id:1}, tablet_id:{id:519}, tablet_transfer_seq:0, merge_type:“MEDIUM_MERGE”, snapshot_version:1733940003644573000, end_scn:{val:1733940003644573000, v:0}, exec_mode:“EXEC_MODE_LOCAL”, is_ddl:false, compressor_type:1, macro_block_size:2097152, macro_store_size:1887436, micro_block_size_limit:2095976, schema_version:1733723390148096, encrypt_id:0, 
master_key_id:0, encrypt_key:“data_size:16, data:00000000000000000000000000000000”, major_working_cluster_version:17180066816, micro_index_clustered:false, progressive_merge_round:1, need_submit_io:true, encoding_granularity:65536}, row_store_type:“encoding_row_store”, col_desc:{is_row_store:true, table_cg_idx:0, row_column_count:31, rowkey_column_count:4, schema_rowkey_col_cnt:2, full_stored_col_cnt:31, col_desc_array:[cnt:31, column_id=18 {type:“VARCHAR”, collation:“utf8mb4_general_ci”, coercibility:“INVALID”} order=0, column_id=19 {type:“TIMESTAMP”, collation:“binary”, coercibility:“NUMERIC”} order=0, column_id=7 {type:“BIGINT”, collation:“binary”, coercibility:“NUMERIC”} order=0, column_id=8 {type:“BIGINT”, collation:“binary”, coercibility:“NUMERIC”} order=0, column_id=16 {type:“TIMESTAMP”, collation:“binary”, coercibility:“NUMERIC”} order=0, column_id=17 {type:“TIMESTAMP”, collation:“binary”, coercibility:“NUMERIC”} order=0, column_id=20 {type:“BIGINT”, collation:“binary”, coercibility:“NUMERIC”} order=0, column_id=21 {type:“BIGINT”, collation:“binary”, coercibility:“NUMERIC”} order=0, column_id=22 {type:“TIMESTAMP”, collation:“binary”, coercibility:“NUMERIC”} order=0, column_id=23 {type:“VARCHAR”, collation:“utf8mb4_general_ci”, coercibility:“INVALID”} order=0, column_id=24 {type:“VARCHAR”, collation:“utf8mb4_general_ci”, coercibility:“INVALID”} order=0, column_id=25 {type:“VARCHAR”, collation:“utf8mb4_general_ci”, coercibility:“INVALID”} order=0, column_id=26 {type:“VARCHAR”, collation:“utf8mb4_general_ci”, coercibility:“INVALID”} order=0, column_id=27 {type:“VARCHAR”, collation:“utf8mb4_general_ci”, coercibility:“INVALID”} order=0, column_id=28 {type:“BIGINT”, collation:“binary”, coercibility:“NUMERIC”} order=0, column_id=29 {type:“TIMESTAMP”, collation:“binary”, coercibility:“NUMERIC”} order=0, column_id=30 {type:“TIMESTAMP”, collation:“binary”, coercibility:“NUMERIC”} order=0, column_id=31 {type:“BIGINT”, collation:“binary”, coercibility:“NUMERIC”} 
order=0, column_id=32 {type:“VARCHAR”, collation:“utf8mb4_general_ci”, coercibility:“INVALID”} order=0, column_id=33 {type:“BIGINT UNSIGNED”, collation:“binary”, coercibility:“NUMERIC”} order=0, column_id=34 {type:“VARCHAR”, collation:“utf8mb4_general_ci”, coercibility:“INVALID”} order=0, column_id=35 {type:“BIGINT”, collation:“binary”, coercibility:“NUMERIC”} order=0, column_id=36 {type:“VARCHAR”, collation:“utf8mb4_general_ci”, coercibility:“INVALID”} order=0, column_id=37 {type:“VARCHAR”, collation:“utf8mb4_general_ci”, coercibility:“INVALID”} order=0, column_id=38 {type:“VARCHAR”, collation:“utf8mb4_general_ci”, coercibility:“INVALID”} order=0, column_id=39 {type:“VARCHAR”, collation:“utf8mb4_general_ci”, coercibility:“INVALID”} order=0, column_id=40 {type:“VARCHAR”, collation:“utf8mb4_general_ci”, coercibility:“INVALID”} order=0, column_id=41 {type:“VARCHAR”, collation:“utf8mb4_general_ci”, coercibility:“INVALID”} order=0, column_id=42 {type:“VARCHAR”, collation:“utf8mb4_general_ci”, coercibility:“INVALID”} order=0, column_id=43 {type:“VARCHAR”, collation:“utf8mb4_general_ci”, coercibility:“INVALID”} order=0, column_id=44 {type:“VARCHAR”, collation:“utf8mb4_general_ci”, coercibility:“INVALID”} order=0], default_col_checksum_array_valid:true, col_default_checksum_array:[cnt:31, 2197175160, 2197175160, 0, 0, 2197175160, 2197175160, 2388842353, 2388842353, 2197175160, 2197175160, 2197175160, 2197175160, 2197175160, 2197175160, 2388842353, 2197175160, 2197175160, 2197175160, 2197175160, 2197175160, 2197175160, 2197175160, 2197175160, 2197175160, 2197175160, 2197175160, 2197175160, 2197175160, 2197175160, 2197175160, 2197175160], agg_meta_array:[cnt:0]}, encoder_opt:{enable_bit_packing:true, store_sorted_var_len_numbers_dict:false, enable_raw:true, enable_dict:true, enable_int_diff:true, enable_str_diff:true, enable_hex_pack:true, enable_rle:true, enable_const:true}, sstable_index_builder:null, need_pre_warm:false, need_build_hash_index_for_micro_block:false, 
data_store_type:1, micro_block_size:16384}}, index_builder:{roots_.count():1}, index_read_info:0x400428b124d0, rebuilder_ptr:NULL}}}, idx=0)

[2025-03-07 14:48:25.675504] WDIAG [STORAGE] process (ob_tablet_merge_task.cpp:1183) [3675279][T1007_MAJOR_MER][T1007][Y3252C0A8110B-00062F8042122AC8-0-0] [lt=142][errcode=-4016] failed to merge partition(ret=-4016)
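To make the SCN ranges in the log above easier to read, here is a minimal Python sketch that pulls each table_key out of the log text, converts the SCN values to UTC timestamps, and checks that the incremental (MINI/MINOR) sstables form a contiguous chain. It assumes the SCN val is nanoseconds since the Unix epoch; that is an assumption, not something stated in this thread, and the function names are hypothetical. Under that assumption, merge_version 1733940003644573000 corresponds to 2024-12-11 18:00:03 UTC while the log line is from 2025-03-07, so the failing MEDIUM_MERGE appears to target a snapshot that is several months old.

```python
import re
from datetime import datetime, timedelta, timezone

# One table_key entry as printed in the merge log. The forum turned the quotes
# into curly quotes, so both curly and straight quotes are accepted.
TABLE_KEY_RE = re.compile(
    r'table_type:["“]?(?P<type>\w+)["”]?,\s*'
    r'scn_range:\{start_scn:\{val:(?P<start>\d+),[^}]*\},\s*'
    r'end_scn:\{val:(?P<end>\d+),'
)


def scn_to_utc(scn: int) -> datetime:
    """Convert an SCN to UTC, ASSUMING the value is nanoseconds since the epoch."""
    seconds, nanos = divmod(scn, 1_000_000_000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(microseconds=nanos // 1000)


def parse_table_keys(log_text: str) -> list[tuple[str, int, int]]:
    """Return (table_type, start_scn, end_scn) for every table_key in the text."""
    return [(m["type"], int(m["start"]), int(m["end"])) for m in TABLE_KEY_RE.finditer(log_text)]


def find_gaps(tables: list[tuple[str, int, int]]) -> list[tuple[int, int]]:
    """Return (previous end_scn, next start_scn) pairs where the MINI/MINOR chain breaks.

    The MAJOR sstable is skipped: in the log above its end_scn does not equal the
    first MINI's start_scn, and this sketch only checks the incremental chain.
    """
    incremental = [t for t in tables if t[0] != "MAJOR"]
    return [(prev[2], nxt[1]) for prev, nxt in zip(incremental, incremental[1:]) if prev[2] != nxt[1]]


# Two entries copied from the log above (i:6 and i:7).
sample = (
    "{i:6, table_key:{tablet_id:{id:519}, column_group_idx:0, table_type:“MINI”, "
    "scn_range:{start_scn:{val:1733925601673170000, v:0}, end_scn:{val:1733940003648856000, v:0}}}, ref:0}"
    "{i:7, table_key:{tablet_id:{id:519}, column_group_idx:0, table_type:“MINOR”, "
    "scn_range:{start_scn:{val:1733940003648856000, v:0}, end_scn:{val:1741253804384707000, v:0}}}, ref:0}"
)

for table_type, start, end in parse_table_keys(sample):
    print(table_type, scn_to_utc(start).isoformat(), "->", scn_to_utc(end).isoformat())
print("gaps:", find_gaps(parse_table_keys(sample)))
```

Pasting the full tables_handle text instead of the two-entry sample shows the time window each of the 9 sstables covers.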

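The OCP cluster should only have three tenants, so what is your tenant 1008? What does the current cluster architecture look like? Please run select @@version and post the detailed current version.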
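While waiting for the select @@version output, note that the merge log above already records major_working_cluster_version:17180066816. A minimal sketch, assuming OceanBase 4.x packs a version as (major << 32) | (minor << 16) | (major_patch << 8) | minor_patch (an assumed layout, not confirmed in this thread), decodes it; the function name is hypothetical.

```python
def decode_cluster_version(packed: int) -> str:
    """Decode a packed cluster version.

    ASSUMED layout (OceanBase 4.x): major in bits 32 and up, minor in bits 16-31,
    major_patch in bits 8-15, minor_patch in bits 0-7.
    """
    major = packed >> 32
    minor = (packed >> 16) & 0xFFFF
    major_patch = (packed >> 8) & 0xFF
    minor_patch = packed & 0xFF
    return f"{major}.{minor}.{major_patch}.{minor_patch}"


# Value copied from major_working_cluster_version in the merge log above.
print(decode_cluster_version(17180066816))
```

Under that assumption the result is 4.3.4.0, which matches the version reported at the top of the thread; select @@version is still needed for the full build details.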

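Please package the information collected by obdiag above and send it.
Please check the rootserver log on the RS node:
grep T1007_MergeSche and send the output.
Query __all_virtual_compaction_diagnose_info with tenant_id=1007 and look for errors related to tablet id 519.
Query __all_virtual_dag_warning_history with tenant_id=1007 and provide the trace ID from the rows related to tablet id 519.
With that trace ID, grep the observer log for the corresponding time range and send the output so we can analyze it.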
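The last two steps (find the trace ID of the failing task, then pull every log line carrying that trace ID) can be scripted. Below is a minimal Python sketch that does the equivalent of two chained greps over observer log lines. It assumes the trace ID is the bracketed field shaped like Y3252C0A8110B-00062F8042122AC8-0-0 in the log above; the regex and function names are hypothetical, and the file path in the comment is only an example.

```python
import re
from collections.abc import Iterable

# ASSUMED trace ID shape, based on the bracketed field in the log above,
# e.g. [Y3252C0A8110B-00062F8042122AC8-0-0].
TRACE_ID_RE = re.compile(r"\[(Y[0-9A-F]+-[0-9A-F]{16}-\d+-\d+)\]")


def trace_ids_for(lines: Iterable[str], *needles: str) -> set[str]:
    """Trace IDs of lines that contain every needle (like chained grep filters)."""
    ids = set()
    for line in lines:
        if all(needle in line for needle in needles):
            match = TRACE_ID_RE.search(line)
            if match:
                ids.add(match.group(1))
    return ids


def lines_for_traces(lines: Iterable[str], trace_ids: set[str]) -> list[str]:
    """Every line that mentions one of the given trace IDs."""
    return [line for line in lines if any(tid in line for tid in trace_ids)]


# The two log lines from this thread, shortened with "...".
log_lines = [
    "[2025-03-07 14:48:25.675361] WDIAG [STORAGE] merge_partition (ob_partition_merger.cpp:568) "
    "[3675279][T1007_MAJOR_MER][T1007][Y3252C0A8110B-00062F8042122AC8-0-0] [lt=170][errcode=-4016] "
    "Failed to prepare merge partition(ret=-4016, ctx={... tablet_id:{id:519} ...})",
    "[2025-03-07 14:48:25.675504] WDIAG [STORAGE] process (ob_tablet_merge_task.cpp:1183) "
    "[3675279][T1007_MAJOR_MER][T1007][Y3252C0A8110B-00062F8042122AC8-0-0] [lt=142][errcode=-4016] "
    "failed to merge partition(ret=-4016)",
]
# With a real file, read it into a list first so it can be scanned twice, e.g.:
# with open("observer.log", encoding="utf-8", errors="replace") as f:
#     log_lines = f.readlines()

ids = trace_ids_for(log_lines, "errcode=-4016", "tablet_id:{id:519}")
for line in lines_for_traces(log_lines, ids):
    print(line)
```

The second line does not mention tablet 519 at all but shares the trace ID, which is why filtering by trace ID gives a fuller picture than filtering by tablet ID alone.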

Please send the observer.log covering this point in time.

Hi, what is the current status of this issue?

Hi, the previous system has since been wiped, and we redeployed the database on a newer version.