Major compaction has barely progressed in over ten hours

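【 Environment 】 Test environment
【 OB or other component 】
【 Version 】 5.7.25-OceanBase_CE-v4.3.5.3
【 Problem description 】 After the major compaction reached 99.91%, it took more than ten hours to advance to 99.93%.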





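Questions:
Question 1: How can I tell whether the compaction is still making progress?
Question 2: select * from __all_virtual_tablet_meta_table where tenant_id = 1002 and compaction_scn < 1765189069459967000 group by svr_ip; How do I look up the table name for a given tablet_id?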
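One rough way to answer Question 1 is to run a counting variant of the query above twice, a few minutes apart, and compare the number of tablets whose compaction_scn is still behind the target. The sketch below assumes you have already collected those per-server counts yourself; the function names and the sample counts are hypothetical, and the SQL in the comment is only an assumed rewrite of the query from the question.

```python
from typing import Dict

# Assumed counting variant of the query in the question (run it in the sys tenant):
#   SELECT svr_ip, COUNT(*) AS lagging
#   FROM oceanbase.__all_virtual_tablet_meta_table
#   WHERE tenant_id = 1002 AND compaction_scn < 1765189069459967000
#   GROUP BY svr_ip;


def lagging_delta(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Per-server change in the number of tablets still behind the target SCN."""
    servers = set(before) | set(after)
    return {ip: after.get(ip, 0) - before.get(ip, 0) for ip in sorted(servers)}


def is_progressing(before: Dict[str, int], after: Dict[str, int]) -> bool:
    """True if the total number of lagging tablets dropped between two samples."""
    return sum(after.values()) < sum(before.values())


if __name__ == "__main__":
    # Hypothetical counts from two samples taken some minutes apart.
    first = {"10.1.224.57": 12, "10.1.224.58": 9}
    second = {"10.1.224.57": 12, "10.1.224.58": 7}
    print(lagging_delta(first, second))
    print("progressing" if is_progressing(first, second) else "no progress")
```

If the total stays flat across several samples, the tablets that remain in the result are the ones worth mapping back to table names.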


Try the stuck-compaction root cause analysis in the agile diagnostic tool obdiag: https://www.oceanbase.com/docs/common-obdiag-cn-1000000004494376


It's probably stuck.


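Judging by the tablet_id, it should be a non-system table. Query the CDB_OB_TABLE_LOCATIONS view.
Also, is columnar storage in use? Please confirm the storage format of the affected table.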
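As a sketch of that lookup, the helper below builds a query against the CDB_OB_TABLE_LOCATIONS view that filters on tablet_id. The function name and the tablet IDs are hypothetical, and the selected column names are assumed from the view as it appears in 4.x; it is worth checking them with DESC before relying on this.

```python
from typing import Iterable


def table_lookup_sql(tenant_id: int, tablet_ids: Iterable[int]) -> str:
    """Build a query that maps tablet IDs to database and table names.

    Values are coerced to int so that only numeric literals end up in the SQL.
    """
    ids = sorted({int(t) for t in tablet_ids})
    if not ids:
        raise ValueError("at least one tablet_id is required")
    id_list = ", ".join(str(t) for t in ids)
    return (
        "SELECT DISTINCT tablet_id, database_name, table_name, table_type "
        "FROM oceanbase.CDB_OB_TABLE_LOCATIONS "
        f"WHERE tenant_id = {int(tenant_id)} AND tablet_id IN ({id_list})"
    )


if __name__ == "__main__":
    # Hypothetical tablet IDs; substitute the ones returned by your own query.
    print(table_lookup_sql(1002, [200001, 200002]))
```

The generated statement is meant to be run in the sys tenant; DISTINCT collapses the one-row-per-replica output into one row per tablet.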


111


Following along to learn.


obdiag_major_hold_20251209110737.tar.gz (130 KB)



The query returns null.


You specified the wrong column; it should be tablet_id.


The logs have been uploaded; please take a look.


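It looks like this table is stuck at partition-level compaction. Could you search observer.log using these error_trace values and post the matching log entries?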



I couldn't find anything in the OCP log service. How do I get the logs you mentioned?


Try searching for the log entries directly on the server.


Performance problems are hard to track down. I'm here to learn the troubleshooting approach from the experts.

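The table blocking the compaction is _rm.oms_step. The trace logs have been uploaded; please help determine whether the table is corrupted and how to fix it.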



grep "YB420A01E03A-0006456AE1380FCF-0-0" observer.log

[2025-12-09 15:01:27.435060] WDIAG [SQL.DAS] execute_all_task (ob_das_ref.cpp:320) [6845][T1002_L0_G0][T1002][YB420A01E03A-0006456AE1380FCF-0-0] [lt=15][errcode=-4016] fail to execute all agg_tasks(ret=-4016)
[2025-12-09 15:01:27.435065] WDIAG [SQL.DAS] do_table_scan (ob_das_merge_iter.cpp:263) [6845][T1002_L0_G0][T1002][YB420A01E03A-0006456AE1380FCF-0-0] [lt=6][errcode=-4016] failed to execute all das task(ret=-4016)
[2025-12-09 15:01:27.435071] WDIAG [SQL.ENG] do_table_scan (ob_table_scan_op.cpp:2834) [6845][T1002_L0_G0][T1002][YB420A01E03A-0006456AE1380FCF-0-0] [lt=7][errcode=-4016] execute all das scan task failed(ret=-4016)
[2025-12-09 15:01:27.435076] WDIAG [SQL.ENG] do_init_before_get_row (ob_table_scan_op.cpp:1936) [6845][T1002_L0_G0][T1002][YB420A01E03A-0006456AE1380FCF-0-0] [lt=7][errcode=-4016] fail to do table scan(ret=-4016)
[2025-12-09 15:01:27.435080] WDIAG [SQL.ENG] inner_get_next_batch_for_tsc (ob_table_scan_op.cpp:2716) [6845][T1002_L0_G0][T1002][YB420A01E03A-0006456AE1380FCF-0-0] [lt=5][errcode=-4016] failed to init before get row(ret=-4016)
[2025-12-09 15:01:27.435090] WDIAG [SQL.ENG] inner_get_next_batch (ob_table_scan_op.cpp:2679) [6845][T1002_L0_G0][T1002][YB420A01E03A-0006456AE1380FCF-0-0] [lt=12][errcode=-4016] failed to get next batch(ret=-4016)
[2025-12-09 15:01:27.435100] WDIAG [SQL.ENG] get_next_batch (ob_operator.cpp:1477) [6845][T1002_L0_G0][T1002][YB420A01E03A-0006456AE1380FCF-0-0] [lt=13][errcode=-4016] get next batch failed(ret=-4016, eval_ctx={batch_idx:0, batch_size:16, max_batch_size:16, frames_:0x7f1276ccfae0}, id=0, op_name="PHY_VEC_TABLE_SCAN")
[2025-12-09 15:01:27.435113] WDIAG [SQL.ENG] get_next_row (ob_operator.cpp:2012) [6845][T1002_L0_G0][T1002][YB420A01E03A-0006456AE1380FCF-0-0] [lt=18][errcode=-4016] get next batch failed(ret=-4016)
[2025-12-09 15:01:27.435123] WDIAG [SQL.EXE] sync_send_result (ob_remote_executor_processor.cpp:273) [6845][T1002_L0_G0][T1002][YB420A01E03A-0006456AE1380FCF-0-0] [lt=13][errcode=-4016] failed to get next row(ret=-4016)
[2025-12-09 15:01:27.435132] WDIAG [SQL.EXE] execute_remote_plan (ob_remote_executor_processor.cpp:472) [6845][T1002_L0_G0][T1002][YB420A01E03A-0006456AE1380FCF-0-0] [lt=12][errcode=-4016] sync send result failed(ret=-4016)
[2025-12-09 15:01:27.435162] INFO [SQL.EXE] end_stmt (ob_sql_trans_control.cpp:1532) [6845][T1002_L0_G0][T1002][YB420A01E03A-0006456AE1380FCF-0-0] [lt=13] end stmt(ret=0, tx_id=0, plain_select=true, stmt_type=1, savepoint=0, tx_desc={this:0x7f12e1764ae0, tx_id:{txid:0}, state:1, addr:"10.1.224.57:2882", tenant_id:1002, session_id:3222229587, assoc_session_id:3222229587, client_sid:3221767929, xid:NULL, xa_mode:"", xa_start_addr:"0.0.0.0:0", access_mode:-1, tx_consistency_type:0, isolation:1, snapshot_version:{val:18446744073709551615, v:3}, snapshot_scn:0, active_scn:0, op_sn:1, alloc_ts:1765263687430641, active_ts:-1, commit_ts:-1, finish_ts:-1, timeout_us:-1, lock_timeout_us:-1, expire_ts:9223372036854775807, coord_id:{id:-1}, parts:[], exec_info_reap_ts:0, commit_version:{val:18446744073709551615, v:3}, commit_times:0, commit_cb:null, cluster_id:-1, cluster_version:17180067075, seq_base:1765263687427534, flags_.SHADOW:false, flags_.INTERRUPTED:false, flags_.BLOCK:false, flags_.REPLICA:false, conflict_txs:[], abort_cause:0, commit_expire_ts:0, commit_task_.is_registered():false, modified_tables:[], last_rc_snapshot_version:{val:1765263687291568000, v:0}, ref:1}, trans_result={incomplete:false, parts:[], touched_ls_list:[], conflict_txs:[]}, rollback=true, need_rollback=true, session={this:0x7f13a66a9398, id:3222229587, client_sid:3221767929, deser:true, tenant:"", tenant_id:1002, effective_tenant:"", effective_tenant_id:1002, database:"_rm", user:"root@%", consistency_level:3, session_state:2, autocommit:true, tx:0x7f12e1764ae0}, exec_ctx.get_errcode()=0)
[2025-12-09 15:01:27.435216] WDIAG [SQL.EXE] execute_with_sql (ob_remote_executor_processor.cpp:730) [6845][T1002_L0_G0][T1002][YB420A01E03A-0006456AE1380FCF-0-0] [lt=66][errcode=-4016] execute remote plan failed(ret=-4016, task={tenant_schema_version:1765188289180008, sys_schema_version:1765176290627408, runner_svr:"10.1.224.57:2882", ctrl_svr:"10.1.224.58:2882", task_id:{ob_job_id:{ob_execution_id:{server:"10.1.224.58:2882", execution_id:6034326, task_type:0, hash:3351960371096099249}, job_id:18446744073709551615}, task_id:0, task_cnt:0}, remote_sql_info:{use_ps:true, is_batched_stmt:false, is_original_ps_mode:false, ps_param_cnt:1, remote_sql:"select * from rm.oms_step limit ?", ps_params:[{obj:{"BIGINT":1}, accuracy:{length:1, precision:20, scale:0}, flag:1, raw_text_pos:-1, raw_text_len:-1, param_meta:{type:"BIGINT", collation:"binary", coercibility:"NUMERIC"}}]}, snapshot:{this:0x7f1276c04208, valid:false, source:0, core:{version:{val:18446744073709551615, v:3}, tx_id:{txid:0}, scn:0}, uncertain_bound:0, snapshot_lsid:{id:-1}, snapshot_ls_role:0, snapshot_acquire_addr:"0.0.0.0:0", parts:[], committed:false}, ls_list:[{id:1003}], detectable_id:ObDetectableId: (1765175014063971,639363780585400,1002)}, exec_ctx.get_das_ctx().get_snapshot()={this:0x7f1276c09578, valid:true, source:2, core:{version:{val:1765263687291568000, v:0}, tx_id:{txid:0}, scn:0}, uncertain_bound:0, snapshot_lsid:{id:1003}, snapshot_ls_role:1, snapshot_acquire_addr:"10.1.224.57:2882", parts:[], committed:false})

Search the logs for the error_trace values of these queries, save the results to files, and post all of them.
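When there are several trace IDs to chase, a small script run on each observer node can do the same job as repeated grep calls. This is only a sketch: the function names are hypothetical, the log directory must be supplied by you, and the `observer.log*` glob assumes rotated files keep the `observer.log` prefix.

```python
from pathlib import Path
from typing import Dict, Iterable, List


def collect_trace_lines(
    log_dir: str, trace_ids: Iterable[str], pattern: str = "observer.log*"
) -> Dict[str, List[str]]:
    """Return, for each trace ID, every log line that contains it."""
    ids = list(trace_ids)
    hits: Dict[str, List[str]] = {tid: [] for tid in ids}
    for path in sorted(Path(log_dir).glob(pattern)):
        # errors="replace" keeps the scan going if a file has odd bytes.
        with open(path, encoding="utf-8", errors="replace") as fh:
            for line in fh:
                for tid in ids:
                    if tid in line:
                        hits[tid].append(line.rstrip("\n"))
    return hits


def save_trace_files(hits: Dict[str, List[str]], out_dir: str) -> List[Path]:
    """Write one trace_<id>.log file per trace ID that had at least one match."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for tid, lines in hits.items():
        if lines:
            target = out / f"trace_{tid}.log"
            target.write_text("\n".join(lines) + "\n", encoding="utf-8")
            written.append(target)
    return written


if __name__ == "__main__":
    # "." is a placeholder; point it at the observer's log directory.
    found = collect_trace_lines(".", ["YB420A01E03A-0006456AE1380FCF-0-0"])
    for path in save_trace_files(found, "trace_out"):
        print(path)
```

Trace IDs with no matches produce no file, which makes it easy to see which ones need to be searched for on another node.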

Nothing turns up. I selected every log level on every node and used regex matching.

Also check whether other tables can be queried.

Other tables can be queried.