【 使用环境 】测试环境
【 使用版本 】4.2
【问题描述】
一、背景描述
在测试进行数据库迁移到OB 4.2 后,部分sql执行缓慢,长时间无执行结果。此前曾做过OceanBase 3 版本的迁移测试,相同的SQL执行效率很快,sql查询结果进行inserte仅20秒左右。
涉及版本:5.7.25-OceanBase_CE-v4.2.0.0
二、排查问题
1.初步分析
初步分析怀疑为迁移后统计信息问题,于是对表进行统计信息收集,但问题仍然存在。
##call dbms_stats.gather_table_stats(‘DEV_TMP’,‘RTE_TEMP_2’,granularity=>‘GLOBAL’,method_opt=>‘FOR ALL COLUMNS SIZE AUTO’);
表结构:
表结构及索引情况.txt (10.2 KB)
慢sql相关信息:
慢SQL.txt (151.3 KB)
OB 3.X版本执行计划:
执行计划.txt (73.6 KB)
2.执行计划检查
后对执行计划进行分析情况如下:
执行计划来看SQL的效率还是不错的,但是实际执行很慢,对问题SQL进行进一步排查:
SQL在实际执行情况下的执行计划为下图所示,可见与预计的执行计划偏差不大,但是为什么耗时很久呢?
OceanBase 3版本执行计划如下:
3.sql_audit检查
根据SQL_AUDIT的wait_class字段发现sql在执行时处于并发等待状态
于是进行并发相关参数进行检查,并尝试调到并发线程,但是问题仍未解决
由于SQL在使用并行hint的情况下长时间等待无法运行,尝试减少去掉hint进行执行
正在上传:image.png(1)…
SQL可以完成执行,但是执行时间仍然很长。
问题SQL Trace日志如下:
[2023-10-11 13:18:39.841540] WDIAG [SQL.ENG] check_status (ob_px_data_ch_provider.cpp:424) [11494][T1002_PX_G0][T1002][YB420AA02224-0006075830CFA032-0-0] [lt=23][errcode=-4662] received a interrupt(code={code:-4662, info:{msg:"tid:112273,from:"10.160.34.36:2882",PX ABORT DFO"}}, ret=-4662)
[2023-10-11 13:18:39.841570] WDIAG [SQL.ENG] get_transmit_data_ch (ob_px_sqc_proxy.cpp:267) [11494][T1002_PX_G0][T1002][YB420AA02224-0006075830CFA032-0-0] [lt=34][errcode=-4662] fail peek data channel from ch_provider(ret=-4662)
[2023-10-11 13:18:39.841578] WDIAG [SQL.ENG] get_data_ch (ob_px_transmit_op.cpp:99) [11494][T1002_PX_G0][T1002][YB420AA02224-0006075830CFA032-0-0] [lt=8][errcode=-4662] fail get data ch sets from provider(ret=-4662)
[2023-10-11 13:18:39.841584] WDIAG [SQL.ENG] init_channel (ob_px_transmit_op.cpp:317) [11494][T1002_PX_G0][T1002][YB420AA02224-0006075830CFA032-0-0] [lt=6][errcode=-4662] Fail to get data dtl channel(ret=-4662)
[2023-10-11 13:18:39.841591] WDIAG [SQL.ENG] inner_open (ob_px_transmit_op.cpp:201) [11494][T1002_PX_G0][T1002][YB420AA02224-0006075830CFA032-0-0] [lt=6][errcode=-4662] fail to exec init_channel(*trans_input)(ret=-4662)
[2023-10-11 13:18:39.841596] WDIAG [SQL.ENG] inner_open (ob_px_reduce_transmit_op.cpp:37) [11494][T1002_PX_G0][T1002][YB420AA02224-0006075830CFA032-0-0] [lt=5][errcode=-4662] initialize operator context failed(ret=-4662)
[2023-10-11 13:18:39.841602] WDIAG [SQL.ENG] open (ob_operator.cpp:714) [11494][T1002_PX_G0][T1002][YB420AA02224-0006075830CFA032-0-0] [lt=5][errcode=-4662] Open this operator failed(ret=-4662, op_type="PHY_PX_REDUCE_TRANSMIT")
[2023-10-11 13:18:39.841612] WDIAG [SQL.ENG] execute (ob_px_task_process.cpp:288) [11494][T1002_PX_G0][T1002][YB420AA02224-0006075830CFA032-0-0] [lt=8][errcode=-4662] fail open dfo op(ret=-4662)
[2023-10-11 13:18:39.841632] WDIAG [SQL.ENG] do_process (ob_px_task_process.cpp:451) [11494][T1002_PX_G0][T1002][YB420AA02224-0006075830CFA032-0-0] [lt=6][errcode=-4662] failed to execute plan(ret=-4662)
[2023-10-11 13:18:39.841641] WDIAG [SQL.ENG] process (ob_px_task_process.cpp:178) [11494][T1002_PX_G0][T1002][YB420AA02224-0006075830CFA032-0-0] [lt=6][errcode=-4662] failed to process(get_tenant_id()=1002, ret=-4662)
[2023-10-11 13:18:39.842265] INFO [SQL.ENG] all_task_success (ob_px_sqc_handler.cpp:304) [11494][T1002_PX_G0][T1002][Y0-0000000000000000-0-0] [lt=7] px task not succ(task.get_result()=-4662, i=2)
[2023-10-11 13:18:39.842481] WDIAG update_qc_error_code (ob_px_util.h:589) [112273][T1002_L0_G0][T1002][YB420AA02224-0006075830CFA032-0-0] [lt=5][errcode=-4662]
[2023-10-11 13:18:39.842496] WDIAG update_qc_error_code (ob_px_util.h:589) [112273][T1002_L0_G0][T1002][YB420AA02224-0006075830CFA032-0-0] [lt=10][errcode=-4662]
[2023-10-11 13:18:39.845149] INFO [STORAGE.BLKMGR] alloc_block (ob_block_manager.cpp:329) [112785][T1002_PX_G0][T1002][YB420AA02225-00060757FA21A99C-0-0] [lt=16] successfully alloc block(macro_id=[1747507](ver=0,mode=0,seq=13793574))
[2023-10-11 13:18:39.845477] INFO [STORAGE.BLKMGR] async_write (ob_macro_block_handle.cpp:178) [112785][T1002_PX_G0][T1002][YB420AA02225-00060757FA21A99C-0-0] [lt=24] Async write macro block(macro_id_=[1747507](ver=0,mode=0,seq=13793574))
[2023-10-11 13:18:39.845614] INFO [STORAGE.BLKMGR] alloc_block (ob_block_manager.cpp:329) [2501][T1002_PX_G0][T1002][YB420AA02225-00060757BF51AA7F-0-0] [lt=11] successfully alloc block(macro_id=[1747508](ver=0,mode=0,seq=13793575))
[2023-10-11 13:18:39.846593] INFO [SQL.ENG] broadcast_dfo (ob_px_interruption.cpp:74) [112273][T1002_L0_G0][T1002][YB420AA02224-0006075830CFA032-0-0] [lt=4] success to send interrupt message(int_code={code:-4662, info:{msg:"tid:112273,from:"10.160.34.36:2882",PX ABORT DFO"}}, addr="10.160.34.37:2882", interrupt_id=task(1696922810486189,4303357292))
[2023-10-11 13:18:39.846619] INFO [SQL.ENG] broadcast_dfo (ob_px_interruption.cpp:74) [112273][T1002_L0_G0][T1002][YB420AA02224-0006075830CFA032-0-0] [lt=18] success to send interrupt message(int_code={code:-4662, info:{msg:"tid:112273,from:"10.160.34.36:2882",PX ABORT DFO"}}, addr="10.160.34.36:2882", interrupt_id=task(1696922810486189,4303357292))
[2023-10-11 13:18:39.846643] INFO [SQL.ENG] broadcast_dfo (ob_px_interruption.cpp:74) [112273][T1002_L0_G0][T1002][YB420AA02224-0006075830CFA032-0-0] [lt=6] success to send interrupt message(int_code={code:-4662, info:{msg:"tid:112273,from:"10.160.34.36:2882",PX ABORT DFO"}}, addr="10.160.34.37:2882", interrupt_id=task(1696922810486189,4303357292))
[2023-10-11 13:18:39.846653] INFO [SQL.ENG] broadcast_dfo (ob_px_interruption.cpp:74) [112273][T1002_L0_G0][T1002][YB420AA02224-0006075830CFA032-0-0] [lt=6] success to send interrupt message(int_code={code:-4662, info:{msg:"tid:112273,from:"10.160.34.36:2882",PX ABORT DFO"}}, addr="10.160.34.36:2882", interrupt_id=task(1696922810486189,4303357292))
【附件】