OB升级至4.3.5 bp2之后,部分SQL出现明显的性能问题

【 使用环境 】测试环境
【 OB or 其他组件 】oceanbase 4.3.5 bp2
【 使用版本 】oceanbase 4.3.5 bp2
【问题描述】
老师,您们好,
我们将Oceanbase升级从4.3.5 ce bp1升级至4.3.5 ce bp2后,不少SQL出现了明显的性能问题,主要原因为不走原索引,原先我们也认为是没进行统计信息更新,但是更新后仍然如此。
例子:
SQL语句:

SELECT bbsj.cell_value_ FROM 
ftsp_bb_bbxx bb 
join ftsp_bb_bbsj_210000 bbsj on bbsj.bb_bbxx_id_ = bb.id_
WHERE bb.zt_ztxx_id_ = 'h0000000000000772032784086532096'
and bb.kj_qj_ = '202406'
and bb_code_ = 'xjllb'
and qyzt_ = '1'
AND case 
	when 'xqykjzz'='qykjzz' 	then bbsj.bb_mb_cell_id_ = '39c08a600bbe2ba5e15b1a9d1c82b1ac'
	when 'xqykjzz'='qykjzzyzx' then bbsj.bb_mb_cell_id_ = 'abc08a600bbe2ba5e15b1a9d1c82b1ac'
	when 'xqykjzz'='xqykjzz' 	then bbsj.bb_mb_cell_id_ = '98d67e9ba5a801dbbc50537bd7184ffb'
	when 'xqykjzz'='qykjzd'		then bbsj.bb_mb_cell_id_ = '25e9f9430fa1bec83a31241a361ccac3'
	else 0
end
order by bbsj.create_date_ desc limit 1;

4.3.5 bp2 执行计划,执行时间>40s

=======================================================================
|ID|OPERATOR            |NAME                   |EST.ROWS|EST.TIME(us)|
-----------------------------------------------------------------------
|0 |TOP-N SORT          |                       |1       |4399555     |
|1 |└─HASH JOIN         |                       |1       |4399555     |
|2 |  ├─TABLE RANGE SCAN|bb(idx_ftsp_bb_bbxx_n1)|1       |9           |
|3 |  └─TABLE FULL SCAN |bbsj                   |4842    |4398272     |
=======================================================================
Outputs & filters:
-------------------------------------
  0 - output([bbsj.cell_value_]), filter(nil), rowset=256
      sort_keys([bbsj.create_date_, DESC]), topn(1)
  1 - output([bbsj.create_date_], [bbsj.cell_value_]), filter(nil), rowset=256
      equal_conds([bbsj.bb_bbxx_id_ = bb.id_]), other_conds(nil)
  2 - output([bb.id_]), filter([bb.qyzt_ = demote_cast('1', TINYINT(4, 0))]), rowset=256
      access([bb.id_], [bb.qyzt_]), partitions(p0)
      is_index_back=true, is_global_index=false, filter_before_indexback[false], 
      range_key([bb.zt_ztxx_id_], [bb.kj_qj_], [bb.bb_code_], [bb.id_]), range(h0000000000000772032784086532096,202406,xjllb,MIN ; h0000000000000772032784086532096,
      202406,xjllb,MAX), 
      range_cond([bb.zt_ztxx_id_ = 'h0000000000000772032784086532096'], [bb.kj_qj_ = '202406'], [bb.bb_code_ = 'xjllb'])
  3 - output([bbsj.bb_bbxx_id_], [bbsj.cell_value_], [bbsj.create_date_]), filter([CASE WHEN 'xqykjzz' = 'qykjzz' THEN bbsj.bb_mb_cell_id_ = '39c08a600bbe2ba5e15b1a9d1c82b1ac' 
      WHEN 'xqykjzz' = 'qykjzzyzx' THEN bbsj.bb_mb_cell_id_ = 'abc08a600bbe2ba5e15b1a9d1c82b1ac' WHEN 'xqykjzz' = 'xqykjzz' THEN bbsj.bb_mb_cell_id_ = '98d67e9ba5a801dbbc50537bd7184ffb' 
      WHEN 'xqykjzz' = 'qykjzd' THEN bbsj.bb_mb_cell_id_ = '25e9f9430fa1bec83a31241a361ccac3' ELSE 0 END], [TOPN_FILTER(bbsj.create_date_)]), rowset=256
      access([bbsj.bb_bbxx_id_], [bbsj.bb_mb_cell_id_], [bbsj.cell_value_], [bbsj.create_date_]), partitions(p0)
      is_index_back=false, is_global_index=false, filter_before_indexback[false,false], 
      range_key([bbsj.id_]), range(MIN ; MAX)always true

4.3.5 bp1 执行计划,执行时间在毫秒级

==============================================================================
|ID|OPERATOR            |NAME                          |EST.ROWS|EST.TIME(us)|
------------------------------------------------------------------------------
|0 |TOP-N SORT          |                              |1       |87          |
|1 |└─NESTED-LOOP JOIN  |                              |1       |87          |
|2 |  ├─TABLE RANGE SCAN|bb(idx_ftsp_bb_bbxx_n1)       |1       |9           |
|3 |  └─TABLE RANGE SCAN|bbsj(uk_ftsp_bb_bbsj_210000_1)|13      |77          |
==============================================================================
Outputs & filters:
-------------------------------------
  0 - output([bbsj.cell_value_]), filter(nil), rowset=16
      sort_keys([bbsj.create_date_, DESC]), topn(1)
  1 - output([bbsj.create_date_], [bbsj.cell_value_]), filter(nil), rowset=16
      conds(nil), nl_params_([bb.id_(:0)]), use_batch=true
  2 - output([bb.id_]), filter([cast(bb.qyzt_, DECIMAL(4, 0)) = cast('1', DECIMAL(1, -1))]), rowset=16
      access([bb.id_], [bb.qyzt_]), partitions(p0)
      is_index_back=true, is_global_index=false, filter_before_indexback[false], 
      range_key([bb.zt_ztxx_id_], [bb.kj_qj_], [bb.bb_code_], [bb.id_]), range(h0000000000000772032784086532096,202406,xjllb,MIN ; h0000000000000772032784086532096,
      202406,xjllb,MAX), 
      range_cond([bb.zt_ztxx_id_ = 'h0000000000000772032784086532096'], [bb.kj_qj_ = '202406'], [bb.bb_code_ = 'xjllb'])
  3 - output([bbsj.cell_value_], [bbsj.create_date_]), filter([CASE WHEN 'xqykjzz' = 'qykjzz' THEN bbsj.bb_mb_cell_id_ = '39c08a600bbe2ba5e15b1a9d1c82b1ac' 
      WHEN 'xqykjzz' = 'qykjzzyzx' THEN bbsj.bb_mb_cell_id_ = 'abc08a600bbe2ba5e15b1a9d1c82b1ac' WHEN 'xqykjzz' = 'xqykjzz' THEN bbsj.bb_mb_cell_id_ = '98d67e9ba5a801dbbc50537bd7184ffb' 
      WHEN 'xqykjzz' = 'qykjzd' THEN bbsj.bb_mb_cell_id_ = '25e9f9430fa1bec83a31241a361ccac3' ELSE 0 END]), rowset=16
      access([GROUP_ID], [bbsj.id_], [bbsj.bb_mb_cell_id_], [bbsj.cell_value_], [bbsj.create_date_]), partitions(p0)
      is_index_back=true, is_global_index=false, filter_before_indexback[true], 
      range_key([bbsj.bb_bbxx_id_], [bbsj.bb_mb_cell_id_], [bbsj.shadow_pk_0]), range(MIN ; MAX), 
      range_cond([bbsj.bb_bbxx_id_ = :0])

【复现路径】问题出现前后相关操作
【附件及日志】推荐使用OceanBase敏捷诊断工具obdiag收集诊断信息,详情参见链接(右键跳转查看):

【SOP系列 22 】——故障诊断第一步(自助诊断和诊断信息收集)

【备注】基于 LLM 和开源文档 RAG 的论坛小助手已开放测试,在发帖时输入 [@论坛小助手] 即可召唤小助手,欢迎试用!

2 个赞

推荐使用 obdiag 工具进行日志采集,以便我们更好地分析 SQL 性能下降的原因。

日志采集场景: SQL性能问题
日志采集命令: obdiag gather scene run --scene=observer.perf_sql --env “{db_connect=’-h127.0.0.1 -P2881 -utest@test -p****** -Dtest’, trace_id=‘Yxx’}”

请根据上述命令采集日志,并将采集到的日志压缩包提供给我们以便确认问题所在。

另外,为了进一步明确问题,请您提供以下信息:

  1. 升级前后是否对表结构或索引进行了任何修改?
  2. 是否启用了新的优化器特性或调整了相关参数(如 _ob_enable_optimizer_adaptive_feature)?
  3. 问题 SQL 的 QPS 或执行频率是否有显著变化?
  4. 是否可以提供具体的 SQL 语句以及升级前后的执行计划对比?
  5. 是否尝试过重新统计表的直方图或基本统计信息(如 ANALYZE TABLE)?

附上敏捷诊断工具 obdiag 使用帮助链接

1 个赞

OceanBase 社区已接收您的帖子,正在跟进中。

1 个赞

:+1: :+1: :+1:

1 个赞

确定是bug,动态采样针对条件下压的场景处理不对,目前绕过去的方法关闭动态采样set global optimizer_dynamic_sampling=0;