json字段的查询速度特别慢

【 使用环境 】生产环境
【 OB or 其他组件 】
【 使用版本 】4.2.2
【问题描述】

select
  *
from
  mac_ocr_history
where
  res_raw_data ->> '$.errorCode' = 0
  and create_time >= '2023-10-01 00:00:00'
limit
  200, 10

where中带有json字段的条件查询速度就会特别慢,这条sql执行时间6秒,去掉res_raw_data ->> '$.errorCode' = 0查询时间0.003秒

【附件及日志】推荐使用OceanBase敏捷诊断工具obdiag收集诊断信息,详情参见链接(右键跳转查看):

nhii_ocean:nhii:mp_user,OB_MYSQL,mac_ocr_history,mac_ocr_history,OB_MYSQL_COM_QUERY,SELECT,success,,select%0A  *%0Afrom%0A  mac_ocr_history%0Awhere%0A  res_raw_data ->> '$.errorCode' = 0%0A  and create_time >= '2023-10-01 00:00:00'%0Alimit%0A  200%2C 10,6708605us,21us,0us,6296647us,Y0-00007FBC2B39FC20,YB42C0A80370-0006100B867EDA23-0-0,,,0,192.168.3.112:2881
2024-02-04 10:21:29.601105,undefined,,,,nhii_ocean:nhii:information_schema,OB_MYSQL,PROFILING,profiling,OB_MYSQL_COM_QUERY,SELECT,failed,1109,SELECT QUERY_ID%2C SUM(DURATION) AS SUM_DURATION FROM INFORMATION_SCHEMA.PROFILING GROUP BY QUERY_ID,1637us,1412us,0us,192us,Y0-00007FBC2B39FC20,YB42C0A80370-0006100B867EDA29-0-0,,,0,192.168.3.112:2881

【SOP系列 22 】——故障诊断第一步(系统巡检和诊断信息收集)

1 个赞

表结构和执行计划麻烦提供下。explain EXTENDED sql语句、

1 个赞

==========================================================
|ID|OPERATOR       |NAME           |EST.ROWS|EST.TIME(us)|
----------------------------------------------------------
|0 |TABLE FULL SCAN|mac_ocr_history|10      |1058        |
==========================================================
Outputs & filters:
-------------------------------------
  0 - output([mac_ocr_history.id(0x7f8042e364b0)], [mac_ocr_history.judge_id(0x7f8042e367d0)], [mac_ocr_history.school_id(0x7f8042e36af0)], [mac_ocr_history.user_id(0x7f8042e36e10)],
       [mac_ocr_history.images(0x7f8042e37130)], [mac_ocr_history.req_raw_data(0x7f8042e37450)], [mac_ocr_history.res_raw_data(0x7f8042e10370)], [mac_ocr_history.create_time(0x7f8042e13050)],
       [mac_ocr_history.type(0x7f8042e37770)]), filter([mac_ocr_history.create_time(0x7f8042e13050) >= INTERNAL_FUNCTION('2023-10-01 00:00:00', 114, 17)(0x7f8042e38a10)(0x7f8042e128c0)],
       [cast(JSON_UNQUOTE(JSON_EXTRACT(mac_ocr_history.res_raw_data(0x7f8042e10370), '$.errorCode')(0x7f8042e0e8c0))(0x7f8042e0e140), DECIMAL(0, 0))(0x7f8042e10aa0) 
      = cast(0, DECIMAL(1, 0))(0x7f8042e116d0)(0x7f8042e0fb80)])
      access([mac_ocr_history.id(0x7f8042e364b0)], [mac_ocr_history.res_raw_data(0x7f8042e10370)], [mac_ocr_history.create_time(0x7f8042e13050)], [mac_ocr_history.judge_id(0x7f8042e367d0)],
       [mac_ocr_history.school_id(0x7f8042e36af0)], [mac_ocr_history.user_id(0x7f8042e36e10)], [mac_ocr_history.images(0x7f8042e37130)], [mac_ocr_history.req_raw_data(0x7f8042e37450)],
       [mac_ocr_history.type(0x7f8042e37770)]), partitions(p0)
      limit(10), offset(200), is_index_back=false, is_global_index=false, filter_before_indexback[false,false], 
      range_key([mac_ocr_history.id(0x7f8042e364b0)]), range(MIN ; MAX)always true
Used Hint:
-------------------------------------
  /*+
      
  */
Qb name trace:
-------------------------------------
  stmt_id:0, stmt_type:T_EXPLAIN 
  stmt_id:1, SEL$1
Outline Data: 
-------------------------------------
  /*+
      BEGIN_OUTLINE_DATA
      FULL(@"SEL$1" "mp_user"."mac_ocr_history"@"SEL$1")
      OPTIMIZER_FEATURES_ENABLE('4.2.2.0')
      END_OUTLINE_DATA
  */
Optimization Info:
-------------------------------------
  mac_ocr_history:
      table_rows:14250
      physical_range_rows:14250
      logical_range_rows:14250
      index_back_rows:0
      output_rows:20
      table_dop:1
      dop_method:Table DOP
      avaiable_index_name:[create_time, mac_ocr_history]
      stats version:1706536872626923
      dynamic sampling level:0
      estimation method:[OPTIMIZER STATISTICS, STORAGE]
  Plan Type:
      LOCAL
  Note:
      Degree of Parallelisim is 1 because of table property
1 个赞

可以试试mysql的虚拟列功能,创建json类型的索引,测试是能走到索引的。
CREATE TABLE json_table (
id INT NOT NULL AUTO_INCREMENT,
data JSON,
indexed_field INT AS (JSON_UNQUOTE(JSON_EXTRACT(data, ‘$.key’))) STORED,
PRIMARY KEY (id),
INDEX idx_indexed_field (indexed_field)
);

不带res_raw_data的执行计划呢也发下

不带res_raw_data的

========================================================================
|ID|OPERATOR        |NAME                        |EST.ROWS|EST.TIME(us)|
------------------------------------------------------------------------
|0 |TABLE RANGE SCAN|mac_ocr_history(create_time)|10      |906         |
========================================================================
Outputs & filters:
-------------------------------------
  0 - output([mac_ocr_history.id(0x7f80cf00f3d0)], [mac_ocr_history.judge_id(0x7f80cf00f6f0)], [mac_ocr_history.school_id(0x7f80cf00fa10)], [mac_ocr_history.user_id(0x7f80cf00fd30)],
       [mac_ocr_history.images(0x7f80cf010050)], [mac_ocr_history.req_raw_data(0x7f80cf010370)], [mac_ocr_history.res_raw_data(0x7f80cf010690)], [mac_ocr_history.create_time(0x7f80cf00e250)],
       [mac_ocr_history.type(0x7f80cf0109b0)]), filter(nil)
      access([mac_ocr_history.id(0x7f80cf00f3d0)], [mac_ocr_history.create_time(0x7f80cf00e250)], [mac_ocr_history.judge_id(0x7f80cf00f6f0)], [mac_ocr_history.school_id(0x7f80cf00fa10)],
       [mac_ocr_history.user_id(0x7f80cf00fd30)], [mac_ocr_history.images(0x7f80cf010050)], [mac_ocr_history.req_raw_data(0x7f80cf010370)], [mac_ocr_history.res_raw_data(0x7f80cf010690)],
       [mac_ocr_history.type(0x7f80cf0109b0)]), partitions(p0)
      limit(10), offset(200), is_index_back=true, is_global_index=false, 
      range_key([mac_ocr_history.create_time(0x7f80cf00e250)], [mac_ocr_history.id(0x7f80cf00f3d0)]), range(2023-10-01 00:00:00.000000,MIN ; MAX,MAX), 
      range_cond([mac_ocr_history.create_time(0x7f80cf00e250) >= INTERNAL_FUNCTION('2023-10-01 00:00:00', 114, 17)(0x7f80cf0ca6f0)(0x7f80cf0c9e20)])
Used Hint:
-------------------------------------
  /*+
      
  */
Qb name trace:
-------------------------------------
  stmt_id:0, stmt_type:T_EXPLAIN 
  stmt_id:1, SEL$1
Outline Data: 
-------------------------------------
  /*+
      BEGIN_OUTLINE_DATA
      INDEX(@"SEL$1" "mp_user"."mac_ocr_history"@"SEL$1" "create_time")
      OPTIMIZER_FEATURES_ENABLE('4.2.2.0')
      END_OUTLINE_DATA
  */
Optimization Info:
-------------------------------------
  mac_ocr_history:
      table_rows:14250
      physical_range_rows:3099
      logical_range_rows:3099
      index_back_rows:3099
      output_rows:3099
      table_dop:1
      dop_method:Table DOP
      avaiable_index_name:[create_time, mac_ocr_history]
      stats version:1706536872626923
      dynamic sampling level:0
      estimation method:[OPTIMIZER STATISTICS, STORAGE]
  Plan Type:
      LOCAL
  Note:
      Degree of Parallelisim is 1 because of table property

为啥mysql查起来就很快呢?我们是从mysql转到ob的,带json字段的表和json里的字段比较多,建虚拟列修改成本太高了。还有没有更好的方法呢?

mysql的执行计划麻烦提供下。

1	SIMPLE	mac_ocr_history		ALL					11121	33.33	Using where

这是mysql的执行计划

explain  select
  *
from
  mac_ocr_history
where
  res_raw_data ->> '$.errorCode' = 0
  and create_time >= '2023-10-01 00:00:00'
limit
  200, 10

使用hint 增加并发看下耗时

select /*+ PARALLEL(16) */ * from xxx;

ob下执行下面sql,耗时6.561秒,与不并发时间一样。mysql下执行耗时0.528秒

select  /*+PARALLEL(16)*/
  *
from
  mac_ocr_history
where
  res_raw_data ->> '$.errorCode' = 0
  and create_time >= '2023-10-01 00:00:00'
limit
  200, 10

感觉执行计划有问题,没有res_raw_data 这个条件走索引然后返表速度快很多,带上后走full scan,统计信息收集一下试试呢

如何收集统计信息呢?另外来说即使走全表扫描那速度也太慢了,mysql也是全表扫描

4.x版本会自动收集统计信息,json字段慢问题还在和相关同学确认中。

select * from mac_ocr_history where JSON_EXTRACT(res_raw_data,’$.errorCode’)=0 and create_time >= ‘2023-10-01 00:00:00’; 这个试一下,测试能提升一些性能,是否能满足呢。

该问题已确认是lob优化问题,预检OB4.2.3版本会完成优化,届时可升级该版本验证下。