一个关于函数索引的bug

【 使用环境 】测试环境
【 OB or 其他组件 】observer
【 使用版本 】4.2.1
【问题描述】修改了一个查询条件 and 改成 or 就导致查询时间差距非常大
【复现路径】建表(mysql兼容租户),插入大概数百万行数据,使用select语句查询
【问题现象及影响】
表:

CREATE TABLE `cross_record` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `cross_id` varchar(64) NOT NULL DEFAULT '',
  `src_tx_id` varchar(64) NOT NULL DEFAULT '',
  `des_tx_id` varchar(64) NOT NULL DEFAULT '',
  `tx_sender` varchar(64) NOT NULL DEFAULT '',
  `tx_receiver` varchar(64) NOT NULL DEFAULT '',
  `send_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `status` int(11) NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`),
  KEY `idx_cross_id` (`cross_id`) BLOCK_SIZE 16384 LOCAL,
  KEY `idx_send_time` (`send_time`) BLOCK_SIZE 16384 LOCAL,
  KEY `idx_status` (`status`) BLOCK_SIZE 16384 LOCAL,
  KEY `all_mix_idx_g` (`send_time`, `id`, `cross_id`, `tx_sender`, `tx_receiver`, `status`) BLOCK_SIZE 16384 GLOBAL,
  KEY `tx_sender_func` (right(`tx_sender`,13)) BLOCK_SIZE 16384 LOCAL,
  KEY `tx_rev_func` (right(`tx_receiver`,13)) BLOCK_SIZE 16384 LOCAL
) DEFAULT CHARSET = utf8mb4
 partition by hash(id)
(partition `p0`,
partition `p1`,
partition `p2`,
partition `p3`,
partition `p4`,
partition `p5`,
partition `p6`,
partition `p7`)

此处可以看到,有2个local,函数索引, tx_rev_func, tx_sender_func, 分别对应2个函数,right(tx_sender,13), right(tx_receiver,13)
然后随机插入数百w的测试数据, tx_sender, tx_receiver有值

使用sql语句查询:

SELECT count(*) FROM `cross_record` where RIGHT(tx_sender, 13) = 'xxx' or RIGHT(tx_receiver, 13) = 'xxx';

就非常的慢大概1s多,

把or换成and

SELECT count(*) FROM `cross_record` where RIGHT(tx_sender, 13) = 'xxx' and RIGHT(tx_receiver, 13) = 'xxx';

快了非常的多,也就几毫秒

通过explain也可以看出来,or走的是全表,and走的是索引

or查询有多少行,and有多少行?

执行计划也发一下吧

explain SELECT count(*) FROM `cross_record` where RIGHT(tx_sender, 13) = 'xxx' or RIGHT(tx_receiver, 13) = 'xxx';

=====================================================================
|ID|OPERATOR                     |NAME        |EST.ROWS|EST.TIME(us)|
---------------------------------------------------------------------
|0 |SCALAR GROUP BY              |            |1       |178054      |
|1 |└─PX COORDINATOR             |            |1       |178054      |
|2 |  └─EXCHANGE OUT DISTR       |:EX10000    |1       |178054      |
|3 |    └─MERGE GROUP BY         |            |1       |178053      |
|4 |      └─PX PARTITION ITERATOR|            |28257   |177541      |
|5 |        └─TABLE FULL SCAN    |cross_record|28257   |177541      |
=====================================================================
Outputs & filters:
-------------------------------------
  0 - output([T_FUN_COUNT_SUM(T_FUN_COUNT(*))]), filter(nil), rowset=256
      group(nil), agg_func([T_FUN_COUNT_SUM(T_FUN_COUNT(*))])
  1 - output([T_FUN_COUNT(*)]), filter(nil), rowset=256
  2 - output([T_FUN_COUNT(*)]), filter(nil), rowset=256
      dop=1
  3 - output([T_FUN_COUNT(*)]), filter(nil), rowset=256
      group(nil), agg_func([T_FUN_COUNT(*)])
  4 - output(nil), filter(nil), rowset=256
      force partition granule
  5 - output(nil), filter([right(cross_record.tx_sender, 13) = 'xxx' OR right(cross_record.tx_receiver, 13) = 'xxx']), rowset=256
      access([cross_record.tx_sender], [cross_record.tx_receiver]), partitions(p[0-7])
      is_index_back=false, is_global_index=false, filter_before_indexback[false], 
      range_key([cross_record.id]), range(MIN ; MAX)always true
explain SELECT count(*) FROM `cross_record` where RIGHT(tx_sender, 13) = 'xxx' and RIGHT(tx_receiver, 13) = 'xxx';

|ID|OPERATOR                     |NAME                        |EST.ROWS|EST.TIME(us)|
-------------------------------------------------------------------------------------
|0 |SCALAR GROUP BY              |                            |1       |130254      |
|1 |└─PX COORDINATOR             |                            |1       |130254      |
|2 |  └─EXCHANGE OUT DISTR       |:EX10000                    |1       |130254      |
|3 |    └─MERGE GROUP BY         |                            |1       |130254      |
|4 |      └─PX PARTITION ITERATOR|                            |141     |130251      |
|5 |        └─TABLE RANGE SCAN   |cross_record(tx_sender_func)|141     |130251      |
=====================================================================================
Outputs & filters:
-------------------------------------
  0 - output([T_FUN_COUNT_SUM(T_FUN_COUNT(*))]), filter(nil), rowset=256
      group(nil), agg_func([T_FUN_COUNT_SUM(T_FUN_COUNT(*))])
  1 - output([T_FUN_COUNT(*)]), filter(nil), rowset=256
  2 - output([T_FUN_COUNT(*)]), filter(nil), rowset=256
      dop=1
  3 - output([T_FUN_COUNT(*)]), filter(nil), rowset=256
      group(nil), agg_func([T_FUN_COUNT(*)])
  4 - output(nil), filter(nil), rowset=256
      force partition granule
  5 - output(nil), filter([right(cross_record.tx_receiver, 13) = 'xxx']), rowset=256
      access([cross_record.id], [cross_record.tx_receiver]), partitions(p[0-7])
      is_index_back=true, is_global_index=false, filter_before_indexback[false], 
      range_key([cross_record.SYS_NC26$], [cross_record.id]), range(xxx,MIN ; xxx,MAX), 
      range_cond([cross_record.SYS_NC26$ = 'xxx'])

Mark

目前对于or条件没有推导函数索引,绕过方式

  1. 手动改成union all
  2. 使用隐藏名来指定,比如你的查询条件从right(xx)=xxx改成用对应的函数索引名cross_record.SYS_NC26$=xxx
1 个赞