SUM过滤出10W行数据,查询需要分钟级延迟,优化路径

【产品名称】

【产品版本】3.1.3

【问题描述】

mysql> explain select * from xxx\G

*************************** 1. row ***************************

Query Plan: =================================================================================

|ID|OPERATOR |NAME |EST. ROWS |COST |

---------------------------------------------------------------------------------

|0 |PX COORDINATOR | |8624112924|497374044900|

|1 | EXCHANGE OUT DISTR |:EX10000 |8624112924|385736937339|

|2 | PX PARTITION ITERATOR| |8624112924|385736937339|

|3 | TABLE SCAN |xxx|8624112924|385736937339|

=================================================================================

explain SELECT /*+ query_timeout(100000000000) parallel(16) */ IFNULL(SUM(xx) , 0) AS aa FROM xxx WHERE a1 = 251287325 AND a2 = 30 AND bb IN ( 0 ) AND cc IN ( 40 , 50 , 60 , 70 ) AND dd NOT IN ( 1321 , 1270 , 1013 , 1899 , 1214 , 3551 , 1206 , 3913 ) AND ee < 1653029404748\G

*************************** 1. row ***************************

Query Plan: ========================================================================================================================

|ID|OPERATOR |NAME |EST. ROWS|COST |

------------------------------------------------------------------------------------------------------------------------

|0 |SCALAR GROUP BY | |1 |51718324752|

|1 | PX COORDINATOR | |1 |51717933493|

|2 | EXCHANGE OUT DISTR |:EX10000 |1 |51717933492|

|3 | MERGE GROUP BY | |1 |51717933492|

|4 | PX BLOCK ITERATOR| |2048346 |51717542233|

|5 | TABLE SCAN |xx(xx_idx)|2048346 |51717542233|

========================================================================================================================

索引xx_idx已经覆盖了需要查询的所有数据,这里explain出来是2048346行,实际执行需要6min

实际上count行数是98693,花了15min

SELECT /*+ query_timeout(100000000000) parallel(16) */ COUNT(xx) FROM xx WHERE aa = 251287325 AND bb = 30 AND cc IN ( 0 ) AND dd IN ( 40 , 50 , 60 , 70 ) AND ee NOT IN ( 1321 , 1270 , 1013 , 1899 , 1214 , 3551 , 1206 , 3913 ) AND ff < 1653029404748\G

*************************** 1. row ***************************

COUNT(xx): 98693

1 row in set (15 min 57.49 sec)

表上面没有索引,还是索引没有办法命中

想问的问题是count为什么更慢吗

看执行计划,确实是用到了 xx_idx 索引,可以提供下更具体的索引结构吗

有索引且命中了

count的执行计划跟那个是一样的。确实是用到了xxx_index索引

创建索引的语句


 alter table xx 

 KEY xx_idx(a1, a2, bb, cc, dd, ee, aa) local;

整个表的数据量有多大,是有16个分区吗

not in 是使用不到索引的,如果加上 ff < 1653029404748条件数据量会少多少

整个表的数据量有80亿左右,使用range分区的,按月分区

去掉NOT IN以后,是9W行左右。

 explain SELECT /*+ query_timeout(100000000000) parallel(16) */ COUNT(xx) FROM  xx WHERE  aa = 251287325     AND bb = 30     AND cc IN ( 0 ) ;  看下多少行,和计划是如何;

另外,这种场景,感觉是建一个aa,bb,cc的索引才能覆盖的;

a1, a2, bb, cc, dd, ee, aa 这个索引,你没有指定a1和a2,只指定bb,cc和aa是没办法高校的扫描的

| ========================================================================================================================

|ID|OPERATOR       |NAME                                  |EST. ROWS|COST    |

------------------------------------------------------------------------------------------------------------------------

|0 |SCALAR GROUP BY   |                                    |1    |52309906447|

|1 | PX COORDINATOR   |                                    |1    |52309442146|

|2 | EXCHANGE OUT DISTR |:EX10000                                |1    |52309442146|

|3 |  MERGE GROUP BY  |                                    |1    |52309442146|

|4 |  PX BLOCK ITERATOR|                                    |2430738 |52308977845|

|5 |   TABLE SCAN   |xx(xx_idx)|2430738 |52308977845|

========================================================================================================================

explain是2430738,实际执行这个是9W行

命中索引后还是有这么多数据

这个查询:SELECT /*+ query_timeout(100000000000) parallel(16) */ COUNT(xx) FROM  xx WHERE  aa = 251287325     AND bb = 30     AND cc IN ( 0 ) ;  耗时多久?

要不要建个索引 aa,bb,cc的试试看?

这个执行count结果是20W,命中上面说的那个索引了,需要16min时间

直接执行这个count是需要16min吗?

那你按照我的说法,再建一个索引吧

建了新索引以后是5min了

慢的原因是数据都在其中两个分区内,range分区是不是不太适合这种场景,多台机器的cpu资源没有用上

如果完全命中索引,这个时间不太符合预期;可以把这个sql执行的sql audit也贴一下

总共有多少个分区?

20个分区,只有那2个分区有数据,其它是空的

如果索引是aa,bb,cc 查询条件也是aa,bb,cc那不应该呀,即使是20个分区只有两个分区有数据,也不应该这么慢