最近遇到一个SQL语句把数据库弄奔溃了,分享一下数据库的逻辑运算的优先级。
1. 存在问题的SQL
下面的SQL中每个表的总数据量是2797行,且数据一样,并且两个表上都没有索引(因为懒)。
explain select count(*) from t1,t2 where t1.object_name = t2.object_name or t1.DATA_OBJECT_ID = t2.DATA_OBJECT_ID and t1.OBJECT_ID = 123 and rownum = 1\G
*************************** 1. row ***************************
Query Plan: ============================================================
|ID|OPERATOR |NAME |EST. ROWS|COST |
------------------------------------------------------------
|0 |SCALAR GROUP BY | |1 |3085179|
|1 | COUNT | |76706 |3082251|
|2 | NESTED-LOOP JOIN CARTESIAN| |7820412 |2865184|
|3 | TABLE SCAN |T1 |2796 |1082 |
|4 | MATERIAL | |2797 |3218 |
|5 | PX COORDINATOR | |2797 |2956 |
|6 | EXCHANGE OUT DISTR |:EX10000|2797 |1082 |
|7 | TABLE SCAN |T2 |2797 |1082 |
============================================================
Outputs & filters:
-------------------------------------
0 - output([T_FUN_COUNT(*)]), filter(nil),
group(nil), agg_func([T_FUN_COUNT(*)])
1 - output([1]), filter([T1.OBJECT_NAME = T2.OBJECT_NAME OR (T_OP_AND, T1.DATA_OBJECT_ID = T2.DATA_OBJECT_ID, T1.OBJECT_ID = 123, rownum() = 1)])
2 - output([T1.OBJECT_NAME], [T2.OBJECT_NAME], [T1.DATA_OBJECT_ID], [T2.DATA_OBJECT_ID], [T1.OBJECT_ID]), filter(nil),
conds(nil), nl_params_(nil)
3 - output([T1.OBJECT_NAME], [T1.DATA_OBJECT_ID], [T1.OBJECT_ID]), filter(nil),
access([T1.OBJECT_NAME], [T1.DATA_OBJECT_ID], [T1.OBJECT_ID]), partitions(p0)
4 - output([T2.OBJECT_NAME], [T2.DATA_OBJECT_ID]), filter(nil)
5 - output([T2.OBJECT_NAME], [T2.DATA_OBJECT_ID]), filter(nil)
6 - output([T2.OBJECT_NAME], [T2.DATA_OBJECT_ID]), filter(nil), is_single, dop=1
7 - output([T2.OBJECT_NAME], [T2.DATA_OBJECT_ID]), filter(nil),
access([T2.OBJECT_NAME], [T2.DATA_OBJECT_ID]), partitions(p0)
看似正常的SQL语句,执行计划中ID等于2的步骤,没有过滤筛选,使用了嵌套循环连接,结果集的总数据量有7820412行。ID等于1的步骤中有过滤条件,过滤条件是 filter([T1.OBJECT_NAME = T2.OBJECT_NAME OR (T_OP_AND, T1.DATA_OBJECT_ID = T2.DATA_OBJECT_ID, T1.OBJECT_ID = 123, rownum() = 1)]) ,业务逻辑是先取两个表中object_name相同或DATA_OBJECT_ID 相同的数据,得到结果集后再根据T1.OBJECT_ID得到相关数据,最后保留一条。SQL的实际执行结果并没有按照预期进行。
结合执行计划中的过滤条件和官方的逻辑运算符的优先级说明看到,AND的优先级是比OR优先级要高,SQL等价于select count(*) from t1,t2 where t1.object_name = t2.object_name or (t1.DATA_OBJECT_ID = t2.DATA_OBJECT_ID and t1.OBJECT_ID = 123 and rownum = 1),因此导致了SQL的执行不符合预期且造成大量的资源消耗。
2. 问题解决
知道了原因,解决起来就很简单,把OR的条件用括号包含起来,让OR条件先计算,这让就符合业务的预期。
explain select count(*) from t1 t1,t2 t2 where (t1.object_name = t2.object_name or t1.DATA_OBJECT_ID = t2.DATA_OBJECT_ID) and t1.OBJECT_ID = 123 and rownum = 1\G
*************************** 1. row ***************************
Query Plan: ====================================================
|ID|OPERATOR |NAME |EST. ROWS|COST|
----------------------------------------------------
|0 |SCALAR GROUP BY | |1 |1199|
|1 | SUBPLAN SCAN |VIEW1 |1 |1199|
|2 | LIMIT | |1 |1199|
|3 | NESTED-LOOP JOIN | |1 |1199|
|4 | PX COORDINATOR | |2 |47 |
|5 | EXCHANGE OUT DISTR|:EX10000|2 |46 |
|6 | TABLE SCAN |T2 |2 |46 |
|7 | MATERIAL | |28 |1144|
|8 | TABLE SCAN |T1 |28 |1141|
====================================================
Outputs & filters:
-------------------------------------
0 - output([T_FUN_COUNT(*)]), filter(nil),
group(nil), agg_func([T_FUN_COUNT(*)])
1 - output([1]), filter(nil),
access(nil)
2 - output([1]), filter(nil), limit(?), offset(nil)
3 - output([1]), filter(nil),
conds([T1.OBJECT_NAME = T2.OBJECT_NAME OR T1.DATA_OBJECT_ID = T2.DATA_OBJECT_ID]), nl_params_(nil)
4 - output([T2.OBJECT_NAME], [T2.DATA_OBJECT_ID]), filter(nil)
5 - output([T2.OBJECT_NAME], [T2.DATA_OBJECT_ID]), filter(nil), is_single, dop=1
6 - output([T2.OBJECT_NAME], [T2.DATA_OBJECT_ID]), filter(nil),
access([T2.OBJECT_NAME], [T2.DATA_OBJECT_ID]), partitions(p0)
7 - output([T1.OBJECT_NAME], [T1.DATA_OBJECT_ID]), filter(nil)
8 - output([T1.OBJECT_NAME], [T1.DATA_OBJECT_ID]), filter([T1.OBJECT_ID = 123]),
access([T1.OBJECT_NAME], [T1.DATA_OBJECT_ID], [T1.OBJECT_ID]), partitions(p0)
执行计划中ID等于8的T1表的过滤条件是OBJECT_ID = 123,两个表连接时,使用了连接条件T1.OBJECT_NAME = T2.OBJECT_NAME OR T1.DATA_OBJECT_ID = T2.DATA_OBJECT_ID,这样的SQL数据量非常小,执行特别快。
3. 总结
本次问题原因是因为一个逻辑运算的优先级判断错误。对于自己拿不准的SQL语句,查阅官方文档是最权威的参考依据。其次,在测试环境就算只有少量数据也可以看到执行计划是否合理。最后,上线时SQL语句经过把关,可以最大限度减少SQL语句引发的线上故障。