列存【count distinct】执行速度比ck慢

【 使用环境 】 测试环境
【 OB or 其他组件 】OBserver
【 使用版本 】4.3.5.5

环境描述:
1、8c3g 三节点集群(8c32g 300G存储)
2、租户8c14g内存
3、列存表1.2亿条,未使用分区,二十多个字段,主要是varchar和int
4、旁路导入数据后,收集统计信息
call dbms_stats.gather_table_stats(‘ceshi’, ‘t1’);
4、ck单节点(4c16g 500G存储)
5、查询sql大部分是全表查询,没有where条件。

sql耗时:
1、此类sql执行速度ob均比ck慢
select sum,count() from t1;
select name,sum,coount(
) from t1 group by name;

2、sql一旦涉及count disnct ,OB比ck慢几倍。
ob:5.7s
ck:2s
select COUNT(DISTINCT user_id) from t1;

ob:13s
ck:5.5s
SELECT local, name, COUNT(DISTINCT user_id), COUNT(*), SUM(amount)
FROM t1
GROUP BY 1,2
ORDER BY 1 DESC
LIMIT 1

问题:
辛苦老师们看下 count distinct 类 SQL 是否还有优化空间,希望执行速度能和CK差不多。

ob执行计划:

mysql> explain select COUNT(DISTINCT user_id)  from t1;
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Query Plan                                                                                                                                                      |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ============================================================================================================                                                    |
| |ID|OPERATOR             |NAME                                                      |EST.ROWS |EST.TIME(us)|                                                    |
| ------------------------------------------------------------------------------------------------------------                                                    |
| |0 |SCALAR GROUP BY      |                                                          |1        |3913186     |                                                    |
| |1 |└─SUBPLAN SCAN       |VIEW1                                                     |2226824  |3872827     |                                                    |
| |2 |  └─MERGE DISTINCT   |                                                          |2226824  |3866933     |                                                    |
| |3 |    └─TABLE FULL SCAN|t1(idx_userid)|121458154|2515684     |                                                    |
| ============================================================================================================                                                    |
| Outputs & filters:                                                                                                                                              |
| -------------------------------------                                                                                                                           |
|   0 - output([T_FUN_COUNT(VIEW1.t1.user_id)]), filter(nil), rowset=256                                              |
|       group(nil), agg_func([T_FUN_COUNT(VIEW1.t1.user_id)])                                                         |
|   1 - output([VIEW1.t1.user_id]), filter(nil), rowset=256                                                           |
|       access([VIEW1.t1.user_id])                                                                                    |
|   2 - output([t1.user_id]), filter(nil), rowset=256                                                                 |
|       distinct([t1.user_id])                                                                                        |
|   3 - output([t1.user_id]), filter(nil), rowset=256                                                                 |
|       access([t1.user_id]), partitions(p0)                                                                          |
|       is_index_back=false, is_global_index=false,                                                                                                               |
|       range_key([t1.user_id], [t1.id]), range(MIN,MIN ; MAX,MAX)always  |
|       true                                                                                                                                                      |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+
21 rows in set (0.01 sec)

加上/*+ PARALLEL(16) */ 并发看一下 执行计划explain extended

单个字段 count distinct 快了。和ck差不多

select/*+ PARALLEL(16) */ COUNT(DISTINCT user_id)  from t1;
+-------------------------+
| COUNT(DISTINCT user_id) |
+-------------------------+
|                 2272948 |
+-------------------------+
1 row in set (1.84 sec)


mysql> explain select/*+ PARALLEL(16) */ COUNT(DISTINCT user_id)  from t1;
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Query Plan                                                                                                                                                      |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ===============================================================================================================================                                 |
| |ID|OPERATOR                                |NAME                                                      |EST.ROWS |EST.TIME(us)|                                 |
| -------------------------------------------------------------------------------------------------------------------------------                                 |
| |0 |SCALAR GROUP BY                         |                                                          |1        |2132405     |                                 |
| |1 |└─PX COORDINATOR                        |                                                          |16       |2132405     |                                 |
| |2 |  └─EXCHANGE OUT DISTR                  |:EX10001                                                  |16       |2132403     |                                 |
| |3 |    └─MERGE GROUP BY                    |                                                          |16       |2132402     |                                 |
| |4 |      └─SUBPLAN SCAN                    |VIEW1                                                     |2226824  |2129880     |                                 |
| |5 |        └─MERGE DISTINCT                |                                                          |2226824  |2129511     |                                 |
| |6 |          └─EXCHANGE IN MERGE SORT DISTR|                                                          |34574727 |2105471     |                                 |
| |7 |            └─EXCHANGE OUT DISTR (HASH) |:EX10000                                                  |34574727 |1305365     |                                 |
| |8 |              └─MERGE DISTINCT          |                                                          |34574727 |241684      |                                 |
| |9 |                └─PX BLOCK ITERATOR     |                                                          |121458154|157231      |                                 |
| |10|                  └─TABLE FULL SCAN     |t1(idx_userid)|121458154|157231      |                                 |
| ===============================================================================================================================                                 |
| Outputs & filters:                                                                                                                                              |
| -------------------------------------                                                                                                                           |
|   0 - output([T_FUN_COUNT_SUM(T_FUN_COUNT(VIEW1.t1.user_id))]), filter(nil), rowset=256                             |
|       group(nil), agg_func([T_FUN_COUNT_SUM(T_FUN_COUNT(VIEW1.t1.user_id))])                                        |
|   1 - output([T_FUN_COUNT(VIEW1.t1.user_id)]), filter(nil), rowset=256                                              |
|   2 - output([T_FUN_COUNT(VIEW1.t1.user_id)]), filter(nil), rowset=256                                              |
|       dop=16                                                                                                                                                    |
|   3 - output([T_FUN_COUNT(VIEW1.t1.user_id)]), filter(nil), rowset=256                                              |
|       group(nil), agg_func([T_FUN_COUNT(VIEW1.t1.user_id)])                                                         |
|   4 - output([VIEW1.t1.user_id]), filter(nil), rowset=256                                                           |
|       access([VIEW1.t1.user_id])                                                                                    |
|   5 - output([t1.user_id]), filter(nil), rowset=256                                                                 |
|       distinct([t1.user_id])                                                                                        |
|   6 - output([t1.user_id]), filter(nil), rowset=256                                                                 |
|       sort_keys([t1.user_id, ASC])                                                                                  |
|   7 - output([t1.user_id]), filter(nil), rowset=256                                                                 |
|       (#keys=1, [t1.user_id]), dop=16                                                                               |
|   8 - output([t1.user_id]), filter(nil), rowset=256                                                                 |
|       distinct([t1.user_id])                                                                                        |
|   9 - output([t1.user_id]), filter(nil), rowset=256                                                                 |
|       asc                                                                                                                                                       |
|  10 - output([t1.user_id]), filter(nil), rowset=256                                                                 |
|       access([t1.user_id]), partitions(p0)                                                                          |
|       is_index_back=false, is_global_index=false,                                                                                                               |
|       range_key([t1.user_id], [t1.id]), range(MIN,MIN ; MAX,MAX)always  |
|       true                                                                                                                                                      |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+
41 rows in set (0.01 sec)

但 select 多个字段时这种还没ck快,麻烦老师再帮忙看下。
ob:12.7s
ck:5.5s

mysql> explain SELECT/*+ PARALLEL(16) */ disbursed_on_local, gender, COUNT(DISTINCT user_id), COUNT(*), SUM(`total_amount_usd`)
    -> FROM `t1`
    -> GROUP BY 1,2
    -> ORDER BY 1 DESC
    -> LIMIT 1;
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Query Plan                                                                                                                                                                                                                 |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| =============================================================================================================================                                                                                              |
| |ID|OPERATOR                                          |NAME                                          |EST.ROWS |EST.TIME(us)|                                                                                              |
| -----------------------------------------------------------------------------------------------------------------------------                                                                                              |
| |0 |LIMIT                                             |                                              |1        |36437539    |                                                                                              |
| |1 |└─PX COORDINATOR MERGE SORT                       |                                              |16       |36437539    |                                                                                              |
| |2 |  └─EXCHANGE OUT DISTR                            |:EX10002                                      |16       |36437515    |                                                                                              |
| |3 |    └─TOP-N SORT                                  |                                              |16       |36437512    |                                                                                              |
| |4 |      └─HASH GROUP BY                             |                                              |2034     |36437511    |                                                                                              |
| |5 |        └─EXCHANGE IN DISTR                       |                                              |32544    |36437176    |                                                                                              |
| |6 |          └─EXCHANGE OUT DISTR (HASH)             |:EX10001                                      |32544    |36434171    |                                                                                              |
| |7 |            └─HASH GROUP BY                       |                                              |32544    |36427411    |                                                                                              |
| |8 |              └─SUBPLAN SCAN                      |VIEW1                                         |119873195|35395629    |                                                                                              |
| |9 |                └─HASH GROUP BY                   |                                              |119873195|35375798    |                                                                                              |
| |10|                  └─EXCHANGE IN DISTR             |                                              |121406905|31665697    |                                                                                              |
| |11|                    └─EXCHANGE OUT DISTR (HASH)   |:EX10000                                      |121406905|23037313    |                                                                                              |
| |12|                      └─HASH GROUP BY             |                                              |121406905|3634818     |                                                                                              |
| |13|                        └─PX BLOCK ITERATOR       |                                              |121458154|267697      |                                                                                              |
| |14|                          └─COLUMN TABLE FULL SCAN|t1|121458154|267697      |                                                                                              |
| =============================================================================================================================                                                                                              |
| Outputs & filters:                                                                                                                                                                                                         |
| -------------------------------------                                                                                                                                                                                      |
|   0 - output([VIEW1.SEL_3], [VIEW1.t1.gender], [T_FUN_COUNT_SUM(T_FUN_COUNT(VIEW1.t1.user_id))],                                   |
|        [T_FUN_COUNT_SUM(T_FUN_COUNT_SUM(VIEW1.T_FUN_COUNT(*)))], [T_FUN_SUM(T_FUN_SUM(VIEW1.SEL_2))]), filter(nil), rowset=256                                                                                             |
|       limit(1), offset(nil)                                                                                                                                                                                                |
|   1 - output([VIEW1.SEL_3], [VIEW1.t1.gender], [T_FUN_COUNT_SUM(T_FUN_COUNT(VIEW1.t1.user_id))],                                   |
|        [T_FUN_COUNT_SUM(T_FUN_COUNT_SUM(VIEW1.T_FUN_COUNT(*)))], [T_FUN_SUM(T_FUN_SUM(VIEW1.SEL_2))]), filter(nil), rowset=256                                                                                             |
|       sort_keys([VIEW1.SEL_3, DESC])                                                                                                                                                                                       |
|   2 - output([VIEW1.SEL_3], [VIEW1.t1.gender], [T_FUN_COUNT_SUM(T_FUN_COUNT(VIEW1.t1.user_id))],                                   |
|        [T_FUN_COUNT_SUM(T_FUN_COUNT_SUM(VIEW1.T_FUN_COUNT(*)))], [T_FUN_SUM(T_FUN_SUM(VIEW1.SEL_2))]), filter(nil), rowset=256                                                                                             |
|       dop=16                                                                                                                                                                                                               |
|   3 - output([VIEW1.SEL_3], [VIEW1.t1.gender], [T_FUN_COUNT_SUM(T_FUN_COUNT(VIEW1.t1.user_id))],                                   |
|        [T_FUN_COUNT_SUM(T_FUN_COUNT_SUM(VIEW1.T_FUN_COUNT(*)))], [T_FUN_SUM(T_FUN_SUM(VIEW1.SEL_2))]), filter(nil), rowset=256                                                                                             |
|       sort_keys([VIEW1.SEL_3, DESC]), topn(1)                                                                                                                                                                              |
|   4 - output([VIEW1.SEL_3], [VIEW1.t1.gender], [T_FUN_COUNT_SUM(T_FUN_COUNT(VIEW1.t1.user_id))],                                   |
|        [T_FUN_COUNT_SUM(T_FUN_COUNT_SUM(VIEW1.T_FUN_COUNT(*)))], [T_FUN_SUM(T_FUN_SUM(VIEW1.SEL_2))]), filter(nil), rowset=256                                                                                             |
|       group([VIEW1.SEL_3], [VIEW1.t1.gender]), agg_func([T_FUN_COUNT_SUM(T_FUN_COUNT_SUM(VIEW1.T_FUN_COUNT(*)))],                                                              |
|        [T_FUN_SUM(T_FUN_SUM(VIEW1.SEL_2))], [T_FUN_COUNT_SUM(T_FUN_COUNT(VIEW1.t1.user_id))])                                                                                  |
|   5 - output([VIEW1.SEL_3], [VIEW1.t1.gender], [T_FUN_COUNT_SUM(VIEW1.T_FUN_COUNT(*))], [T_FUN_SUM(VIEW1.SEL_2)],                                                              |
|        [T_FUN_COUNT(VIEW1.t1.user_id)]), filter(nil), rowset=256                                                                                                               |
|   6 - output([VIEW1.SEL_3], [VIEW1.t1.gender], [T_FUN_COUNT_SUM(VIEW1.T_FUN_COUNT(*))], [T_FUN_SUM(VIEW1.SEL_2)],                                                              |
|        [T_FUN_COUNT(VIEW1.t1.user_id)]), filter(nil), rowset=256                                                                                                               |
|       (#keys=2, [VIEW1.SEL_3], [VIEW1.t1.gender]), dop=16                                                                                                                      |
|   7 - output([VIEW1.SEL_3], [VIEW1.t1.gender], [T_FUN_COUNT_SUM(VIEW1.T_FUN_COUNT(*))], [T_FUN_SUM(VIEW1.SEL_2)],                                                              |
|        [T_FUN_COUNT(VIEW1.t1.user_id)]), filter(nil), rowset=256                                                                                                               |
|       group([VIEW1.SEL_3], [VIEW1.t1.gender]), agg_func([T_FUN_COUNT_SUM(VIEW1.T_FUN_COUNT(*))], [T_FUN_SUM(VIEW1.SEL_2)],                                                     |
|        [T_FUN_COUNT(VIEW1.t1.user_id)])                                                                                                                                        |
|   8 - output([VIEW1.T_FUN_COUNT(*)], [VIEW1.SEL_2], [VIEW1.SEL_3], [VIEW1.t1.gender], [VIEW1.t1.user_id]), filter(nil), rowset=256 |
|       access([VIEW1.T_FUN_COUNT(*)], [VIEW1.SEL_2], [VIEW1.SEL_3], [VIEW1.t1.gender], [VIEW1.t1.user_id])                          |
|   9 - output([T_FUN_COUNT_SUM(T_FUN_COUNT(*))], [T_FUN_SUM(T_FUN_SUM(t1.total_amount_usd))], [t1.disbursed_on_loc                  |
|       al], [t1.gender], [t1.user_id]), filter(nil), rowset=256                                                                     |
|       group([t1.disbursed_on_local], [t1.gender], [t1.user_id]),                       |
|        agg_func([T_FUN_COUNT_SUM(T_FUN_COUNT(*))], [T_FUN_SUM(T_FUN_SUM(t1.total_amount_usd))])                                                                                |
|  10 - output([t1.disbursed_on_local], [t1.gender], [t1.user_id],                       |
|        [T_FUN_COUNT(*)], [T_FUN_SUM(t1.total_amount_usd)]), filter(nil), rowset=256                                                                                            |
|  11 - output([t1.disbursed_on_local], [t1.gender], [t1.user_id],                       |
|        [T_FUN_COUNT(*)], [T_FUN_SUM(t1.total_amount_usd)]), filter(nil), rowset=256                                                                                            |
|       (#keys=3, [t1.disbursed_on_local], [t1.gender], [t1.user_id]),                   |
|        dop=16                                                                                                                                                                                                              |
|  12 - output([t1.disbursed_on_local], [t1.gender], [t1.user_id],                       |
|        [T_FUN_COUNT(*)], [T_FUN_SUM(t1.total_amount_usd)]), filter(nil), rowset=256                                                                                            |
|       group([t1.disbursed_on_local], [t1.gender], [t1.user_id]),                       |
|        agg_func([T_FUN_COUNT(*)], [T_FUN_SUM(t1.total_amount_usd)])                                                                                                            |
|  13 - output([t1.disbursed_on_local], [t1.gender], [t1.user_id],                       |
|        [t1.total_amount_usd]), filter(nil), rowset=256                                                                                                                         |
|  14 - output([t1.disbursed_on_local], [t1.gender], [t1.user_id],                       |
|        [t1.total_amount_usd]), filter(nil), rowset=256                                                                                                                         |
|       access([t1.disbursed_on_local], [t1.gender], [t1.user_id],                       |
|        [t1.total_amount_usd]), partitions(p0)                                                                                                                                  |
|       is_index_back=false, is_global_index=false,                                                                                                                                                                          |
|       range_key([t1.id]), range(MIN ; MAX)always true                                                                                                                          |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
70 rows in set (0.01 sec)
1 个赞

@淇铭 辛苦老师有时间帮忙看下哈