关于OBCP V4 实验手册统计信息部分咨询

【 使用环境 】测试环境
【 OB or 其他组件 】OB
【 使用版本 】4.3.5
【问题描述】根据实验操作,在插入数据后,查看explain计划 显示的join算法是NESTED-LOOP JOIN 其中一张表full scan 一张表 table get 然后收集统计信息后 再次查看explain计划 变成了merge join 代价也高了许多 多了exchange in out 算子 并且两表都是full scan 。代价反而变得更高了。也没能复现实验手册的场景。还请老师帮忙给看看。
实验手册:2.2统计信息收集https://mdn.alipayobjects.com/huamei_22khvb/afts/file/A*1eMHQoG4a8oAAAAAAAAAAAAAeiGDAQ/OBCP_V4_OceanBase%E6%95%B0%E6%8D%AE%E5%BA%93%E5%BC%80%E5%8F%91%E8%AE%BE%E8%AE%A1%E4%B8%8E%E4%BC%98%E5%8C%96%E5%AE%9E%E9%AA%8C%E6%8C%87%E5%AF%BC%E6%89%8B%E5%86%8C_20241120.pdf

实际操作步骤:
obclient(root@t1)[test]> insert/+ parallel(8),query_timeout(6000000000000)/ INTO lab22_t1 (c1, c2, c3) SELECT
→ a.N + b.N * 10 + c.N100 + d.N1000 + e.N10000 + f.N100000 + 1 AS c1, MOD(a.N +
→ b.N * 10 + c.N100 + d.N1000 + e.N10000 + f.N100000 + 1, 3) AS c2, MOD(a.N +
→ b.N * 10 + c.N100 + d.N1000 + e.N10000 + f.N100000 + 1, 500) AS c3 FROM
→ numbers a, numbers b, numbers c, numbers d, numbers e, numbers f WHERE a.N + b.N *
→ 10 + c.N100 + d.N1000 + e.N10000 + f.N100000 + 1 <=1000000;
Query OK, 1000000 rows affected (1 min 4.873 sec)
Records: 1000000 Duplicates: 0 Warnings: 0

obclient(root@t1)[test]> insert/+ parallel(8),query_timeout(6000000000000)/ INTO lab22_t2 (c1, c2, c3) SELECT
→ a.N + b.N * 10 + c.N100 + d.N1000 + e.N10000 + f.N100000 + 1 AS c1, MOD(a.N +
→ b.N * 10 + c.N100 + d.N1000 + e.N10000 + f.N100000 + 1, 3) AS c2, MOD(a.N +
→ b.N * 10 + c.N100 + d.N1000 + e.N10000 + f.N100000 + 1, 500) AS c3 FROM
→ numbers a, numbers b, numbers c, numbers d, numbers e, numbers f WHERE a.N + b.N *
→ 10 + c.N100 + d.N1000 + e.N10000 + f.N100000 + 1 <=1000000;
Query OK, 1000000 rows affected (1 min 5.091 sec)
Records: 1000000 Duplicates: 0 Warnings: 0

obclient(root@t1)[test]> INSERT INTO lab22_t1 (c1,c2,c3) VAlUES(9999999,9999999,9999999);
Query OK, 1 row affected (0.523 sec)

obclient(root@t1)[test]> INSERT INTO lab22_t2 (c1,c2,c3) VAlUES(9999999,9999999,9999999);
Query OK, 1 row affected (0.050 sec)

obclient(root@t1)[test]> EXPLAIN SELECT * FROM lab22_t1 t1, lab22_t2 t2 WHERE t1.c1=t2.c1 and t1.c2>500;
±--------------------------------------------------------------------------------------------+
| Query Plan |
±--------------------------------------------------------------------------------------------+
| ================================================= |
| |ID|OPERATOR |NAME|EST.ROWS|EST.TIME(us)| |
| ------------------------------------------------- |
| |0 |NESTED-LOOP JOIN | |25 |30646 | |
| |1 |├─TABLE FULL SCAN|t1 |25 |30243 | |
| |2 |└─TABLE GET |t2 |1 |16 | |
| ================================================= |

Outputs & filters:
0 - output([t1.c1], [t1.c2], [t1.c3], [t2.c1], [t2.c2], [t2.c3]), filter(nil), rowset=256
conds(nil), nl_params_([t1.c1(:0)]), use_batch=true
1 - output([t1.c1], [t1.c2], [t1.c3]), filter([t1.c2 > 500]), rowset=256
access([t1.c1], [t1.c2], [t1.c3]), partitions(p0)
is_index_back=false, is_global_index=false, filter_before_indexback[false],
range_key([t1.c1]), range(MIN ; MAX)always true
2 - output([t2.c1], [t2.c2], [t2.c3]), filter(nil), rowset=256
access([GROUP_ID], [t2.c1], [t2.c2], [t2.c3]), partitions(p0)
is_index_back=false, is_global_index=false,
range_key([t2.c1]), range(MIN ; MAX),
range_cond([:0 = t2.c1])

±--------------------------------------------------------------------------------------------+
20 rows in set (0.164 sec)

obclient(root@t1)[test]> ≠SELECT COUNT() FROM lab22_t1 t1 WHERE t1.c2>500;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your OceanBase version for the right syntax to use near '≠SELECT COUNT(
) FROM lab22_t1 t1 WHERE t1.c2>500’ at line 1
obclient(root@t1)[test]> SELECT COUNT() FROM lab22_t1 t1 WHERE t1.c2>500;
±---------+
| COUNT(
) |
±---------+
| 1 |
±---------+
1 row in set (0.753 sec)

obclient(root@t1)[test]> SELECT c2, COUNT() FROM lab22_t1 t1 GROUP BY c2 ORDER BY 2 DESC;
±--------±---------+
| c2 | COUNT(
) |
±--------±---------+
| 1 | 333334 |
| 2 | 333333 |
| 0 | 333333 |
| 9999999 | 1 |
±--------±---------+
4 rows in set (0.409 sec)

obclient(root@t1)[test]> CALL DBMS_STATS.GATHER_TABLE_STATS(‘test’, ‘lab22_t1’, method_opt=>'FOR COLUMNS
‘> c2’);
Query OK, 0 rows affected (3.143 sec)

obclient(root@t1)[test]> SELECT * FROM oceanbase.DBA_TAB_STATISTICS WHERE OWNER=‘test’ AND TABLE_NAME LIKE
→ ‘lab22%’;
±------±-----------±---------------±-------------------±------------------±----------------------±------------±---------±-------±-------------±----------±----------±------------±--------------------------±--------------------±------------------±--------------------±--------------±---------------±--------------------±----------±------------±---------------------------±-------------±-----------±----------------±------------±------+
| OWNER | TABLE_NAME | PARTITION_NAME | PARTITION_POSITION | SUBPARTITION_NAME | SUBPARTITION_POSITION | OBJECT_TYPE | NUM_ROWS | BLOCKS | EMPTY_BLOCKS | AVG_SPACE | CHAIN_CNT | AVG_ROW_LEN | AVG_SPACE_FREELIST_BLOCKS | NUM_FREELIST_BLOCKS | AVG_CACHED_BLOCKS | AVG_CACHE_HIT_RATIO | IM_IMCU_COUNT | IM_BLOCK_COUNT | IM_STAT_UPDATE_TIME | SCAN_RATE | SAMPLE_SIZE | LAST_ANALYZED | GLOBAL_STATS | USER_STATS | STATTYPE_LOCKED | STALE_STATS | SCOPE |
±------±-----------±---------------±-------------------±------------------±----------------------±------------±---------±-------±-------------±----------±----------±------------±--------------------------±--------------------±------------------±--------------------±--------------±---------------±--------------------±----------±------------±---------------------------±-------------±-----------±----------------±------------±------+
| test | lab22_t1 | NULL | NULL | NULL | NULL | TABLE | 1000001 | NULL | NULL | NULL | NULL | 60 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 1000001 | 2025-04-09 16:20:03.293861 | NO | NO | NULL | NO | NULL |
| test | lab22_t2 | NULL | NULL | NULL | NULL | TABLE | 1000001 | NULL | NULL | NULL | NULL | 60 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 1000001 | 2025-04-09 16:18:48.826791 | NO | NO | NULL | NO | NULL |
±------±-----------±---------------±-------------------±------------------±----------------------±------------±---------±-------±-------------±----------±----------±------------±--------------------------±--------------------±------------------±--------------------±--------------±---------------±--------------------±----------±------------±---------------------------±-------------±-----------±----------------±------------±------+
2 rows in set (0.918 sec)

obclient(root@t1)[test]> SELECT * FROM oceanbase.DBA_TAB_COL_STATISTICS WHERE OWNER=‘test’ AND TABLE_NAME
→ LIKE ‘lab22%’;
±------±-----------±------------±-------------±----------±-----------±--------±----------±------------±---------------------------±------------±-------------±-----------±------±------------±----------±------+
| OWNER | TABLE_NAME | COLUMN_NAME | NUM_DISTINCT | LOW_VALUE | HIGH_VALUE | DENSITY | NUM_NULLS | NUM_BUCKETS | LAST_ANALYZED | SAMPLE_SIZE | GLOBAL_STATS | USER_STATS | NOTES | AVG_COL_LEN | HISTOGRAM | SCOPE |
±------±-----------±------------±-------------±----------±-----------±--------±----------±------------±---------------------------±------------±-------------±-----------±------±------------±----------±------+
| test | lab22_t1 | c1 | 930724 | 1 | 9999999 | 0 | 0 | 0 | 2025-04-09 16:18:44.570205 | 1000001 | NO | NO | NULL | 20 | NULL | NULL |
| test | lab22_t1 | c2 | 4 | 0 | 9999999 | 0 | 0 | 4 | 2025-04-09 16:20:03.293861 | 1000001 | NO | NO | NULL | 20 | FREQUENCY | NULL |
| test | lab22_t1 | c3 | 533 | 0 | 9999999 | 0 | 0 | 0 | 2025-04-09 16:18:44.570205 | 1000001 | NO | NO | NULL | 20 | NULL | NULL |
| test | lab22_t2 | c1 | 930724 | 1 | 9999999 | 0 | 0 | 0 | 2025-04-09 16:18:48.826791 | 1000001 | NO | NO | NULL | 20 | NULL | NULL |
| test | lab22_t2 | c2 | 4 | 0 | 9999999 | 0 | 0 | 0 | 2025-04-09 16:18:48.826791 | 1000001 | NO | NO | NULL | 20 | NULL | NULL |
| test | lab22_t2 | c3 | 533 | 0 | 9999999 | 0 | 0 | 0 | 2025-04-09 16:18:48.826791 | 1000001 | NO | NO | NULL | 20 | NULL | NULL |
±------±-----------±------------±-------------±----------±-----------±--------±----------±------------±---------------------------±------------±-------------±-----------±------±------------±----------±------+
6 rows in set (0.723 sec)

obclient(root@t1)[test]> SELECT * FROM oceanbase.DBA_TAB_HISTOGRAMS WHERE OWNER=‘test’ AND TABLE_NAME LIKE
→ ‘lab22%’;
±------±-----------±------------±----------------±---------------±----------------------±--------------------------±----------------------±------+
| OWNER | TABLE_NAME | COLUMN_NAME | ENDPOINT_NUMBER | ENDPOINT_VALUE | ENDPOINT_ACTUAL_VALUE | ENDPOINT_ACTUAL_VALUE_RAW | ENDPOINT_REPEAT_COUNT | SCOPE |
±------±-----------±------------±----------------±---------------±----------------------±--------------------------±----------------------±------+
| test | lab22_t1 | c2 | 333333 | NULL | 0 | 04053F0000 | 333333 | NULL |
| test | lab22_t1 | c2 | 666667 | NULL | 1 | 04053F0001 | 333334 | NULL |
| test | lab22_t1 | c2 | 1000000 | NULL | 2 | 04053F0002 | 333333 | NULL |
| test | lab22_t1 | c2 | 1000001 | NULL | 9999999 | 04053F00FFACE204 | 1 | NULL |
±------±-----------±------------±----------------±---------------±----------------------±--------------------------±----------------------±------+
4 rows in set (0.308 sec)

obclient(root@t1)[test]> EXPLAIN SELECT * FROM lab22_t1 t1, lab22_t2 t2 WHERE t1.c1=t2.c1 and t1.c2>500;
±--------------------------------------------------------------------------------------------+
| Query Plan |
±--------------------------------------------------------------------------------------------+
| ===================================================== |
| |ID|OPERATOR |NAME|EST.ROWS|EST.TIME(us)| |
| ----------------------------------------------------- |
| |0 |EXCHANGE IN REMOTE | |600000 |2506735 | |
| |1 |└─EXCHANGE OUT REMOTE| |600000 |1824464 | |
| |2 | └─MERGE JOIN | |600000 |290253 | |
| |3 | ├─TABLE FULL SCAN|t1 |600000 |122053 | |
| |4 | └─TABLE FULL SCAN|t2 |999534 |72040 | |
| ===================================================== |

Outputs & filters:
0 - output([t1.c1], [t1.c2], [t1.c3], [t2.c1], [t2.c2], [t2.c3]), filter(nil)
1 - output([t1.c1], [t1.c2], [t1.c3], [t2.c1], [t2.c2], [t2.c3]), filter(nil)
2 - output([t1.c1], [t1.c2], [t1.c3], [t2.c1], [t2.c2], [t2.c3]), filter(nil), rowset=256
equal_conds([t1.c1 = t2.c1]), other_conds(nil)
merge_directions([ASC])
3 - output([t1.c1], [t1.c2], [t1.c3]), filter([t1.c2 > 500]), rowset=256
access([t1.c1], [t1.c2], [t1.c3]), partitions(p0)
is_index_back=false, is_global_index=false, filter_before_indexback[false],
range_key([t1.c1]), range(MIN ; MAX)always true
4 - output([t2.c1], [t2.c2], [t2.c3]), filter(nil), rowset=256
access([t2.c1], [t2.c2], [t2.c3]), partitions(p0)
is_index_back=false, is_global_index=false,
range_key([t2.c1]), range(MIN ; MAX)always true

±--------------------------------------------------------------------------------------------+
24 rows in set (0.432 sec)

obclient(root@t1)[test]>

1 个赞