【布道师经验分享】如何根据 SQL ID 绑定子查询 SQL 的 hint

问题背景
试想这么一个场景:如下的sql语句,子查询union all的两个select语句,如果针对b=1的条件,想走索引,针对b=2的条件想全表扫,直接执行sql的话,加hint很简单,但是根据sql_id绑定计划的话,要如何写hint呢?

PS:使用sql_id绑定可以使得相同语句不同参数都使用相同的执行计划,而sql_text方式绑定则是固定参数

示例语句:
select * from test1 where test1.a in(
select a from test2 where test2.b = 1
union all
select a from test2 where test2.b = 2
);

示例表结构:
create table test1(a int,b int,key `b_idx`(b));
create table test2(a int,b int,key `b_idx`(b));

实验过程
这里查看默认的sql执行计划,可以看到test2表在两个条件下都是走索引的

MySQL [test]> explain select * from test1 where test1.a in(select a from test2 where test2.b = 1 union all select a from test2 where test2.b = 2);
============================================================
|ID|OPERATOR            |NAME        |EST.ROWS|EST.TIME(us)|
------------------------------------------------------------
|0 |SUBPLAN FILTER      |            |1       |16          |
|1 |├─TABLE FULL SCAN   |test1       |1       |3           |
|2 |└─UNION ALL         |            |2       |14          |
|3 |  ├─TABLE RANGE SCAN|test2(b_idx)|1       |7           |
|4 |  └─TABLE RANGE SCAN|test2(b_idx)|1       |7           |
============================================================
Outputs & filters:
-------------------------------------
  0 - output([test1.a(0x7f57d0a7b000)], [test1.b(0x7f57d0a7b690)]), filter(nil), rowset=16
      exec_params_(nil), onetime_exprs_([subquery(1)(0x7f57d0a20f50)(:0)]), init_plan_idxs_(nil), use_batch=false
  1 - output([test1.a(0x7f57d0a7b000)], [test1.b(0x7f57d0a7b690)]), filter([test1.a(0x7f57d0a7b000) = :0(0x7f57abc59ec0)]), rowset=16
      access([test1.a(0x7f57d0a7b000)], [test1.b(0x7f57d0a7b690)]), partitions(p0)
      is_index_back=false, is_global_index=false, filter_before_indexback[false], 
      range_key([test1.__pk_increment(0x7f57d0a7cee0)]), range(MIN ; MAX)always true
  2 - output([UNION([1])(0x7f57d0a7aad0)]), filter(nil), rowset=16
  3 - output([test2.a(0x7f57d0a66090)]), filter(nil), rowset=16
      access([test2.__pk_increment(0x7f57d0a7c0e0)], [test2.a(0x7f57d0a66090)]), partitions(p0)
      is_index_back=true, is_global_index=false, 
      range_key([test2.b(0x7f57d0a65590)], [test2.__pk_increment(0x7f57d0a7c0e0)]), range(1,MIN ; 1,MAX), 
      range_cond([test2.b(0x7f57d0a65590) = 1(0x7f57d0a64cd0)])
  4 - output([test2.a(0x7f57d0a7a5d0)]), filter(nil), rowset=16
      access([test2.__pk_increment(0x7f57d0a7c720)], [test2.a(0x7f57d0a7a5d0)]), partitions(p0)
      is_index_back=true, is_global_index=false, 
      range_key([test2.b(0x7f57d0a79ad0)], [test2.__pk_increment(0x7f57d0a7c720)]), range(2,MIN ; 2,MAX), 
      range_cond([test2.b(0x7f57d0a79ad0) = 2(0x7f57d0a79210)])

如果这时候想要子查询里面其中一个test2表走全表扫,另一个test2表仍走索引的话,直接执行的sql语句可以如下写法:

示例语句:
select * from test1 where test1.a in(
select a from test2 where test2.b = 1
union all
select /*+FULL(test2)*/ a from test2 where test2.b = 2
);

但是,根据sql_id绑定outline时就不知道该怎么写hint了,explain extended加了hint以后的执行计划,可以看到,test2表一张走b索引,一张是全表扫:

MySQL [test]> explain extended select * from test1 where test1.a in(select a from test2 where test2.b = 1 union all select /*+FULL(test2)*/ a from test2 where test2.b = 2);
============================================================
|ID|OPERATOR            |NAME        |EST.ROWS|EST.TIME(us)|
------------------------------------------------------------
|0 |SUBPLAN FILTER      |            |1       |12          |
|1 |├─TABLE FULL SCAN   |test1       |1       |3           |
|2 |└─UNION ALL         |            |2       |9           |
|3 |  ├─TABLE RANGE SCAN|test2(b_idx)|1       |7           |
|4 |  └─TABLE FULL SCAN |test2       |1       |3           |
============================================================
Outputs & filters:
-------------------------------------
  0 - output([test1.a(0x7f57e647b080)], [test1.b(0x7f57e647b710)]), filter(nil), rowset=16
      exec_params_(nil), onetime_exprs_([subquery(1)(0x7f57e64212e0)(:0)]), init_plan_idxs_(nil), use_batch=false
  1 - output([test1.a(0x7f57e647b080)], [test1.b(0x7f57e647b710)]), filter([test1.a(0x7f57e647b080) = :0(0x7f57cce54e10)]), rowset=16
      access([test1.a(0x7f57e647b080)], [test1.b(0x7f57e647b710)]), partitions(p0)
      is_index_back=false, is_global_index=false, filter_before_indexback[false], 
      range_key([test1.__pk_increment(0x7f57e647cf60)]), range(MIN ; MAX)always true
  2 - output([UNION([1])(0x7f57e647ab50)]), filter(nil), rowset=16
  3 - output([test2.a(0x7f57e6466090)]), filter(nil), rowset=16
      access([test2.__pk_increment(0x7f57e647c160)], [test2.a(0x7f57e6466090)]), partitions(p0)
      is_index_back=true, is_global_index=false, 
      range_key([test2.b(0x7f57e6465590)], [test2.__pk_increment(0x7f57e647c160)]), range(1,MIN ; 1,MAX), 
      range_cond([test2.b(0x7f57e6465590) = 1(0x7f57e6464cd0)])
  4 - output([test2.a(0x7f57e647a650)]), filter([test2.b(0x7f57e6479b50) = 2(0x7f57e6479290)]), rowset=16
      access([test2.b(0x7f57e6479b50)], [test2.a(0x7f57e647a650)]), partitions(p0)
      is_index_back=false, is_global_index=false, filter_before_indexback[false], 
      range_key([test2.__pk_increment(0x7f57e647c7a0)]), range(MIN ; MAX)always true
Used Hint:
-------------------------------------
  /*+
      
      FULL("test2")
  */
Qb name trace:
-------------------------------------
  stmt_id:0, stmt_type:T_EXPLAIN 
  stmt_id:1, SEL$1
  stmt_id:2, SET$1
  stmt_id:3, SEL$2
  stmt_id:4, SEL$3
Outline Data: 
-------------------------------------
  /*+
      BEGIN_OUTLINE_DATA
      FULL(@"SEL$1" "test"."test1"@"SEL$1")
      INDEX(@"SEL$2" "test"."test2"@"SEL$2" "b_idx")
      FULL(@"SEL$3" "test"."test2"@"SEL$3")
      OPTIMIZER_FEATURES_ENABLE('4.3.3.0')
      END_OUTLINE_DATA
  */
Optimization Info:
-------------------------------------
  test1:
      table_rows:1
      physical_range_rows:1
      logical_range_rows:1
      index_back_rows:0
      output_rows:1
      table_dop:1
      dop_method:Table DOP
      avaiable_index_name:[b_idx, test1]
      pruned_index_name:[b_idx]
      stats info:[version=0, is_locked=0, is_expired=0]
      dynamic sampling level:0
      estimation method:[DEFAULT, STORAGE]
  test2:
      table_rows:1
      physical_range_rows:1
      logical_range_rows:1
      index_back_rows:1
      output_rows:1
      table_dop:1
      dop_method:Table DOP
      avaiable_index_name:[b_idx, test2]
      unstable_index_name:[test2]
      stats info:[version=0, is_locked=0, is_expired=0]
      dynamic sampling level:0
      estimation method:[DEFAULT, STORAGE]
  test2:
      table_rows:1
      physical_range_rows:1
      logical_range_rows:1
      index_back_rows:0
      output_rows:1
      table_dop:1
      dop_method:Table DOP
      avaiable_index_name:[b_idx, test2]
      pruned_index_name:[b_idx]
      stats info:[version=0, is_locked=0, is_expired=0]
      dynamic sampling level:0
      estimation method:[DEFAULT, STORAGE]
  Plan Type:
      LOCAL
  Note:
      Degree of Parallelisim is 1 because of table property

这时如果直接使用Used Hint里面的信息来绑定sql_id的执行计划,实际执行走不到想要的执行计划,如下:

1、explain extended中的Used Hint:
/*+
FULL(“test2”)
*/

2、create OUTLINE outline_test2 on “EA19E97132FCC08DCD96AF3D894A69F0” using hint /+FULL(“test2”)/

3、select outline_name,sql_id,outline_content from oceanbase.DBA_OB_OUTLINES where outline_name = ‘outline_test2’;

4、可以看到,这时候的执行计划没有按照预期的走,子查询的2个test2表都是走的索引扫描

MySQL [test]> select SQL_ID,PLAN_ID,QUERY_SQL,OUTLINE_ID,OUTLINE_DATA from oceanbase.gv$ob_plan_cache_plan_stat where sql_id='EA19E97132FCC08DCD96AF3D894A69F0'\G;
*************************** 1. row ***************************
      SQL_ID: EA19E97132FCC08DCD96AF3D894A69F0
     PLAN_ID: 18085
   QUERY_SQL: select * from test1 where test1.a in(select a from test2 where test2.b = 1 union all select a from test2 where test2.b = 2)
  OUTLINE_ID: 500124
OUTLINE_DATA: /*+BEGIN_OUTLINE_DATA FULL(@"SEL$1" "test"."test1"@"SEL$1") INDEX(@"SEL$2" "test"."test2"@"SEL$2" "b_idx") INDEX(@"SEL$3" "test"."test2"@"SEL$3" "b_idx") OPTIMIZER_FEATURES_ENABLE('4.3.3.0') END_OUTLINE_DATA*/
1 row in set (0.01 sec)

解决方案

1、参考outline使用sql_id绑定执行计划,https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000001431842#3-title-%E4%BD%BF%E7%94%A8%20SQL_ID%20%E5%88%9B%E5%BB%BA%20Outline

2、explain extended绑定了hint的sql,查看执行计划的的OUTLINE_DATA信息,使用示例的hint直接绑定,示例:

| Outline Data:                                                                                                                         |
| -------------------------------------                                                                                                 |
|   /*+                                                                                                                                 |
|       BEGIN_OUTLINE_DATA                                                                                                              |
|       FULL(@"SEL$1" "test"."test1"@"SEL$1")                                                                                           |
|       INDEX(@"SEL$2" "test"."test2"@"SEL$2" "b_idx")                                                                                  |
|       FULL(@"SEL$3" "test"."test2"@"SEL$3")                                                                                           |
|       OPTIMIZER_FEATURES_ENABLE('4.3.3.0')                                                                                            |
|       END_OUTLINE_DATA                                                                                                                |
|   */

CREATE OUTLINE outline_test2 on “EA19E97132FCC08DCD96AF3D894A69F0” using hint /*+ BEGIN_OUTLINE_DATA INDEX(@“SEL$2” “test”.“test2”@“SEL$2” “b_idx”) FULL(@“SEL$3” “test”.“test2”@“SEL$3”) END_OUTLINE_DATA */;

3、查看执行的sql实际使用上了绑定的执行计划

MySQL [test]> select SQL_ID,PLAN_ID,QUERY_SQL,OUTLINE_ID,OUTLINE_DATA from oceanbase.gv$ob_plan_cache_plan_stat where sql_id='EA19E97132FCC08DCD96AF3D894A69F0'\G;
*************************** 1. row ***************************
      SQL_ID: EA19E97132FCC08DCD96AF3D894A69F0
     PLAN_ID: 17878
   QUERY_SQL: select * from test1 where test1.a in(select a from test2 where test2.b = 1 union all select a from test2 where test2.b = 2)
  OUTLINE_ID: 500123
OUTLINE_DATA: /*+BEGIN_OUTLINE_DATA FULL(@"SEL$1" "test"."test1"@"SEL$1") INDEX(@"SEL$2" "test"."test2"@"SEL$2" "b_idx") FULL(@"SEL$3" "test"."test2"@"SEL$3") OPTIMIZER_FEATURES_ENABLE('4.3.3.0') END_OUTLINE_DATA*/
1 row in set (0.01 sec)

1、outline绑定执行计划的过程可以参考官方文档:
https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000001431842#3-title-使用%20SQL_ID%20创建%20Outline

2、如果遇到特别长的sql,explain extended数据也很多,可以尝试导出outline的方式绑定:
https://www.oceanbase.com/knowledge-base/oceanbase-database-20000001041?back=kb

另外,有时候hint会失效,如/*+index(t1 idx1)*/,如果发现加了hint以后,执行计划也没有变化,可以考虑如下两个原因:
1、在sql执行计划生成的过程中,有些复杂SQL会做一些SQL改写,这时候可以查看explain extended的outline data信息,是否有被改写。因此推荐直接拿outline data的信息,作为hint的写法,来调整执行计划
2、黑屏命令行mysql连接ob集群手动执行sql,没有加-c参数,导致执行的sql语句将hint给注释掉了