SQL执行顺序问题 SQL耗时异常

【 使用环境 】生产环境
【 OB or 其他组件 】OB
【 使用版本 】4.2.2.0
【问题描述】同一个sql在MySQL和OB中执行顺序不通 SQL迁移后耗时过大
该SQL在MySQL中先进行b表查出结果 然后再去特大表中n中寻找数据 为什么在OB中会直接对特大表n进行全表扫描 ,表n中new为longtext字段 无法建设索引 只有INFOCODE有单列索引

执行SQL

SELECT n.new
FROM info_an_notice_title_special b 
JOIN info_an_content n ON b.INFOCODE = n.INFOCODE
WHERE b.SECURITYCODE = '832171'  AND b.NOTICETYPE = '50004';

MYSQL执行计划

1	SIMPLE	b		const	unique_key,idx_info_an_noticetitle_f10_isn	unique_key	636	const,const	1	100.00	
1	SIMPLE	n		range	idx_INFOCODE	idx_INFOCODE	603		1	100.00	Using where; Using index

OB执行计划

=================================================================
|ID|OPERATOR              |NAME           |EST.ROWS|EST.TIME(us)|
-----------------------------------------------------------------
|0 |EXCHANGE IN REMOTE    |               |1       |824458      |
|1 |└─EXCHANGE OUT REMOTE |               |1       |824456      |
|2 |  └─NESTED-LOOP JOIN  |               |1       |824450      |
|3 |    ├─TABLE RANGE SCAN|b(unique_key)  |1       |9           |
|4 |    └─TABLE FULL SCAN |n(idx_INFOCODE)|11712799|353698      |
=================================================================
Outputs & filters:
-------------------------------------
  0 - output([b.TZT_ID], [b.INFOCODE], [b.SECURITYCODE], [b.NOTICEDATE], [b.ENDDATE], [b.NOTICETITLE], [b.NOTICETYPE], [b.SOURCENAME], [b.TZT_INSERTTIME],
       [b.TZT_UPDATETIME]), filter(nil)
  1 - output([b.TZT_ID], [b.INFOCODE], [b.SECURITYCODE], [b.NOTICEDATE], [b.ENDDATE], [b.NOTICETITLE], [b.NOTICETYPE], [b.SOURCENAME], [b.TZT_INSERTTIME],
       [b.TZT_UPDATETIME]), filter(nil)
  2 - output([b.TZT_ID], [b.INFOCODE], [b.SECURITYCODE], [b.NOTICEDATE], [b.ENDDATE], [b.NOTICETITLE], [b.NOTICETYPE], [b.SOURCENAME], [b.TZT_INSERTTIME],
       [b.TZT_UPDATETIME]), filter(nil), rowset=256
      conds([b.INFOCODE = cast(n.INFOCODE, VARCHAR(1048576))]), nl_params_(nil), use_batch=false
  3 - output([b.TZT_ID], [b.INFOCODE], [b.SECURITYCODE], [b.NOTICETYPE], [b.NOTICEDATE], [b.ENDDATE], [b.NOTICETITLE], [b.SOURCENAME], [b.TZT_INSERTTIME],
       [b.TZT_UPDATETIME]), filter(nil), rowset=256
      access([b.TZT_ID], [b.INFOCODE], [b.SECURITYCODE], [b.NOTICETYPE], [b.NOTICEDATE], [b.ENDDATE], [b.NOTICETITLE], [b.SOURCENAME], [b.TZT_INSERTTIME],
       [b.TZT_UPDATETIME]), partitions(p0)
      is_index_back=true, is_global_index=false, 
      range_key([b.SECURITYCODE], [b.NOTICETYPE], [b.shadow_pk_0]), range(832171,50004,MIN ; 832171,50004,MAX), 
      range_cond([b.SECURITYCODE = cast('832171', VARCHAR(1048576))], [b.NOTICETYPE = cast('50004', VARCHAR(1048576))])
  4 - output([n.INFOCODE]), filter(nil), rowset=256
      access([n.INFOCODE]), partitions(p0)
      is_index_back=false, is_global_index=false, 
      range_key([n.INFOCODE], [n.EID]), range(MIN,MIN ; MAX,MAX)always true
1 个赞

表结构如下

CREATE TABLE `info_an_notice_title_special` (
  `ID` bigint(20) NOT NULL AUTO_INCREMENT COMMENT 'ID',
  `INFOCODE` varchar(200) COLLATE utf8mb4_bin DEFAULT NULL COMMENT '编码',
  `SECURITYCODE` varchar(200) COLLATE utf8mb4_bin DEFAULT NULL COMMENT '代码',
  `NOTICEDATE` datetime DEFAULT NULL COMMENT '日期',
  `ENDDATE` datetime DEFAULT NULL COMMENT '日期',
  `NOTICETITLE` text COLLATE utf8mb4_bin DEFAULT NULL COMMENT '标题',
  `NOTICETYPE` varchar(10) COLLATE utf8mb4_bin DEFAULT NULL COMMENT '类型',
  `SOURCENAME` varchar(800) COLLATE utf8mb4_bin DEFAULT NULL COMMENT '来源',
  `INSERTTIME` datetime(4) NOT NULL DEFAULT CURRENT_TIMESTAMP(4) COMMENT '插入时间',
  `UPDATETIME` datetime(4) NOT NULL DEFAULT CURRENT_TIMESTAMP(4) ON UPDATE CURRENT_TIMESTAMP(4) COMMENT '更新时间',
  PRIMARY KEY (`ID`),
  UNIQUE KEY `unique_key` (`SECURITYCODE`, `NOTICETYPE`) BLOCK_SIZE 16384 LOCAL,
  KEY `infocode` (`INFOCODE`) BLOCK_SIZE 16384 LOCAL,
  KEY `a1` (`SECURITYCODE`, `NOTICETYPE`, `NOTICEDATE`) BLOCK_SIZE 16384 LOCAL,
  KEY `idx_info_an_noticetitle_f10_isn` (`SECURITYCODE`, `NOTICETYPE`) BLOCK_SIZE 16384 LOCAL
) AUTO_INCREMENT = 291134079 AUTO_INCREMENT_MODE = 'ORDER' DEFAULT CHARSET = utf8mb4 COLLATE = utf8mb4_bin ROW_FORMAT = DYNAMIC COMPRESSION = 'zstd_1.3.8' REPLICA_NUM = 3 BLOCK_SIZE = 16384 USE_BLOOM_FILTER = FALSE TABLET_SIZE = 134217728 PCTFREE = 0 COMMENT = '公告';

CREATE TABLE `info_an_content` (
  `EID` bigint(20) NOT NULL,
  `ESEQID` bigint(20) NOT NULL,
  `EITIME` datetime NOT NULL,
  `EUTIME` datetime NOT NULL,
  `EISDEL` char(1) DEFAULT '0',
  `INFOCODE` varchar(200) DEFAULT NULL COMMENT '编码',
  `INFOBODYCONTENT` longtext DEFAULT NULL COMMENT '正文',
  `SOURCE_CODE` int(4) DEFAULT '1' COMMENT '',
  `UPDATETIME` datetime(4) NOT NULL DEFAULT CURRENT_TIMESTAMP(4) ON UPDATE CURRENT_TIMESTAMP(4),
  PRIMARY KEY (`EID`),
  KEY `idx_info_an_content_eid_eseqid` (`EID`, `ESEQID`) BLOCK_SIZE 16384 LOCAL,
  KEY `idx_info_an_content_eutime` (`EUTIME`) BLOCK_SIZE 16384 LOCAL,
  KEY `idx_INFOCODE` (`INFOCODE`) BLOCK_SIZE 16384 LOCAL
) DEFAULT CHARSET = utf8mb4 ROW_FORMAT = DYNAMIC COMPRESSION = 'zstd_1.3.8' REPLICA_NUM = 3 BLOCK_SIZE = 16384 USE_BLOOM_FILTER = FALSE TABLET_SIZE = 134217728 PCTFREE = 0 COMMENT = '内容';

两个表的字段类型不对应,一个是utf8mb4_bin,一个是utf8mb4_general_ci

3 个赞

除了表的字符集保持一致,关联字段的COLLATE也需要保持一致

2 个赞

SELECT n.new
FROM info_an_notice_title_special b
JOIN info_an_content n ON convert(b.INFOCODE using utf8mb4) = n.INFOCODE
WHERE b.SECURITYCODE = ‘832171’ AND b.NOTICETYPE = ‘50004’;

obclient [test]> explain SELECT /*+leading(b n) use_nl(n)*/ * FROM info_an_notice_title_special b  JOIN info_an_content n ON convert(b.INFOCODE using utf8mb4) = n.INFOCODE WHERE b.SECURITYCODE = '832171'  AND b.NOTICETYPE = '50004';
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Query Plan                                                                                                                                                                                |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| =========================================================================                                                                                                                 |
| |ID|OPERATOR                      |NAME           |EST.ROWS|EST.TIME(us)|                                                                                                                 |
| -------------------------------------------------------------------------                                                                                                                 |
| |0 |NESTED-LOOP JOIN              |               |1       |29          |                                                                                                                 |
| |1 |├─TABLE RANGE SCAN            |b(unique_key)  |1       |8           |                                                                                                                 |
| |2 |└─DISTRIBUTED TABLE RANGE SCAN|n(idx_INFOCODE)|1       |21          |                                                                                                                 |
| =========================================================================                                                                                                                 |
| Outputs & filters:                                                                                                                                                                        |
| -------------------------------------                                                                                                                                                     |
|   0 - output([b.ID], [b.INFOCODE], [b.SECURITYCODE], [b.NOTICEDATE], [b.ENDDATE], [b.NOTICETITLE], [b.NOTICETYPE], [b.SOURCENAME], [b.INSERTTIME], [b.UPDATETIME],                        |
|        [n.EID], [n.ESEQID], [n.EITIME], [n.EUTIME], [n.EISDEL], [n.INFOCODE], [n.INFOBODYCONTENT], [n.SOURCE_CODE], [n.UPDATETIME]), filter(nil), rowset=8                                |
|       conds(nil), nl_params_([b.INFOCODE(:0)]), use_batch=true                                                                                                                            |
|   1 - output([b.ID], [b.INFOCODE], [b.SECURITYCODE], [b.NOTICETYPE], [b.NOTICEDATE], [b.ENDDATE], [b.NOTICETITLE], [b.SOURCENAME], [b.INSERTTIME], [b.UPDATETIME]), filter(nil), rowset=8 |
|       access([b.ID], [b.INFOCODE], [b.SECURITYCODE], [b.NOTICETYPE], [b.NOTICEDATE], [b.ENDDATE], [b.NOTICETITLE], [b.SOURCENAME], [b.INSERTTIME], [b.UPDATETIME]), partitions(p0)        |
|       is_index_back=true, is_global_index=false,                                                                                                                                          |
|       range_key([b.SECURITYCODE], [b.NOTICETYPE], [b.shadow_pk_0]), range(832171,50004,MIN ; 832171,50004,MAX),                                                                           |
|       range_cond([b.SECURITYCODE = cast('832171', VARCHAR(1048576))], [b.NOTICETYPE = cast('50004', VARCHAR(1048576))])                                                                   |
|   2 - output([n.EID], [n.INFOCODE], [n.ESEQID], [n.EITIME], [n.EUTIME], [n.EISDEL], [n.INFOBODYCONTENT], [n.SOURCE_CODE], [n.UPDATETIME]), filter(nil), rowset=8                          |
|       access([GROUP_ID], [n.EID], [n.INFOCODE], [n.ESEQID], [n.EITIME], [n.EUTIME], [n.EISDEL], [n.INFOBODYCONTENT], [n.SOURCE_CODE], [n.UPDATETIME]), partitions(p0)                     |
|       is_index_back=true, is_global_index=false,                                                                                                                                          |
|       range_key([n.INFOCODE], [n.EID]), range(MIN ; MAX),                                                                                                                                 |
|       range_cond([convert(cast(:0, VARCHAR(1048576)), 'utf8mb4') = n.INFOCODE])                                                                                                           |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2 个赞

学习了!!!

这个 convert 是让在 join 之前,手动显式把 b 表数据的 collation 转成和 n 表数据的 collation 一致,这样就能用上 n 表的索引了。

convert 虽然只能转 charset,不过会顺便把 collation 转成 charset 默认的 collation。碰巧 utf8mb4 的默认 collation 是 general_ci,所以把 collation 给转成想要的了,哈哈~

2 个赞

这个问题问了下 gpt,发现还有一种方式,可以把 join 条件改成 b.INFOCODE COLLATE utf8mb4_general_ci = n.INFOCODE。

explain SELECT /*+leading(b n) use_nl(n)*/ * FROM info_an_notice_title_special b  JOIN info_an_content n ON b.INFOCODE COLLATE utf8mb4_general_ci = n.INFOCODE WHERE b.SECURITYCODE = '832171'  AND b.NOTICETYPE = '50004';
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Query Plan                                                                                                                                                                         |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| =========================================================================                                                                                                          |
| |ID|OPERATOR                      |NAME           |EST.ROWS|EST.TIME(us)|                                                                                                          |
| -------------------------------------------------------------------------                                                                                                          |
| |0 |NESTED-LOOP JOIN              |               |1       |29          |                                                                                                          |
| |1 |├─TABLE RANGE SCAN            |b(unique_key)  |1       |8           |                                                                                                          |
| |2 |└─DISTRIBUTED TABLE RANGE SCAN|n(idx_INFOCODE)|1       |21          |                                                                                                          |
| =========================================================================                                                                                                          |
| Outputs & filters:                                                                                                                                                                 |
| -------------------------------------                                                                                                                                              |
|   0 - output([b.ID], [b.INFOCODE], [b.SECURITYCODE], [b.NOTICEDATE], [b.ENDDATE], [b.NOTICETITLE], [b.NOTICETYPE], [b.SOURCENAME], [b.INSERTTIME], [b.UPDATETIME],                 |
|        [n.EID], [n.ESEQID], [n.EITIME], [n.EUTIME], [n.EISDEL], [n.INFOCODE], [n.INFOBODYCONTENT], [n.SOURCE_CODE], [n.UPDATETIME]), filter(nil)                                   |
|       conds(nil), nl_params_([b.INFOCODE(:0)]), use_batch=true                                                                                                                     |
|   1 - output([b.ID], [b.INFOCODE], [b.SECURITYCODE], [b.NOTICETYPE], [b.NOTICEDATE], [b.ENDDATE], [b.NOTICETITLE], [b.SOURCENAME], [b.INSERTTIME], [b.UPDATETIME]), filter(nil)    |
|       access([b.ID], [b.INFOCODE], [b.SECURITYCODE], [b.NOTICETYPE], [b.NOTICEDATE], [b.ENDDATE], [b.NOTICETITLE], [b.SOURCENAME], [b.INSERTTIME], [b.UPDATETIME]), partitions(p0) |
|       is_index_back=true, is_global_index=false,                                                                                                                                   |
|       range_key([b.SECURITYCODE], [b.NOTICETYPE], [b.shadow_pk_0]), range(832171,50004,MIN ; 832171,50004,MAX),                                                                    |
|       range_cond([b.SECURITYCODE = cast('832171', VARCHAR(1048576))], [b.NOTICETYPE = cast('50004', VARCHAR(1048576))])                                                            |
|   2 - output([n.EID], [n.INFOCODE], [n.ESEQID], [n.EITIME], [n.EUTIME], [n.EISDEL], [n.INFOBODYCONTENT], [n.SOURCE_CODE], [n.UPDATETIME]), filter(nil)                             |
|       access([GROUP_ID], [n.EID], [n.INFOCODE], [n.ESEQID], [n.EITIME], [n.EUTIME], [n.EISDEL], [n.INFOBODYCONTENT], [n.SOURCE_CODE], [n.UPDATETIME]), partitions(p0)              |
|       is_index_back=true, is_global_index=false,                                                                                                                                   |
|       range_key([n.INFOCODE], [n.EID]), range(MIN ; MAX),                                                                                                                          |
|       range_cond([set_collation(:0, 45) = n.INFOCODE])                                                                                                                             |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
22 rows in set (0.004 sec)

絮语给出的第一种方式,chatgpt 也回答出来了,现在各种 gpt 真是牛逼呀~

2 个赞

这个问题都被你研究透了,还能举一反三 :+1: :+1: :+1:

哈哈,因为上次其实没解决清楚,这个问题之前用 convert 显式改 charset 只是凑巧能解决这个问题而已,感觉并不是通用的方法。

OceanBase 的 convert 只能转 charset,然后附带把 collation 转成对应 charset 默认的 collation。碰巧 utf8mb4 的默认 collation 是 general_ci,直接把 collation 给转成想要的了。万一想转的 collation 不是对应 charset 默认的 collation,用上次的方法就不成了,这两天一直耿耿于怀~

今天问了下 chatgpt,就发现这个通用方法了……

1 个赞

@论坛小助手 两个 collation 不同的 varchar 列,在进行关联时,能否用上索引?

根据文档库中的信息,当两个不同 collationvarchar 列进行关联时,可能会导致索引无法被有效使用。具体来说,如果两个列的字符集或排序规则(collation)不匹配,数据库引擎在执行连接操作时需要进行隐式的字符集转换,这通常会导致索引失效,从而影响查询性能。

针对您提到的情况,info_an_notice_title_special 表中的 INFOCODE 列和 info_an_content 表中的 INFOCODE 列如果具有不同的 collation,则可能导致 info_an_content 表上的索引无法被有效使用,进而导致全表扫描。

解决方案

  1. 统一字符集和排序规则

    • 确保两个表中的 INFOCODE 列具有相同的字符集和排序规则。可以通过以下 SQL 语句检查和修改字符集和排序规则:

      -- 查看当前字符集和排序规则
      SHOW CREATE TABLE info_an_notice_title_special;
      SHOW CREATE TABLE info_an_content;
      
      -- 修改字符集和排序规则
      ALTER TABLE info_an_notice_title_special MODIFY INFOCODE VARCHAR(1048576) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
      ALTER TABLE info_an_content MODIFY INFOCODE VARCHAR(1048576) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
      
  2. 显式转换字符集

    • 如果无法修改表结构,可以在查询中显式转换字符集,但这通常会影响性能。例如:

      SELECT n.new
      FROM info_an_notice_title_special b 
      JOIN info_an_content n ON b.INFOCODE = CONVERT(n.INFOCODE USING utf8mb4)
      WHERE b.SECURITYCODE = '832171' AND b.NOTICETYPE = '50004';
      
  3. 创建覆盖索引

    • 如果 INFOCODE 列上有索引,但查询仍然慢,可以考虑创建覆盖索引,确保查询所需的所有列都在索引中。例如:

      CREATE INDEX idx_info_an_content_covering ON info_an_content (INFOCODE, new);
      

总结

  • 字符集和排序规则不匹配:可能导致索引失效,引发全表扫描。
  • 解决方案:统一字符集和排序规则,或显式转换字符集,或创建覆盖索引。

希望这些信息对您有所帮助!如果还有其他问题,请随时联系。 [1], [3], [1]

更多信息请查看:

  1. 创建索引
  2. OBKV-HBase 数据模式设计
  3. 索引设计
  4. 列存 FAQ
  5. 分区索引

哈哈,论坛答疑机器人的这个回答还可以哟~

1 个赞

抽空儿写了篇博客《SQL 调优实践记录 —— 分析一个在不同 collation 的字符串连接时,无法利用索引的问题》,简单记录了一下这个问题~

2 个赞

相当好