关于Oceanbase的查询性能

【 使用环境 】测试环境
【 OB or 其他组件 】
【 使用版本 】4.3.2
【问题描述】我们原先使用mysql数据库,数据量已经达到亿级别,查询能行比较差,想换成oceanbase,做了一个测试,生成了1000万级别的记录,两张表。
mysql中的两张表:
– mydb.All_Card definition

CREATE TABLE All_Card (
id int unsigned NOT NULL AUTO_INCREMENT COMMENT ‘id’,
Report_Code varchar(64) NOT NULL,
Code varchar(32) DEFAULT NULL,
Type varchar(8) DEFAULT NULL,
Seniority_Year varchar(8) DEFAULT NULL,
Seniority_Month varchar(8) DEFAULT NULL,
Exposure_Year varchar(8) DEFAULT NULL,
Exposure_Month varchar(8) DEFAULT NULL,
Check_Type varchar(8) DEFAULT NULL,
Body_Check_Type varchar(8) DEFAULT NULL,
Previous_Card_Id varchar(128) DEFAULT NULL,
Check_Time datetime DEFAULT NULL,
Write_Person varchar(50) DEFAULT NULL,
Write_Person_Tel varchar(20) DEFAULT NULL,
Write_Date datetime DEFAULT NULL,
write_unit varchar(128) DEFAULT NULL COMMENT ‘()’,
check_org_name varchar(128) DEFAULT NULL,
Report_Time datetime DEFAULT NULL,
Check_Result_Code varchar(4000) DEFAULT NULL,
Monitor_Type_Code varchar(2) DEFAULT NULL,
Protective_Equipment_Code varchar(4) DEFAULT NULL,
Report_Unit varchar(128) DEFAULT NULL,
Report_Person varchar(50) DEFAULT NULL,
Report_Person_Tel varchar(20) DEFAULT NULL,
Remark varchar(500) DEFAULT NULL,
Suggest varchar(4000) DEFAULT NULL,
Check_Doctor varchar(64) DEFAULT NULL,
Org_Code varchar(16) DEFAULT NULL,
Org_Name varchar(32) DEFAULT NULL,
Job_Code varchar(8) DEFAULT NULL,
Other_Job_Name varchar(20) DEFAULT NULL,
Radiation_Type varchar(16) DEFAULT NULL,
gmt_created datetime NOT NULL,
report_org_code varchar(9) NOT NULL,
gmt_updated datetime NOT NULL,
data_sources char(2) NOT NULL COMMENT ’ 1- 2-’,
audit_status int unsigned NOT NULL COMMENT ’ \r\n1 \r\n2 \r\n3 \r\n4 \r\n5 \r\n6 \r\n7 \r\n8 \r\n9 \r\n10 ‘,
exchange_tag int NOT NULL COMMENT ’ 11: 10: 2: 1: 0: -1: -2:’,
org_area_code varchar(9) DEFAULT NULL,
yrdw_area_code varchar(9) DEFAULT NULL,
ygdw_area_code varchar(9) DEFAULT NULL,
recent_country_error_msg varchar(300) DEFAULT NULL,
flag int DEFAULT NULL,
push_status tinyint(1) DEFAULT NULL,
PRIMARY KEY (id) USING BTREE,
UNIQUE KEY Report_Code (Report_Code) USING BTREE,
KEY Write_Date (Write_Date) USING BTREE,
KEY org_area_code (org_area_code) USING BTREE,
KEY yrdw_area_code (yrdw_area_code) USING BTREE,
KEY ygdw_area_code (ygdw_area_code) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=10310634 DEFAULT CHARSET=utf8mb3 ROW_FORMAT=COMPACT;
– mydb.All_Diagnosis definition

CREATE TABLE All_Diagnosis (
Id int NOT NULL AUTO_INCREMENT COMMENT ‘id’,
Report_Code varchar(50) DEFAULT NULL,
hazard_code varchar(11) DEFAULT NULL,
other_hazard_name varchar(128) DEFAULT NULL,
Conclusion varchar(2) DEFAULT NULL,
PRIMARY KEY (Id) USING BTREE,
KEY Report_Code (Report_Code) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=51554066 DEFAULT CHARSET=utf8mb3 ROW_FORMAT=COMPACT;

oceanbase中的两张表,类似的:
– zhejiang.all_card definition

CREATE TABLE all_card (
id int(10) unsigned NOT NULL COMMENT ‘id’,
Report_Code varchar(64) NOT NULL,
Code varchar(32) DEFAULT NULL,
Type varchar(8) DEFAULT NULL,
Seniority_Year varchar(8) DEFAULT NULL,
Seniority_Month varchar(8) DEFAULT NULL,
Exposure_Year varchar(8) DEFAULT NULL,
Exposure_Month varchar(8) DEFAULT NULL,
Check_Type varchar(8) DEFAULT NULL,
Body_Check_Type varchar(8) DEFAULT NULL,
Previous_Card_Id varchar(128) DEFAULT NULL,
Check_Time datetime DEFAULT NULL,
Write_Person varchar(50) DEFAULT NULL,
Write_Person_Tel varchar(20) DEFAULT NULL,
Write_Date datetime DEFAULT NULL,
write_unit varchar(128) DEFAULT NULL COMMENT ‘()’,
check_org_name varchar(128) DEFAULT NULL,
Report_Time datetime DEFAULT NULL,
Check_Result_Code varchar(4000) DEFAULT NULL,
Monitor_Type_Code varchar(2) DEFAULT NULL,
Protective_Equipment_Code varchar(4) DEFAULT NULL,
Report_Unit varchar(128) DEFAULT NULL,
Report_Person varchar(50) DEFAULT NULL,
Report_Person_Tel varchar(20) DEFAULT NULL,
Remark varchar(500) DEFAULT NULL,
Suggest varchar(4000) DEFAULT NULL,
Check_Doctor varchar(64) DEFAULT NULL,
Org_Code varchar(16) DEFAULT NULL,
Org_Name varchar(32) DEFAULT NULL,
Job_Code varchar(8) DEFAULT NULL,
Other_Job_Name varchar(20) DEFAULT NULL,
Radiation_Type varchar(16) DEFAULT NULL,
gmt_created datetime NOT NULL,
report_org_code varchar(9) NOT NULL,
gmt_updated datetime NOT NULL,
data_sources char(2) NOT NULL COMMENT ’ 1- 2-’,
audit_status int(10) unsigned NOT NULL,
exchange_tag int(11) NOT NULL COMMENT ’ 11: 10: 2: 1: 0: -1: -2:’,
org_area_code varchar(9) DEFAULT NULL,
yrdw_area_code varchar(9) DEFAULT NULL,
ygdw_area_code varchar(9) DEFAULT NULL,
recent_country_error_msg varchar(300) DEFAULT NULL,
flag int(11) DEFAULT NULL,
push_status tinyint(1) DEFAULT NULL,
PRIMARY KEY (id),
UNIQUE KEY Report_Code (Report_Code) BLOCK_SIZE 16384 LOCAL
) DEFAULT CHARSET = utf8mb4 ROW_FORMAT = DYNAMIC COMPRESSION = ‘zstd_1.3.8’ REPLICA_NUM = 1 BLOCK_SIZE = 16384 USE_BLOOM_FILTER = FALSE TABLET_SIZE = 134217728 PCTFREE = 0 WITH COLUMN GROUP(all columns, each column);

– zhejiang.all_card definition

CREATE TABLE all_card (
id int(10) unsigned NOT NULL COMMENT ‘id’,
Report_Code varchar(64) NOT NULL,
Code varchar(32) DEFAULT NULL,
Type varchar(8) DEFAULT NULL,
Seniority_Year varchar(8) DEFAULT NULL,
Seniority_Month varchar(8) DEFAULT NULL,
Exposure_Year varchar(8) DEFAULT NULL,
Exposure_Month varchar(8) DEFAULT NULL,
Check_Type varchar(8) DEFAULT NULL,
Body_Check_Type varchar(8) DEFAULT NULL,
Previous_Card_Id varchar(128) DEFAULT NULL,
Check_Time datetime DEFAULT NULL,
Write_Person varchar(50) DEFAULT NULL,
Write_Person_Tel varchar(20) DEFAULT NULL,
Write_Date datetime DEFAULT NULL,
write_unit varchar(128) DEFAULT NULL COMMENT ‘()’,
check_org_name varchar(128) DEFAULT NULL,
Report_Time datetime DEFAULT NULL,
Check_Result_Code varchar(4000) DEFAULT NULL,
Monitor_Type_Code varchar(2) DEFAULT NULL,
Protective_Equipment_Code varchar(4) DEFAULT NULL,
Report_Unit varchar(128) DEFAULT NULL,
Report_Person varchar(50) DEFAULT NULL,
Report_Person_Tel varchar(20) DEFAULT NULL,
Remark varchar(500) DEFAULT NULL,
Suggest varchar(4000) DEFAULT NULL,
Check_Doctor varchar(64) DEFAULT NULL,
Org_Code varchar(16) DEFAULT NULL,
Org_Name varchar(32) DEFAULT NULL,
Job_Code varchar(8) DEFAULT NULL,
Other_Job_Name varchar(20) DEFAULT NULL,
Radiation_Type varchar(16) DEFAULT NULL,
gmt_created datetime NOT NULL,
report_org_code varchar(9) NOT NULL,
gmt_updated datetime NOT NULL,
data_sources char(2) NOT NULL COMMENT ’ 1- 2-’,
audit_status int(10) unsigned NOT NULL,
exchange_tag int(11) NOT NULL COMMENT ’ 11: 10: 2: 1: 0: -1: -2:’,
org_area_code varchar(9) DEFAULT NULL,
yrdw_area_code varchar(9) DEFAULT NULL,
ygdw_area_code varchar(9) DEFAULT NULL,
recent_country_error_msg varchar(300) DEFAULT NULL,
flag int(11) DEFAULT NULL,
push_status tinyint(1) DEFAULT NULL,
PRIMARY KEY (id),
UNIQUE KEY Report_Code (Report_Code) BLOCK_SIZE 16384 LOCAL
) DEFAULT CHARSET = utf8mb4 ROW_FORMAT = DYNAMIC COMPRESSION = ‘zstd_1.3.8’ REPLICA_NUM = 1 BLOCK_SIZE = 16384 USE_BLOOM_FILTER = FALSE TABLET_SIZE = 134217728 PCTFREE = 0 WITH COLUMN GROUP(all columns, each column);

– zhejiang.all_diagnosis definition

CREATE TABLE all_diagnosis (
Id int(11) NOT NULL COMMENT ‘id’,
Report_Code varchar(50) DEFAULT NULL,
hazard_code varchar(11) DEFAULT NULL,
other_hazard_name varchar(128) DEFAULT NULL,
Conclusion varchar(2) DEFAULT NULL,
PRIMARY KEY (Id),
KEY Report_Code (Report_Code) BLOCK_SIZE 16384 LOCAL
) DEFAULT CHARSET = utf8mb4 ROW_FORMAT = DYNAMIC COMPRESSION = ‘zstd_1.3.8’ REPLICA_NUM = 1 BLOCK_SIZE = 16384 USE_BLOOM_FILTER = FALSE TABLET_SIZE = 134217728 PCTFREE = 0 WITH COLUMN GROUP(all columns, each column);

注意,oceanbase的两张表都是行列混合的。

我首先在mysql中创建了1000万级别的数据(All_Card表,另一张表大概有4千万左右)。然后使用OMS同步到了oceanbase,同步之后,执行了如下语句:
ALTER SYSTEM MAJOR FREEZE;
CALL DBMS_STATS.GATHER_SCHEMA_STATS (‘zhejiang’, degree=>64,no_invalidate=>TRUE);
这个时候查询单张表格的count,性能提升明显,但是执行JOIN语句和GROUP语句的时候,
select a.hazard_code, count(a.Id) from all_diagnosis a inner join all_card b on a.Report_Code=b.Report_Code GROUP by a.hazard_code
性能极差,还不如mysql。我这边的结果是mysql上查询是93秒,而oceanbase上需要130秒左右。完全没有体现列存储的性能。我曾经在duckdb上测试过类似的,基本上在15秒左右。

请问这个可能是什么原因?是否存在提升的可能?

1 个赞

你发一下 explain extended sql 执行一下 把执行计划发一下 尽量发一个txt文本

可以用obdiag 收集这表信息
obdiag gather scene run --scene=observer.perf_sql --env “{db_connect=’-hxx -Pxx -uxx -pxx -Dxx’, trace_id=‘xx’}”
obdiag文档
https://www.oceanbase.com/docs/common-obdiag-cn-1000000001102504

2 个赞

首先可以看下执行计划的算子是否有COLUMN TABLE FULL SCAN确定是否走上了列存,其次就是看资源的负载情况,这种大查询最好就是加并行来提高资源利用效率,还有就是租户的资源规格、参数等也要根据情况适当调整下

2 个赞

多谢答复!我试试!

1 个赞

多谢答复。

1 个赞

小问题,把执行计划啥的发出来可以一起讨论下

1 个赞

oceanbase.txt (5.8 KB)
是这个东西吗?使用下面的语句得到的:
explain extended select a.hazard_code, count(a.Id) from all_diagnosis a inner join all_card b on a.Report_Code=b.Report_Code GROUP by a.hazard_code

1 个赞

这是个分布式执行计划看样子也没用上列存,b表的数据是广播到a表所在的节点,可以创建个表组把两表的leader集中在一起然后适当调整并行试试
还有一种优化办法就是将两张表都按照report_id来做分区,这样能走上Partition-Wise Join,感觉理论上能快点

1 个赞

多谢指点!