【Environment】Test environment
【Component】OB (OceanBase)
【Version】4.3.0
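While testing OLAP performance, I enabled columnar storage and loaded a single table with 100 million simulated rows. The analytical query turned out to be very slow (over 10 minutes), whereas the same table schema and data volume take only a dozen or so seconds on Presto + Hive.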
Table schema:
-- OB DDL
CREATE TABLE test_olap (
  entity_id   varchar(256) DEFAULT NULL,
  dim_name    varchar(256) DEFAULT NULL,
  dim_value   varchar(256) DEFAULT NULL,
  dim_day     varchar(256) DEFAULT NULL,
  entity_type varchar(256) DEFAULT NULL,
  dim_code    varchar(256) DEFAULT NULL,
  version     varchar(256) DEFAULT NULL
) DEFAULT CHARSET = utf8mb4 ROW_FORMAT = DYNAMIC COMPRESSION = 'zstd_1.3.8' REPLICA_NUM = 3 BLOCK_SIZE = 16384 USE_BLOOM_FILTER = FALSE TABLET_SIZE = 134217728 PCTFREE = 0 WITH COLUMN GROUP (each column);
SQL:
select count(rowIndex) from (SELECT row_number() OVER (PARTITION BY dim_name) AS rowIndex FROM test_olap WHERE dim_day = '20220901') t;
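A note on the query's semantics: ROW_NUMBER() assigns a non-null number to every row (rows with a NULL dim_name simply form their own partition), so COUNT(rowIndex) equals the number of rows with dim_day = '20220901', the same result as a plain COUNT(*). The sketch below checks this on a tiny hypothetical dataset, using Python's built-in sqlite3 module as a stand-in for OceanBase (it assumes SQLite 3.25 or later, which added window functions). It says nothing about OceanBase performance; it only suggests that timing the plain COUNT(*) on the real table might help separate scan cost from the cost of partitioning by dim_name.

```python
import sqlite3

# In-memory SQLite is only a stand-in for OceanBase: it checks query semantics, not speed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test_olap (entity_id TEXT, dim_name TEXT, dim_value TEXT, "
    "dim_day TEXT, entity_type TEXT, dim_code TEXT, version TEXT)"
)

# Hypothetical sample rows: two days, several dim_name values, one NULL dim_name.
rows = [
    ("e1", "city", "a", "20220901", "t", "c", "v1"),
    ("e2", "city", "b", "20220901", "t", "c", "v1"),
    ("e3", "age", "c", "20220901", "t", "c", "v1"),
    ("e4", None, "d", "20220901", "t", "c", "v1"),
    ("e5", "city", "e", "20220902", "t", "c", "v1"),
]
conn.executemany("INSERT INTO test_olap VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# The query from the post (with a derived-table alias added).
window_sql = """
SELECT COUNT(rowIndex) FROM (
    SELECT ROW_NUMBER() OVER (PARTITION BY dim_name) AS rowIndex
    FROM test_olap WHERE dim_day = '20220901'
) t
"""
# Equivalent baseline without the window function.
plain_sql = "SELECT COUNT(*) FROM test_olap WHERE dim_day = '20220901'"

window_count = conn.execute(window_sql).fetchone()[0]
plain_count = conn.execute(plain_sql).fetchone()[0]
print(window_count, plain_count)  # 4 4
```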