OceanBase将一个1000多w的大表修改成分区表后,存储空间增长了3G

【 使用环境 】测试环境
【 OB or 其他组件 】ob4.0-ce
【 使用版本 】ob4.0-ce
【问题描述】OceanBase空间使用21G,但是将一个1000多w的表进行分区后,空间使用24G,为什么将表分区后,占用空间增加了3G
【复现路径】问题出现前后相关操作
【问题现象及影响】

【附件】
大表分区前,空间占用请:

将大表分区后,空间占用情况:

改了多少个分区

49个分区,一个月一个分区

分区之前查过select * from __all_virtual_tablet_sstable_macro_info 里面的original_size和occupy_size信息吗?

麻烦你查看一下当前数据分布:
select /*+ parallel(30) */ svr_ip, table_type, sum(size) / 1024 / 1024 / 1024 as size_G from __all_virtual_table_mgr where table_type > 0 group by 1,2 order by 3 desc;

table_type等于10是major
table_type等于11是minor
看起来是major和minor占大头,但现在不知道分区前是不是也一样的情况。你能把你分区的分区的方式贴一下吗?

ALTER TABLE tbl_test partition by range(UNIX_TIMESTAMP(create_time))
(partition P202001 values less than (1580486400),
partition P202002 values less than (1582992000),
partition P202003 values less than (1585670400),
partition P202004 values less than (1588262400),
partition P202005 values less than (1590940800),
partition P202006 values less than (1593532800),
partition P202007 values less than (1596211200),
partition P202008 values less than (1598889600),
partition P202009 values less than (1601481600),
partition P202010 values less than (1604160000),
partition P202011 values less than (1606752000),
partition P202012 values less than (1609430400),
partition P202101 values less than (1612108800),
partition P202102 values less than (1614528000),
partition P202103 values less than (1617206400),
partition P202104 values less than (1619798400),
partition P202105 values less than (1622476800),
partition P202106 values less than (1625068800),
partition P202107 values less than (1627747200),
partition P202108 values less than (1630425600),
partition P202109 values less than (1633017600),
partition P202110 values less than (1635696000),
partition P202111 values less than (1638288000),
partition P202112 values less than (1640966400),
partition P202201 values less than (1643644800),
partition P202202 values less than (1646064000),
partition P202203 values less than (1648742400),
partition P202204 values less than (1651334400),
partition P202205 values less than (1654012800),
partition P202206 values less than (1656604800),
partition P202207 values less than (1659283200),
partition P202208 values less than (1661961600),
partition P202209 values less than (1664553600),
partition P202210 values less than (1667232000),
partition P202211 values less than (1669824000),
partition P202212 values less than (1672502400),
partition P202301 values less than (1675180800),
partition P202302 values less than (1677600000),
partition P202303 values less than (1680278400),
partition P202304 values less than (1682870400),
partition P202305 values less than (1685548800),
partition P202306 values less than (1688140800),
partition P202307 values less than (1690819200),
partition P202308 values less than (1693497600),
partition P202309 values less than (1696089600),
partition P202310 values less than (1698768000),
partition P202311 values less than (1701360000),
partition P202312 values less than (1704038400),
partition P202401 values less than (1706716800));

麻烦再执行下,看下数据情况
select /+ parallel(30)/ svr_ip, sum(original_size)/1024/1024/1024 as original_size_G, sum(occupy_size)/1024/1024/1024 as occupy_size_G from __all_virtual_tablet_sstable_macro_info group by 1 order by 2 desc;

分区之后有做过合并吗?

分区之后数据从整体划分为局部,压缩率可能会有变化,如果期间没有做过合并等数据重整操作时,当前情况可能在分区之后数据压缩率降低,导致占用的block的数量变多。现在只有一个分区之后的表,可能不太好分析,如有条件麻烦复现下分区前后的表现:
select /+ parallel(30) / svr_ip, sum(original_size)/1024/1024/1024 as original_size_G, sum(occupy_size)/1024/1024/1024 as occupy_size_G from __all_virtual_tablet_sstable_macro_info group by 1 order by 2 desc;

分区之后做合并了

合并之后会把内存数据刷到磁盘,统计出来的disk stat 空间会变大,看起来符合预期的

数据落盘,数据量不变,只是分区,磁盘空间不应该增大这么多吧

那说明内存中还有数据的,__all_virtual_disk_stat统计的是磁盘数据的信息,统计出来的use_sized变大也合乎情理

但是在这个表分区之前,也做合并了,当时磁盘使用21G,分区没有其他数据写入,数据没有增加,只是做了分区,磁盘不应该增加这么多吧

那分区之前的合并,有通过查表:select * from CDB_OB_MAJOR_COMPACTION 确认已经做完了吗?另外确认下分区之后,合并之前,有没有数据的写入。

已经确认合并完了,没有其他写入

那麻烦你现在再做一次合并:alter system major freeze tenant = all,合并结束后再看__all_virtual_disk_stat的情况吧