备份表报内存不足

SZJT · 2021 年12 月 14 日 18:26

【产品名称】OceanBase

【产品版本】3.1.1

【问题描述】

语句：

insert /*+ parallel(8)*/ into ifi_ipc_one

select t.id,t.date_produce,t.patenttype,t.pd,t.ipc,t.ti,t.ab_new abs_new,t.patentid from ifi_ipc t

where t.ipc is not null

and t.id not like ‘CN%’

and t.pd between 19600101 and 20291231;

commit;

#########################################

错误信息：

Error occurred during SQL script execution

原因：

SQL 错误 [4013] [HY001]: No memory or reach tenant memory limit

########################################

OB实际内存占用查询如图

SZJT · 2021 年12 月 14 日 18:39

内存转储阀值如图：

问了下研发，表是1.3Y的数据。按条件提取数据量在8千万左右

看了下官方是支持此类语句写法的

https://www.oceanbase.com/docs/oceanbase-database/oceanbase-database/V3.1.2/insert-statement

简葵 · 2021 年12 月 15 日 20:44

一般sql返回4013由以下几个原因触发：

1.内存泄漏，某个模块内存申请过大，其他模块申请不出内存了；

可以select * from __all_virtual_memory_info order by hold desc limit 20;看看有没有内存占用异常大的模块；

2.work area配比设的太小；

show variables like '%ob_sql_work_area_percentage%';

ob_sql_work_area_percentage默认值是5，即使用租户内存的5%，对于大sql可能不够，如果建议调成20后再执行sql看看；

3.误用了4013的错误码（误用的情况出现的概率比较小），一般通过排查日志判断；

4.sql执行过程中申请了过大的内存

sql执行过程中申请的内存sql执行完就释放了，这类问题只能通过日志排查；

通过日志排查ob内存问题的方法，可以参考https://open.oceanbase.com/docs/community/oceanbase-database/V3.1.0/sgco3g

具体：

grep ret=-4013 observer.log找到traceid，再根据traceid看最开始报分配不出内存的地方；

ex

[2021-12-15 11:09:10.295485] WARN alloc (ob_allocator_v2.cpp:51) [2887][1913][YB42AC14027E-0005D2B3AD6A5BD8] [lt=3] [dc=0] oops, alloc failed, tenant_id=1001, ctx_id=29, ctx_name=QUERY_EXEC_CTX_ID, ctx_hold=78966161408, ctx_limit=9223372036854775807, tenant_hold=81604378624, tenant_limit=81604378624

从这条日志看租户的内存的确被占满了，而 __all_virtual_memory_info又没看到内存占用特别大的模块，很可能是sql执行过程中申请了过大的内存；

可以搜索 grep MEMORY observer.log查看特定租户下的内存占用；

[2021-12-15 11:09:00.349386] INFO [LIB] ob_tenant_ctx_allocator.cpp:148 [1894][2][Y0-0000000000000000] [lt=2] [dc=0]

[MEMORY] tenant_id= 1001 ctx_id= QUERY_EXEC_CTX_ID hold= 73,096,232,960 used= 69,972,802,928

[MEMORY] hold= 71,065,629,728 used= 69,972,773,232 count= 1,059,210 avg_used= 66,061 mod=QueryExecCtx

sql执行中占用了太多的内存，咨询得知报错的sql一次涉及8000w行，而ob开源版本当前没有对multi part insert没做落盘，下面这条涉及8000w行的sql申请了近70G的内存是符合当前的设计的；

insert /*+ parallel(8)*/ into ifi_ipc_one

select t.id,t.date_produce,t.patenttype,t.pd,t.ipc,t.ti,t.ab_new abs_new,t.patentid from ifi_ipc t

where t.ipc is not null

and t.id not like 'CN%'

and t.pd between 19600101 and 20291231;

可以通过pdml来规避问题，pdml会流式的做dml，只缓存少量数据

insert /+ enable_parallel_dml parallel(4)*/ into ifi_ipc_one

select t.id,t.date_produce,t.patenttype,t.pd,t.ipc,t.ti,t.ab_new abs_new,t.patentid from ifi_ipc t

where t.ipc is not null

and t.id not like 'CN%'

and t.pd between 19600101 and 20291231;

坤易 · 2021 年12 月 15 日 21:34

请问pdml即使流式的操作，但生成的memtable应该一样多吧？难道直接生成sstable?

简葵 · 2021 年12 月 16 日 10:40

流式操作的是查询的中间结果还没有写到数据库中，不涉及写到memtable和sstable；

这次的内存满是查询的中间结果把内存占满了，不是memtable满了，memtable内存超过一定的trigger会自动触发转储，正常情况下即使写入量非常大也不会满；

SZJT · 2021 年12 月 16 日 17:13

感谢工程师，问题已解决，临时表最终数据为9000多万、跑了一个多点使用语句为

insert /*+ enable_parallel_dml parallel(4)*/ into ifi_ipc_one

select t.id,t.date_produce,t.patenttype,t.pd,t.ipc,t.ti,t.ab_new abs_new,t.patentid from ifi_ipc t

where t.ipc is not null

and t.id not like 'CN%'

and t.pd between 19600101 and 20291231;

下个版本会修复此类问题吗，还是以后每次类似操作都需要pdml、今晚测试下看看在单机oracle上跑多久能结束

简葵 · 2021 年12 月 17 日 17:02

这个相当于使用限制，mysql一次insert这么多数据也不行，ob已经提供了绕过这个限制的方法；短期不会修，