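Looking at this figure, system_memory is only 72G, so your statistics look off. In future you can use the following statement to collect them.
The keyword 'malloc_allocator.*tenant: 500' retrieves the tenant's memory metadata (limit, hold, cache_hold). Adding -A shows more detail, including the memory usage of each CTX under the tenant and the PM information for the tenant's threads.
grep the OB log
Please send observer.log, then run the memory analysis below and post the results.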
obdiag gather scene run --scene=observer.memory
https://www.oceanbase.com/docs/common-obdiag-cn-1000000001102519
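That statistic counts allocated memory, while yours counts used memory; this one is more accurate. First collect the information as described in the post above, then send the log.
The keyword '500 ctx_id= DEFAULT_CTX_ID' retrieves the memory metadata of the tenant's DEFAULT_CTX_ID.
Could you please capture the full screenshot?
grep 'malloc_allocator.*tenant: 500' observer.log -A 20
I don't see any module with particularly large usage.
Sorry, I grabbed the output from the wrong server. See below; this is the right one.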
[root@mxt-datanode04 log]# grep 'malloc_allocator.*tenant: 500' observer.log -A 20
[2024-07-26 17:32:29.418322] INFO [LIB] operator() (ob_malloc_allocator.cpp:567) [12687][MemDumpTimer][T0][Y0-0000000000000000-0-0] [lt=14] [MEMORY] tenant: 500, limit: 9,223,372,036,854,775,807 hold: 548,614,275,072 rpc_hold: 0 cache_hold: 0 cache_used: 0 cache_item_count: 0
[MEMORY] ctx_id= DEFAULT_CTX_ID hold_bytes=541,468,712,960 limit= 9,223,372,036,854,775,807
[MEMORY] ctx_id= GLIBC hold_bytes= 83,886,080 limit= 9,223,372,036,854,775,807
[MEMORY] ctx_id= CO_STACK hold_bytes= 255,852,544 limit= 9,223,372,036,854,775,807
[MEMORY] ctx_id= LIBEASY hold_bytes= 18,874,368 limit= 9,223,372,036,854,775,807
[MEMORY] ctx_id= LOGGER_CTX_ID hold_bytes= 12,582,912 limit= 9,223,372,036,854,775,807
[MEMORY] ctx_id= RPC_CTX_ID hold_bytes= 134,217,728 limit= 9,223,372,036,854,775,807
[MEMORY] ctx_id= PKT_NIO hold_bytes= 164,724,736 limit= 9,223,372,036,854,775,807
[MEMORY] ctx_id= SCHEMA_SERVICE hold_bytes= 4,400,631,808 limit= 9,223,372,036,854,775,807
[MEMORY] ctx_id= UNEXPECTED_IN_500 hold_bytes= 2,074,791,936 limit= 9,223,372,036,854,775,807
[2024-07-26 17:32:29.418403] INFO [STORAGE] get_kept_multi_version_start (ob_tablet.cpp:4037) [21623][T1042_SSTableGC][T1042][YB420A260E1A-000617A97E1EC608-0-0] [lt=0] get multi version start(ret=0, ls_id={id:1}, tablet_id={id:192}, multi_version_start=1721930507983200336, min_reserved_snapshot=1721984536683478279, min_medium_snapshot=9223372036854775807, ls_min_reserved_snapshot=1721984536683478279, last_major_snapshot_version=1721930401822372817)
[2024-07-26 17:32:29.418564] INFO [STORAGE] get_kept_multi_version_start (ob_tablet.cpp:4037) [21623][T1042_SSTableGC][T1042][YB420A260E1A-000617A97E1EC608-0-0] [lt=0] get multi version start(ret=0, ls_id={id:1}, tablet_id={id:50110}, multi_version_start=1721930508486451810, min_reserved_snapshot=1721984536683478279, min_medium_snapshot=9223372036854775807, ls_min_reserved_snapshot=1721984536683478279, last_major_snapshot_version=1721930401822372817)
[2024-07-26 17:32:29.418724] INFO [STORAGE] get_kept_multi_version_start (ob_tablet.cpp:4037) [21623][T1042_SSTableGC][T1042][YB420A260E1A-000617A97E1EC608-0-0] [lt=1] get multi version start(ret=0, ls_id={id:1}, tablet_id={id:101019}, multi_version_start=1721930508486451810, min_reserved_snapshot=1721984536683478279, min_medium_snapshot=9223372036854775807, ls_min_reserved_snapshot=1721984536683478279, last_major_snapshot_version=1721930401822372817)
[2024-07-26 17:32:29.418596] INFO [LIB] print_usage (ob_tenant_ctx_allocator.cpp:176) [12687][MemDumpTimer][T0][Y0-0000000000000000-0-0] [lt=14]
[MEMORY] tenant_id= 500 ctx_id= DEFAULT_CTX_ID hold= 541,468,712,960 used= 524,069,771,040 limit= 9,223,372,036,854,775,807
[MEMORY] idle_size= 0 free_size= 0
[MEMORY] wash_related_chunks= 0 washed_blocks= 0 washed_size= 0
[MEMORY] hold= 519,945,083,904 used= 515,752,950,016 count= 64,988,039 avg_used= 7,936 block_cnt= 64,988,039 chunk_cnt= 257,015 mod=SeArray
[MEMORY] hold= 1,745,096,880 used= 1,744,830,568 count= 14 avg_used= 124,630,754 block_cnt= 14 chunk_cnt= 14 mod=CACHE_MAP_BKT
[MEMORY] hold= 245,782,464 used= 243,746,616 count= 2,238 avg_used= 108,912 block_cnt= 1,541 chunk_cnt= 743 mod=DeadLock
[MEMORY] hold= 201,347,072 used= 201,326,616 count= 1 avg_used= 201,326,616 block_cnt= 1 chunk_cnt= 1 mod=CACHE_MAP_LOCK
--
[2024-07-26 17:32:39.494866] INFO [LIB] operator() (ob_malloc_allocator.cpp:567) [12687][MemDumpTimer][T0][Y0-0000000000000000-0-0] [lt=9] [MEMORY] tenant: 500, limit: 9,223,372,036,854,775,807 hold: 548,618,489,856 rpc_hold: 0 cache_hold: 0 cache_used: 0 cache_item_count: 0
[MEMORY] ctx_id= DEFAULT_CTX_ID hold_bytes=541,479,239,680 limit= 9,223,372,036,854,775,807
[MEMORY] ctx_id= GLIBC hold_bytes= 83,886,080 limit= 9,223,372,036,854,775,807
[MEMORY] ctx_id= CO_STACK hold_bytes= 255,852,544 limit= 9,223,372,036,854,775,807
[MEMORY] ctx_id= LIBEASY hold_bytes= 18,874,368 limit= 9,223,372,036,854,775,807
[MEMORY] ctx_id= LOGGER_CTX_ID hold_bytes= 12,582,912 limit= 9,223,372,036,854,775,807
[MEMORY] ctx_id= RPC_CTX_ID hold_bytes= 134,217,728 limit= 9,223,372,036,854,775,807
[MEMORY] ctx_id= PKT_NIO hold_bytes= 164,724,736 limit= 9,223,372,036,854,775,807
[MEMORY] ctx_id= SCHEMA_SERVICE hold_bytes= 4,400,631,808 limit= 9,223,372,036,854,775,807
[MEMORY] ctx_id= UNEXPECTED_IN_500 hold_bytes= 2,074,791,936 limit= 9,223,372,036,854,775,807
[2024-07-26 17:32:39.495139] INFO [LIB] print_usage (ob_tenant_ctx_allocator.cpp:176) [12687][MemDumpTimer][T0][Y0-0000000000000000-0-0] [lt=15]
[MEMORY] tenant_id= 500 ctx_id= DEFAULT_CTX_ID hold= 541,472,927,744 used= 524,065,456,224 limit= 9,223,372,036,854,775,807
[MEMORY] idle_size= 0 free_size= 0
[MEMORY] wash_related_chunks= 0 washed_blocks= 0 washed_size= 0
[MEMORY] hold= 519,943,664,320 used= 515,751,577,088 count= 64,988,043 avg_used= 7,936 block_cnt= 64,988,043 chunk_cnt= 257,016 mod=SeArray
[MEMORY] hold= 1,745,096,880 used= 1,744,830,568 count= 14 avg_used= 124,630,754 block_cnt= 14 chunk_cnt= 14 mod=CACHE_MAP_BKT
[MEMORY] hold= 245,782,464 used= 243,746,616 count= 2,238 avg_used= 108,912 block_cnt= 1,541 chunk_cnt= 743 mod=DeadLock
[MEMORY] hold= 201,347,072 used= 201,326,616 count= 1 avg_used= 201,326,616 block_cnt= 1 chunk_cnt= 1 mod=CACHE_MAP_LOCK
[MEMORY] hold= 167,510,016 used= 166,630,752 count= 2,556 avg_used= 65,192 block_cnt= 2,556 chunk_cnt= 824 mod=LatchStat
[MEMORY] hold= 140,288,000 used= 123,628,800 count= 3,425 avg_used= 36,096 block_cnt= 3,425 chunk_cnt= 573 mod=InneSqlConnPool
[MEMORY] hold= 117,337,776 used= 115,751,232 count= 5,100 avg_used= 22,696 block_cnt= 2,900 chunk_cnt= 446 mod=TenantCtxAlloca
[root@mxt-datanode04 log]# grep "malloc_allocator.*tenant: 500" observer.log
[2024-07-26 17:34:20.275087] INFO [LIB] operator() (ob_malloc_allocator.cpp:567) [12687][MemDumpTimer][T0][Y0-0000000000000000-0-0] [lt=9] [MEMORY] tenant: 500, limit: 9,223,372,036,854,775,807 hold: 548,612,177,920 rpc_hold: 0 cache_hold: 0 cache_used: 0 cache_item_count: 0
[2024-07-26 17:34:30.352462] INFO [LIB] operator() (ob_malloc_allocator.cpp:567) [12687][MemDumpTimer][T0][Y0-0000000000000000-0-0] [lt=11] [MEMORY] tenant: 500, limit: 9,223,372,036,854,775,807 hold: 548,614,275,072 rpc_hold: 0 cache_hold: 0 cache_used: 0 cache_item_count: 0
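The per-module lines in the dump above are easier to compare once parsed and sorted. The following is a minimal Python sketch, not an official OceanBase tool: the function name `parse_mod_lines` and the regular expression are illustrative assumptions based only on the line format visible in this thread, and the sample lines are copied from the output above.

```python
import re

# Matches per-module lines such as:
# [MEMORY] hold= 519,945,083,904 used= 515,752,950,016 count= 64,988,039 ... mod=SeArray
# (pattern inferred from the log lines in this thread; an assumption, not a spec)
MOD_LINE = re.compile(
    r"\[MEMORY\]\s+hold=\s*([\d,]+)\s+used=\s*([\d,]+)\s+count=\s*([\d,]+).*?mod=(\S+)"
)

def parse_mod_lines(text):
    """Return a list of (mod, hold_bytes, used_bytes, count), largest hold first."""
    rows = []
    for line in text.splitlines():
        m = MOD_LINE.search(line)
        if m:
            hold, used, count = (int(g.replace(",", "")) for g in m.groups()[:3])
            rows.append((m.group(4), hold, used, count))
    return sorted(rows, key=lambda r: r[1], reverse=True)

SAMPLE = """\
[MEMORY] tenant_id= 500 ctx_id= DEFAULT_CTX_ID hold= 541,468,712,960 used= 524,069,771,040 limit= 9,223,372,036,854,775,807
[MEMORY] hold= 1,745,096,880 used= 1,744,830,568 count= 14 avg_used= 124,630,754 block_cnt= 14 chunk_cnt= 14 mod=CACHE_MAP_BKT
[MEMORY] hold= 519,945,083,904 used= 515,752,950,016 count= 64,988,039 avg_used= 7,936 block_cnt= 64,988,039 chunk_cnt= 257,015 mod=SeArray
"""

if __name__ == "__main__":
    for mod, hold, used, count in parse_mod_lines(SAMPLE):
        print(f"{mod:<16} hold={hold / 2**30:8.2f} GiB count={count}")
```

Feeding it the full `grep ... -A 20` output instead of `SAMPLE` should list SeArray first, since it accounts for almost all of the DEFAULT_CTX_ID hold.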
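grep '500 ctx_id= DEFAULT_CTX_ID' observer.log -C 1 -A 10
Please send the entire observer.log.
From the analysis above, DEFAULT_CTX_ID holds a large amount of memory, which is not expected. Please send observer.log, then run the memory analysis below and post the results; I will contact the relevant colleagues for further analysis.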
obdiag gather scene run --scene=observer.memory
https://www.oceanbase.com/docs/common-obdiag-cn-1000000001102519
Please follow the steps below and post the results:
Check memory usage by module:
select * from gv$ob_memory where tenant_id=500 order by used desc limit 20;
Get the stack (back_trace):
select * from __all_virtual_malloc_sample_info where svr_ip='xx.xx.xx.xx' and mod_name='the result from step 1 (in your case probably SeArray)' order by alloc_bytes desc limit 3;
Print the stack:
addr2line -pCfe bin/observer $back_trace
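The back_trace column is padded with 0x0 entries, which addr2line resolves to useless `?? ??:0` lines. As a rough sketch of the last step, the Python helper below drops the padding before building the command. The names `build_addr2line_cmd` and `symbolize` are hypothetical, and the default binary path `bin/observer` is simply the one used in the command above; running `symbolize` assumes binutils and the matching observer binary are present.

```python
import subprocess

def build_addr2line_cmd(back_trace, binary="bin/observer"):
    """Build the addr2line argument list, dropping the 0x0 padding entries."""
    addrs = [a for a in back_trace.split() if a != "0x0"]
    return ["addr2line", "-pCfe", binary, *addrs]

def symbolize(back_trace, binary="bin/observer"):
    """Run addr2line and return its text output (needs binutils and the binary)."""
    cmd = build_addr2line_cmd(back_trace, binary)
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

if __name__ == "__main__":
    # Shortened example value; paste the real back_trace column here.
    print(" ".join(build_addr2line_cmd("0x49ccb85 0x49ca51e 0x0 0x0")))
```

Addresses inside shared libraries (the 0x7f... entries) will still print as `??`, because they do not belong to the observer binary.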
mysql> select * from gv$ob_memory where tenant_id=500 order by used desc limit 20;
+-----------+-------------+----------+-------------------+-----------------+----------+--------------+--------------+
| TENANT_ID | SVR_IP | SVR_PORT | CTX_NAME | MOD_NAME | COUNT | HOLD | USED |
+-----------+-------------+----------+-------------------+-----------------+----------+--------------+--------------+
| 500 | 10.38.14.26 | 2882 | DEFAULT_CTX_ID | SeArray | 64988357 | 519946029504 | 515753926144 |
| 500 | 10.38.14.26 | 2882 | SCHEMA_SERVICE | SchemaSysCache | 276280 | 2089811344 | 2069858772 |
| 500 | 10.38.14.26 | 2882 | DEFAULT_CTX_ID | CACHE_MAP_BKT | 14 | 1745096880 | 1744830568 |
| 500 | 10.38.14.26 | 2882 | SCHEMA_SERVICE | TenantSchemMgr | 1243 | 1738635840 | 1737752320 |
| 500 | 10.38.14.26 | 2882 | UNEXPECTED_IN_500 | OccamThreadPool | 16874 | 684713280 | 608444096 |
| 500 | 10.38.14.26 | 2882 | UNEXPECTED_IN_500 | TenantConfig | 231 | 556351488 | 551677896 |
| 500 | 10.38.14.26 | 2882 | UNEXPECTED_IN_500 | FixeSizeBlocAll | 7 | 268546944 | 268444576 |
| 500 | 10.38.14.26 | 2882 | UNEXPECTED_IN_500 | CommonNetwork | 13883 | 251424320 | 249569975 |
| 500 | 10.38.14.26 | 2882 | CO_STACK | CoStack | 483 | 249274368 | 248810688 |
| 500 | 10.38.14.26 | 2882 | DEFAULT_CTX_ID | DeadLock | 3200 | 246041264 | 243931320 |
| 500 | 10.38.14.26 | 2882 | UNEXPECTED_IN_500 | di_tenant_cache | 233 | 240500736 | 239907984 |
| 500 | 10.38.14.26 | 2882 | DEFAULT_CTX_ID | CACHE_MAP_LOCK | 1 | 201347072 | 201326616 |
| 500 | 10.38.14.26 | 2882 | DEFAULT_CTX_ID | LatchStat | 2556 | 167510016 | 166630752 |
| 500 | 10.38.14.26 | 2882 | PKT_NIO | DEFAULT | 106 | 127127360 | 126383856 |
| 500 | 10.38.14.26 | 2882 | DEFAULT_CTX_ID | InneSqlConnPool | 3425 | 140288000 | 123628800 |
| 500 | 10.38.14.26 | 2882 | DEFAULT_CTX_ID | TenantCtxAlloca | 5109 | 117370032 | 115782912 |
| 500 | 10.38.14.26 | 2882 | RPC_CTX_ID | RpcDefault | 6932 | 113754112 | 112686592 |
| 500 | 10.38.14.26 | 2882 | DEFAULT_CTX_ID | ArchiveLSMap | 103 | 108847104 | 108006624 |
| 500 | 10.38.14.26 | 2882 | DEFAULT_CTX_ID | RecordContext | 103 | 108847104 | 108006624 |
| 500 | 10.38.14.26 | 2882 | DEFAULT_CTX_ID | ArcPersistMap | 103 | 108847104 | 108006624 |
+-----------+-------------+----------+-------------------+-----------------+----------+--------------+--------------+
20 rows in set (0.03 sec)
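To put the top row in perspective, the byte counts can be converted to GiB and compared with the DEFAULT_CTX_ID hold reported in the observer.log dump (541,468,712,960 bytes). This is only an arithmetic sketch in Python; the helper names are made up for illustration and the two figures are copied from this thread.

```python
GIB = 2**30

searray_hold = 519_946_029_504      # HOLD of SeArray in the gv$ob_memory result
default_ctx_hold = 541_468_712_960  # hold of DEFAULT_CTX_ID in the observer.log dump

def to_gib(n_bytes):
    """Convert a byte count to GiB."""
    return n_bytes / GIB

def share(part, whole):
    """Fraction of `whole` taken up by `part`."""
    return part / whole

if __name__ == "__main__":
    print(f"SeArray hold: {to_gib(searray_hold):.1f} GiB")
    print(f"share of DEFAULT_CTX_ID: {share(searray_hold, default_ctx_hold):.1%}")
```

SeArray alone comes to roughly 484 GiB, about 96% of the DEFAULT_CTX_ID hold, which is why the remaining modules can be ignored for this investigation.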
Part 2:
mysql> select * from __all_virtual_malloc_sample_info where svr_ip='10.38.14.26' and mod_name='SeArray' order by alloc_bytes desc limit 3 \G
*************************** 1. row ***************************
svr_ip: 10.38.14.26
svr_port: 2882
tenant_id: 500
ctx_id: 0
mod_name: SeArray
back_trace: 0x49ccb85 0x49ca51e 0xddb95c1 0xdda0844 0x4d14846 0x4d13b79 0x4d130a4 0xeb1beea 0x4bdee61 0x11a55721 0x11a51caf 0x7f62391e4ea5 0x7f6238f0d96d 0x0 0x0 0x0
ctx_name: DEFAULT_CTX_ID
alloc_count: 175321
alloc_bytes: 1391347456
*************************** 2. row ***************************
svr_ip: 10.38.14.26
svr_port: 2882
tenant_id: 500
ctx_id: 0
mod_name: SeArray
back_trace: 0x49ccb85 0x49ca51e 0xddb95c1 0xdda0844 0x4ccbe20 0x4be45fc 0x4be403b 0x4be3031 0x4bdee61 0x11a55721 0x11a51caf 0x7f62391e4ea5 0x7f6238f0d96d 0x0 0x0 0x0
ctx_name: DEFAULT_CTX_ID
alloc_count: 69944
alloc_bytes: 555075584
*************************** 3. row ***************************
svr_ip: 10.38.14.26
svr_port: 2882
tenant_id: 500
ctx_id: 0
mod_name: SeArray
back_trace: 0x49ccb85 0x49ca51e 0xddb95c1 0xdda0844 0x4d14846 0x4d13b79 0xeacac6e 0xeacd250 0xeac8cdf 0xeb052aa 0xeb01617 0xf19feca 0x4ca6153 0x11a55721 0x11a51caf 0x7f62391e4ea5
ctx_name: DEFAULT_CTX_ID
alloc_count: 6230
alloc_bytes: 49441280
3 rows in set (0.02 sec)
mysql>
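One quick sanity check on the sampled rows is the size of each allocation, obtained by dividing alloc_bytes by alloc_count. The Python sketch below uses only the three rows shown above; `bytes_per_alloc` is an illustrative name, not part of any OceanBase tooling.

```python
# (alloc_count, alloc_bytes) for the three sampled SeArray stacks above
samples = [
    (175321, 1391347456),
    (69944, 555075584),
    (6230, 49441280),
]

def bytes_per_alloc(alloc_count, alloc_bytes):
    """Return (quotient, remainder) of alloc_bytes divided by alloc_count."""
    return divmod(alloc_bytes, alloc_count)

if __name__ == "__main__":
    for count, total in samples:
        size, rest = bytes_per_alloc(count, total)
        print(f"count={count:>7} bytes/alloc={size} remainder={rest}")
```

All three stacks divide evenly into 7,936-byte allocations, the same value as avg_used= 7,936 on the SeArray line of the log, so the sampled stacks are consistent with the allocations that dominate the module.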
Part 3:
[root@mxt-datanode04 oceanbase]# addr2line -pCfe bin/observer 0x49ccb85 0x49ca51e 0xddb95c1 0xdda0844 0x4d14846 0x4d13b79 0x4d130a4 0xeb1beea 0x4bdee61 0x11a55721 0x11a51caf 0x7f62391e4ea5 0x7f6238f0d96d 0x0 0x0 0x0
void* oceanbase::lib::ObTenantCtxAllocator::common_alloc<oceanbase::lib::ObjectMgr>(long, oceanbase::lib::ObMemAttr const&, oceanbase::lib::ObTenantCtxAllocator&, oceanbase::lib::ObjectMgr&) at ??:?
oceanbase::common::ModulePageAllocator::alloc(long, oceanbase::lib::ObMemAttr const&) at 0_cxx.cxx:?
oceanbase::common::ObSEArrayImpl<oceanbase::compaction::ObMediumCompactionInfo*, 1l, oceanbase::common::ModulePageAllocator, false>::reserve(long) at 0_cxx.cxx:?
oceanbase::storage::ObTabletMdsData::load_medium_info_list(oceanbase::common::ObIAllocator&, oceanbase::storage::ObTabletComplexAddr<oceanbase::storage::ObTabletDumpedMediumInfo> const&, oceanbase::storage::ObTabletDumpedMediumInfo*&) at ??:?
oceanbase::storage::ObTabletMediumInfoReader::init(oceanbase::common::ObArenaAllocator&) at ??:?
oceanbase::storage::ObTablet::get_kept_multi_version_start(oceanbase::storage::ObLS&, oceanbase::storage::ObTablet const&, long&) at ??:?
oceanbase::storage::ObTenantTabletScheduler::try_remove_old_table(oceanbase::storage::ObLS&) at 0_cxx.cxx:?
oceanbase::storage::ObTenantTabletScheduler::SSTableGCTask::runTimerTask() at ??:?
oceanbase::common::ObTimer::run1() at ??:?
oceanbase::lib::Threads::run(long) at ??:?
oceanbase::lib::Thread::__th_start(void*) at 0_cxx.cxx:?
?? ??:0
?? ??:0
?? ??:0
?? ??:0
?? ??:0
[root@mxt-datanode04 oceanbase]#
[root@mxt-datanode04 oceanbase]# addr2line -pCfe bin/observer 0x49ccb85 0x49ca51e 0xddb95c1 0xdda0844 0x4ccbe20 0x4be45fc 0x4be403b 0x4be3031 0x4bdee61 0x11a55721 0x11a51caf 0x7f62391e4ea5 0x7f6238f0d96d 0x0 0x0 0x0
void* oceanbase::lib::ObTenantCtxAllocator::common_alloc<oceanbase::lib::ObjectMgr>(long, oceanbase::lib::ObMemAttr const&, oceanbase::lib::ObTenantCtxAllocator&, oceanbase::lib::ObjectMgr&) at ??:?
oceanbase::common::ModulePageAllocator::alloc(long, oceanbase::lib::ObMemAttr const&) at 0_cxx.cxx:?
oceanbase::common::ObSEArrayImpl<oceanbase::compaction::ObMediumCompactionInfo*, 1l, oceanbase::common::ModulePageAllocator, false>::reserve(long) at 0_cxx.cxx:?
oceanbase::storage::ObTabletMdsData::load_medium_info_list(oceanbase::common::ObIAllocator&, oceanbase::storage::ObTabletComplexAddr<oceanbase::storage::ObTabletDumpedMediumInfo> const&, oceanbase::storage::ObTabletDumpedMediumInfo*&) at ??:?
oceanbase::storage::ObTablet::read_medium_info_list(oceanbase::common::ObArenaAllocator&, oceanbase::compaction::ObMediumCompactionInfoList const*&) const at ??:?
oceanbase::compaction::ObPartitionMergePolicy::get_boundary_snapshot_version(oceanbase::storage::ObTablet const&, long&, long&, bool, bool) at 0_cxx.cxx:?
oceanbase::compaction::ObPartitionMergePolicy::get_minor_merge_tables(oceanbase::storage::ObGetMergeTablesParam const&, oceanbase::storage::ObLS&, oceanbase::storage::ObTablet const&, oceanbase::storage::ObGetMergeTablesResult&) at 0_cxx.cxx:?
oceanbase::storage::ObTenantTabletScheduler::MergeLoopTask::runTimerTask() at ??:?
oceanbase::common::ObTimer::run1() at ??:?
oceanbase::lib::Threads::run(long) at ??:?
oceanbase::lib::Thread::__th_start(void*) at 0_cxx.cxx:?
?? ??:0
?? ??:0
?? ??:0
?? ??:0
?? ??:0
[root@mxt-datanode04 oceanbase]# addr2line -pCfe bin/observer 0x49ccb85 0x49ca51e 0xddb95c1 0xdda0844 0x4d14846 0x4d13b79 0xeacac6e 0xeacd250 0xeac8cdf 0xeb052aa 0xeb01617 0xf19feca 0x4ca6153 0x11a55721 0x11a51caf 0x7f62391e4ea5
void* oceanbase::lib::ObTenantCtxAllocator::common_alloc<oceanbase::lib::ObjectMgr>(long, oceanbase::lib::ObMemAttr const&, oceanbase::lib::ObTenantCtxAllocator&, oceanbase::lib::ObjectMgr&) at ??:?
oceanbase::common::ModulePageAllocator::alloc(long, oceanbase::lib::ObMemAttr const&) at 0_cxx.cxx:?
oceanbase::common::ObSEArrayImpl<oceanbase::compaction::ObMediumCompactionInfo*, 1l, oceanbase::common::ModulePageAllocator, false>::reserve(long) at 0_cxx.cxx:?
oceanbase::storage::ObTabletMdsData::load_medium_info_list(oceanbase::common::ObIAllocator&, oceanbase::storage::ObTabletComplexAddr<oceanbase::storage::ObTabletDumpedMediumInfo> const&, oceanbase::storage::ObTabletDumpedMediumInfo*&) at ??:?
oceanbase::storage::ObTabletMediumInfoReader::init(oceanbase::common::ObArenaAllocator&) at ??:?
oceanbase::storage::ObTablet::get_kept_multi_version_start(oceanbase::storage::ObLS&, oceanbase::storage::ObTablet const&, long&) at ??:?
oceanbase::compaction::ObPartitionMergePolicy::get_multi_version_start(oceanbase::storage::ObMergeType, oceanbase::storage::ObLS&, oceanbase::storage::ObTablet const&, oceanbase::common::ObVersionRange&) at 0_cxx.cxx:?
oceanbase::compaction::ObPartitionMergePolicy::deal_with_minor_result(oceanbase::storage::ObMergeType const&, oceanbase::storage::ObLS&, oceanbase::storage::ObTablet const&, oceanbase::storage::ObGetMergeTablesResult&) at 0_cxx.cxx:?
oceanbase::compaction::ObPartitionMergePolicy::get_mini_merge_tables(oceanbase::storage::ObGetMergeTablesParam const&, oceanbase::storage::ObLS&, oceanbase::storage::ObTablet const&, oceanbase::storage::ObGetMergeTablesResult&) at 0_cxx.cxx:?
oceanbase::compaction::ObTabletMiniPrepareTask::inner_init_ctx(oceanbase::compaction::ObTabletMergeCtx&, bool&) at ??:?
oceanbase::compaction::ObTabletMergePrepareTask::process() at ??:?
oceanbase::share::ObITask::do_work() at 0_cxx.cxx:?
oceanbase::share::ObTenantDagWorker::run1() at ??:?
oceanbase::lib::Threads::run(long) at ??:?
oceanbase::lib::Thread::__th_start(void*) at 0_cxx.cxx:?
?? ??:0
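I will discuss this with the relevant kernel colleagues and have them analyze it.
The current assessment is that this may be a memory leak caused by a failed function call in the OB storage layer. Confirming it requires logs from the period when tenant 500's memory was growing, so please send the observer.log that covers that period.
Root cause analysis: an array's destructor was not called, which caused a memory leak.
Temporary workaround: restart.
Permanent fix: the kernel team has recorded an issue to resolve the possible memory leak.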