Node becomes abnormal and reports "callback memory failed"

【 Environment 】Production
【 OB or other component 】OB
【 Version 】4.2.2.0
【 Problem description 】One of the nodes became abnormal. Restarting it through OCP brings it back. What is the cause?
【 Steps to reproduce 】The same thing happened once last month, and the cause was not found.
【 Attachments and logs 】 [2024-11-27 08:57:29.417970] ERROR issue_dba_error (ob_log.cpp:1875) [1266892][T1023_ReplaySrv][T1023][Y0-0000000000000000-0-0] [lt=0][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4013, file=“ob_memtable_context.h”, line_no=288, info=“callback memory failed”)

[2024-11-27 08:57:29.417971] ERROR issue_dba_error (ob_log.cpp:1875) [1265631][T1017_ReplaySrv][T1017][Y0-0000000000000000-0-0] [lt=0][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4013, file=“ob_memtable_context.h”, line_no=288, info=“callback memory failed”)

[2024-11-27 08:57:29.417978] ERROR issue_dba_error (ob_log.cpp:1875) [1266892][T1023_ReplaySrv][T1023][Y0-0000000000000000-0-0] [lt=4][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4013, file=“ob_arena_object_pool.h”, line_no=69, info=“obj alloc error, no memory”)

[2024-11-27 08:57:29.417979] ERROR issue_dba_error (ob_log.cpp:1875) [1265631][T1017_ReplaySrv][T1017][Y0-0000000000000000-0-0] [lt=4][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4013, file=“ob_arena_object_pool.h”, line_no=69, info=“obj alloc error, no memory”)

[2024-11-27 08:57:29.417989] ERROR issue_dba_error (ob_log.cpp:1875) [1265632][T1017_ReplaySrv][T1017][Y0-0000000000000000-0-0] [lt=0][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4013, file=“ob_memtable_context.h”, line_no=288, info=“callback memory failed”)

[2024-11-27 08:57:29.417996] ERROR issue_dba_error (ob_log.cpp:1875) [1265632][T1017_ReplaySrv][T1017][Y0-0000000000000000-0-0] [lt=4][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4013, file=“ob_arena_object_pool.h”, line_no=69, info=“obj alloc error, no memory”)

[2024-11-27 08:57:29.418019] ERROR issue_dba_error (ob_log.cpp:1875) [1266831][T1023_ReplaySrv][T1023][Y0-0000000000000000-0-0] [lt=0][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4013, file=“ob_memtable_context.h”, line_no=288, info=“callback memory failed”)

[2024-11-27 08:57:29.418027] ERROR issue_dba_error (ob_log.cpp:1875) [1266831][T1023_ReplaySrv][T1023][Y0-0000000000000000-0-0] [lt=4][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4013, file=“ob_arena_object_pool.h”, line_no=69, info=“obj alloc error, no memory”)

[2024-11-27 08:57:29.418191] ERROR issue_dba_error (ob_log.cpp:1875) [1265579][T1017_ReplaySrv][T1017][Y0-0000000000000000-0-0] [lt=1][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4013, file=“ob_memtable_context.h”, line_no=288, info=“callback memory failed”)

[2024-11-27 08:57:29.418202] ERROR issue_dba_error (ob_log.cpp:1875) [1265579][T1017_ReplaySrv][T1017][Y0-0000000000000000-0-0] [lt=4][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4013, file=“ob_arena_object_pool.h”, line_no=69, info=“obj alloc error, no memory”)

[2024-11-27 08:57:29.418215] ERROR issue_dba_error (ob_log.cpp:1875) [1265631][T1017_ReplaySrv][T1017][Y0-0000000000000000-0-0] [lt=0][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4013, file=“ob_memtable_context.h”, line_no=288, info=“callback memory failed”)

[2024-11-27 08:57:29.418224] ERROR issue_dba_error (ob_log.cpp:1875) [1265633][T1017_ReplaySrv][T1017][Y0-0000000000000000-0-0] [lt=0][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4013, file=“ob_memtable_context.h”, line_no=288, info=“callback memory failed”)

[2024-11-27 08:57:29.418226] ERROR issue_dba_error (ob_log.cpp:1875) [1265631][T1017_ReplaySrv][T1017][Y0-0000000000000000-0-0] [lt=4][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4013, file=“ob_arena_object_pool.h”, line_no=69, info=“obj alloc error, no memory”)

[2024-11-27 08:57:29.418232] ERROR issue_dba_error (ob_log.cpp:1875) [1265633][T1017_ReplaySrv][T1017][Y0-0000000000000000-0-0] [lt=4][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4013, file=“ob_arena_object_pool.h”, line_no=69, info=“obj alloc error, no memory”)

[2024-11-27 08:57:29.418236] ERROR issue_dba_error (ob_log.cpp:1875) [1265632][T1017_ReplaySrv][T1017][Y0-0000000000000000-0-0] [lt=0][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4013, file=“ob_memtable_context.h”, line_no=288, info=“callback memory failed”)

[2024-11-27 08:57:29.418231] ERROR issue_dba_error (ob_log.cpp:1875) [1266893][T1023_ReplaySrv][T1023][Y0-0000000000000000-0-0] [lt=0][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4013, file=“ob_memtable_context.h”, line_no=288, info=“callback memory failed”)

[2024-11-27 08:57:29.418248] ERROR issue_dba_error (ob_log.cpp:1875) [1266893][T1023_ReplaySrv][T1023][Y0-0000000000000000-0-0] [lt=4][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4013, file=“ob_arena_object_pool.h”, line_no=69, info=“obj alloc error, no memory”)

[2024-11-27 08:57:29.418252] ERROR issue_dba_error (ob_log.cpp:1875) [1265632][T1017_ReplaySrv][T1017][Y0-0000000000000000-0-0] [lt=8][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4013, file=“ob_arena_object_pool.h”, line_no=69, info=“obj alloc error, no memory”)

[2024-11-27 08:57:29.418287] ERROR issue_dba_error (ob_log.cpp:1875) [1266831][T1023_ReplaySrv][T1023][Y0-0000000000000000-0-0] [lt=0][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4013, file=“ob_memtable_context.h”, line_no=288, info=“callback memory failed”)

[2024-11-27 08:57:29.418298] ERROR issue_dba_error (ob_log.cpp:1875) [1266831][T1023_ReplaySrv][T1023][Y0-0000000000000000-0-0] [lt=4][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4013, file=“ob_arena_object_pool.h”, line_no=69, info=“obj alloc error, no memory”)

[2024-11-27 08:57:29.418300] ERROR issue_dba_error (ob_log.cpp:1875) [1266892][T1023_ReplaySrv][T1023][Y0-0000000000000000-0-0] [lt=1][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4013, file=“ob_memtable_context.h”, line_no=288, info=“callback memory failed”)
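A burst like this is easier to read once it is grouped by tenant and message. The following is only a minimal sketch: the regular expression is derived from the lines quoted above (nothing more), the names `LINE_RE` and `summarize` are made up for illustration, and reading the tenant ID from the `T1017_` / `T1023_` thread-name prefix is an assumption.

```python
import re
from collections import Counter

# Pattern derived from the issue_dba_error lines quoted in this post.
# The forum renders the quotes around file/info as curly quotes, so both
# straight and curly quotes are accepted.
LINE_RE = re.compile(
    r'^\[(?P<ts>[^\]]+)\] ERROR issue_dba_error .*?'
    r'\[(?P<tid>\d+)\]\[(?P<thread>T(?P<tenant>\d+)_[^\]]+)\]'
    r'.*?errcode=(?P<inner>-\d+), file=["“](?P<file>[^"”]+)["”]'
    r', line_no=(?P<line>\d+), info=["“](?P<info>[^"”]+)["”]'
)


def summarize(lines):
    """Count matching lines per (tenant id, source file, info message)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:  # lines in any other format are silently skipped
            counts[(int(m["tenant"]), m["file"], m["info"])] += 1
    return counts


# Three lines copied from the log excerpt above.
SAMPLE = [
    '[2024-11-27 08:57:29.417970] ERROR issue_dba_error (ob_log.cpp:1875) [1266892][T1023_ReplaySrv][T1023][Y0-0000000000000000-0-0] [lt=0][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4013, file=“ob_memtable_context.h”, line_no=288, info=“callback memory failed”)',
    '[2024-11-27 08:57:29.417971] ERROR issue_dba_error (ob_log.cpp:1875) [1265631][T1017_ReplaySrv][T1017][Y0-0000000000000000-0-0] [lt=0][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4013, file=“ob_memtable_context.h”, line_no=288, info=“callback memory failed”)',
    '[2024-11-27 08:57:29.417978] ERROR issue_dba_error (ob_log.cpp:1875) [1266892][T1023_ReplaySrv][T1023][Y0-0000000000000000-0-0] [lt=4][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4013, file=“ob_arena_object_pool.h”, line_no=69, info=“obj alloc error, no memory”)',
]

if __name__ == "__main__":
    for (tenant, src, info), n in sorted(summarize(SAMPLE).items()):
        print(tenant, src, info, n)
```

In the excerpt, every error comes from ReplaySrv threads of two tenants (T1017 and T1023), and the inner error code is always -4013 with a memory allocation message.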

Use the obdiag diagnostic tool to analyze the logs covering the 30 minutes around the failure.
For example:

obdiag analyze log --from "2023-10-08 10:25:00" --to "2023-10-08 11:30:00" \
  --config obcluster.servers.nodes[0].ip=xx.xx.xx.1 \
  --config obcluster.servers.nodes[1].ip=xx.xx.xx.2 \
  --config obcluster.servers.global.ssh_username=test \
  --config obcluster.servers.global.ssh_password=****** \
  --config obcluster.servers.global.home_path=/home/admin/oceanbase

obdiag documentation: https://www.oceanbase.com/docs/common-obdiag-cn-1000000001491175
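To fill in `--from` and `--to` for this incident, a 30-minute window can be computed from the time of the first error. This is a rough sketch only: centering the window on the incident is my assumption, the helper names are hypothetical, and the only obdiag options used are the `--from` / `--to` ones shown in the command above (the `--config` entries still have to be added by hand).

```python
from datetime import datetime, timedelta

# Timestamp format used by --from / --to in the example command above.
FMT = "%Y-%m-%d %H:%M:%S"


def analysis_window(incident, minutes=30):
    """Return (start, end) strings for a window centered on the incident."""
    center = datetime.strptime(incident, FMT)
    half = timedelta(minutes=minutes) / 2
    return (center - half).strftime(FMT), (center + half).strftime(FMT)


def obdiag_args(incident, minutes=30):
    """Build the time-range part of the command as an argument list."""
    start, end = analysis_window(incident, minutes)
    return ["obdiag", "analyze", "log", "--from", start, "--to", end]


if __name__ == "__main__":
    # Time of the first error in the original post, truncated to seconds.
    print(" ".join(obdiag_args("2024-11-27 08:57:29")))
```

The argument list can be passed to `subprocess.run` or simply printed and pasted into a shell, quoting the two timestamps.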

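Please run the obdiag diagnosis and post the result. Are the logs from the failure period still retained?
1. Search for the keyword 'malloc_allocator.*tenant: 500' to get the tenant's memory meta information.
2. Search for the keyword '500 ctx_id= DEFAULT_CTX_ID' to get the memory meta information of DEFAULT_CTX_ID for that tenant (500 is only the example tenant ID here; substitute the ID of the tenant you are checking).
In step 2, replace DEFAULT_CTX_ID with whichever module step 1 shows to be using a large share of memory.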
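The two keyword searches can be parameterized by tenant ID and context name. A minimal sketch, with assumptions stated up front: the function names are made up, the patterns are just the two keywords above with flexible whitespace, and the tenant IDs 1017 and 1023 are read off the `T1017` / `T1023` thread names in the original post. The log file to scan is whatever observer log you still have; no path is assumed here.

```python
import re


def memory_dump_patterns(tenant_id, ctx_name="DEFAULT_CTX_ID"):
    """Build the step 1 and step 2 search patterns for one tenant."""
    tid = int(tenant_id)
    # Step 1: 'malloc_allocator.*tenant: <id>'
    tenant_re = re.compile(rf"malloc_allocator.*tenant:\s*{tid}\b")
    # Step 2: '<id> ctx_id= <ctx>' (whitespace made flexible as a precaution)
    ctx_re = re.compile(rf"\b{tid}\s+ctx_id=\s*{re.escape(ctx_name)}\b")
    return tenant_re, ctx_re


def grep_lines(lines, pattern):
    """Return the lines that contain a match, like a plain grep."""
    return [ln.rstrip("\n") for ln in lines if pattern.search(ln)]


def scan_file(path, tenant_id, ctx_name="DEFAULT_CTX_ID"):
    """Run both searches over one log file and return (step1, step2) hits."""
    tenant_re, ctx_re = memory_dump_patterns(tenant_id, ctx_name)
    with open(path, encoding="utf-8", errors="replace") as fh:
        lines = fh.readlines()
    return grep_lines(lines, tenant_re), grep_lines(lines, ctx_re)
```

For this thread that would mean calling `scan_file(<log file>, 1017)` and `scan_file(<log file>, 1023)`, then repeating step 2 with the context name that step 1 shows as the largest.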

Most likely the node is short of memory, and the memory needs to be expanded.

Thanks, everyone, for the suggestions. The logs from the failure period can no longer be found on the server...

It happened again...
The log keeps reporting this:
[2024-12-03 02:01:21.701212] ERROR detect_data_disk_io_failure_ (ob_failure_detector.cpp:395) [1815043][T1017_Occam][T1017][Y0-0000000000000000-0-0] [lt=12][errcode=-4392] disk is hung(msg=“data disk may be hung, add failure event”, data_disk_io_hang_event={type:PROCESS HANG, module:STORAGE, info:data disk io hang event, level:FATAL}, data_disk_error_start_ts=1733162481691765)
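The `data_disk_error_start_ts` value in that line can be turned into a readable time to see when the hang started. A small sketch under two assumptions: the value is microseconds since the Unix epoch, and the server logs in UTC+8. The function names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
SERVER_TZ = timezone(timedelta(hours=8))  # assumption: server logs in UTC+8


def from_epoch_us(ts_us, tz=SERVER_TZ):
    """Convert microseconds since the Unix epoch to an aware datetime."""
    # Integer timedelta arithmetic keeps full microsecond precision.
    return (EPOCH + timedelta(microseconds=ts_us)).astimezone(tz)


def parse_log_ts(text, tz=SERVER_TZ):
    """Parse the bracketed timestamp at the start of a log line."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f").replace(tzinfo=tz)


if __name__ == "__main__":
    start = from_epoch_us(1733162481691765)            # value from the log line
    logged = parse_log_ts("2024-12-03 02:01:21.701212")  # that line's timestamp
    print("disk error start:", start.strftime("%Y-%m-%d %H:%M:%S.%f"))
    print("reported after:  ", logged - start)
```

Under those assumptions the start time works out to 2024-12-03 02:01:21.691765, about 9 ms before the log line was written, which suggests this particular line was logged right when the hang was detected.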

This looks like a disk problem (disk io hung). What type of disk is the storage on?

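It is an enhanced SSD cloud disk. This time a restart did not help, although the previous two times a restart brought the node back to normal. I will try switching to a different disk.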

Do you still get errors after switching the disk?