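I. Background
This issue was discovered through an OCP alarm and had no actual impact on the business. Because it was the first time the alarm had appeared, we collected information to analyze and handle it.
1. Problem description
Alarm event details: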
Alarm summary: alarm_template_id=0:ob_cluster=xxbank-1719306245:host=x.x.x.127 OBServer unexpected internal error
Alarm details: [OBServer unexpected internal error] Cluster: xxbank, host: x.x.x.127,
Log type: observer, log file: /home/admin/oceanbase/log/observer.log,
Log level: ERROR, keyword=Unexpected internal error happen, error code=4388,
Log details=[2025-06-24 19:36:48.668022] ERROR issue_dba_error (ob_log.cpp:1875) [14164][T1004_L0_G0][T1004][YB420A81037F-000634EDC6CB992A-0-0] [lt=25][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4013, file="ob_jit_allocator.cpp", line_no=122, info="allocate jit memory failed").
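The fields that matter for the rest of the analysis (trace ID, outer and inner error codes, source location) can be pulled out of the alarm's log detail line mechanically. The following is a minimal sketch; the regular expression and the helper name `parse_alarm_line` are assumptions written for this one line layout, not an official observer.log parser.

```python
import re

# Log detail line copied from the alarm above.
LINE = (
    '[2025-06-24 19:36:48.668022] ERROR issue_dba_error (ob_log.cpp:1875) '
    '[14164][T1004_L0_G0][T1004][YB420A81037F-000634EDC6CB992A-0-0] '
    '[lt=25][errcode=-4388] Unexpected internal error happen, please checkout '
    'the internal errcode(errcode=-4013, file="ob_jit_allocator.cpp", '
    'line_no=122, info="allocate jit memory failed")'
)

# Hypothetical pattern for this specific layout; other observer.log lines
# would need a different pattern.
PATTERN = re.compile(
    r'^\[(?P<ts>[^\]]+)\] (?P<level>\w+) .*?'
    r'\[(?P<trace_id>Y[0-9A-F]+-[0-9A-F]+-\d+-\d+)\] .*?'
    r'\[errcode=(?P<errcode>-?\d+)\].*?'
    r'\(errcode=(?P<inner_errcode>-?\d+), file="(?P<file>[^"]+)", '
    r'line_no=(?P<line_no>\d+)'
)


def parse_alarm_line(line):
    """Return the key fields of the alarm line as a dict, or None if it does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    for key in ("errcode", "inner_errcode", "line_no"):
        fields[key] = int(fields[key])
    return fields


fields = parse_alarm_line(LINE)
print(fields["trace_id"], fields["errcode"], fields["inner_errcode"],
      fields["file"], fields["line_no"])
```

The extracted `trace_id` is the value used to filter observer.log in section 1.2.1.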
II. Problem Analysis
1. Collecting information
1.1 Basic information
[root@xobs127 log]# free -h
total used free shared buff/cache available
Mem: 1.0Ti 141Gi 719Gi 1.8Gi 160Gi 869Gi
Swap: 0B 0B 0B
[root@xobs127 log]# df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 511G 0 511G 0% /dev
tmpfs 511G 0 511G 0% /dev/shm
tmpfs 511G 1.9G 509G 1% /run
tmpfs 511G 0 511G 0% /sys/fs/cgroup
/dev/mapper/klas-root 90G 9.1G 81G 11% /
tmpfs 511G 1.7M 511G 1% /tmp
/dev/sdb2 1014M 341M 674M 34% /boot
/dev/sdb1 599M 6.5M 593M 2% /boot/efi
/dev/mapper/klas-backup 300G 161G 140G 54% /home
/dev/mapper/oceanbase1-ob_log 3.0T 1001G 1.9T 35% /data/log1
/dev/mapper/oceanbase1-ob_data 15T 2.1T 12T 15% /data/1
tmpfs 103G 0 103G 0% /run/user/500
xx.xx.xx.87:/NL_OBbak_v01 4.8T 1.6T 3.2T 33% /obbackup
tmpfs 103G 0 103G 0% /run/user/993
tmpfs 103G 0 103G 0% /run/user/0
[root@xobs127 log]# lscpu
Architecture: aarch64
CPU op-mode(s): 64-bit
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0-127
Thread(s) per core: 1
Core(s) per socket: 64
Socket(s): 2
NUMA node(s): 4
Vendor ID: HiSilicon
Model: 0
Model name: Kunpeng-920
Stepping: 0x1
CPU max MHz: 2600.0000
CPU min MHz: 200.0000
BogoMIPS: 200.00
L1d cache: 8 MiB
L1i cache: 8 MiB
L2 cache: 64 MiB
L3 cache: 128 MiB
NUMA node0 CPU(s): 0-31
NUMA node1 CPU(s): 32-63
NUMA node2 CPU(s): 64-95
NUMA node3 CPU(s): 96-127
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm
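Summary:
- The cluster architecture is 4F1A, and the cluster version is OBServer-V4.2.1bp10.
- The database servers are ARM-based, and machine resources are ample.
- The affected tenant runs in Oracle mode, with primary_zone=zone1;zone2;zone3;zone4.
1.2 Log information
1.2.1 Filter by the trace_id shown in the OCP alarm to obtain the trace log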
[admin@xobs127 ~]$ grep 'YB420A81037F-000634EDC6CB992A-0-0' /home/admin/oceanbase/log/observer.log.20250624*
/home/admin/oceanbase/log/observer.log.20250624194453892:[2025-06-24 19:36:48.648325] INFO [PL] get_pl_function (ob_pl.cpp:2190) [14164][T1004_L0_G0][T1004][YB420A81037F-000634EDC6CB992A-0-0] [lt=5] get pl function from plan cache failed(ret=-5138, pc_ctx.key_={db_id:500009, key_id:4646, namespace:4, name:""}, stmt_id=4646, sql=BEGIN ?:=DBMS_LOB.GETLENGTH( ?);END;, params=[{obj:{"NULL":"NULL"}, accuracy:{length:-1, precision:-1, scale:-1}, flag:0, raw_text_pos:-1, raw_text_len:-1, param_meta:{type:"NULL", collation:"binary", coercibility:"IGNORABLE"}}, {obj:{"LONGTEXT":"outrow", collation:"utf8mb4_bin", coercibility:"IMPLICIT"}, accuracy:{length:-1, precision:-1, scale:-1}, flag:0, raw_text_pos:-1, raw_text_len:-1, param_meta:{type:"LONGTEXT", collation:"utf8mb4_bin", coercibility:"IMPLICIT"}}])
/home/admin/oceanbase/log/observer.log.20250624194453892:[2025-06-24 19:36:48.648349] INFO [PL] get_pl_function (ob_pl.cpp:2208) [14164][T1004_L0_G0][T1004][YB420A81037F-000634EDC6CB992A-0-0] [lt=23] get pl function by sql failed, will ignore this error(ret=-5138, pc_ctx.key_={db_id:500009, key_id:18446744073709551615, namespace:4, name:"BEGIN ?:=DBMS_LOB.GETLENGTH( ?);END;"}, stmt_id=4646, sql=BEGIN ?:=DBMS_LOB.GETLENGTH( ?);END;, params=[{obj:{"NULL":"NULL"}, accuracy:{length:-1, precision:-1, scale:-1}, flag:0, raw_text_pos:-1, raw_text_len:-1, param_meta:{type:"NULL", collation:"binary", coercibility:"IGNORABLE"}}, {obj:{"LONGTEXT":"outrow", collation:"utf8mb4_bin", coercibility:"IMPLICIT"}, accuracy:{length:-1, precision:-1, scale:-1}, flag:0, raw_text_pos:-1, raw_text_len:-1, param_meta:{type:"LONGTEXT", collation:"utf8mb4_bin", coercibility:"IMPLICIT"}}])
/home/admin/oceanbase/log/observer.log.20250624194453892:[2025-06-24 19:36:48.648368] INFO [PL] get_pl_function (ob_pl.cpp:2245) [14164][T1004_L0_G0][T1004][YB420A81037F-000634EDC6CB992A-0-0] [lt=17] get pl function by sql failed, will ignore this error(ret=-5138, pc_ctx.key_={db_id:500009, key_id:4646, namespace:4, name:""}, stmt_id=4646, sql=BEGIN ?:=DBMS_LOB.GETLENGTH( ?);END;, params=[{obj:{"NULL":"NULL"}, accuracy:{length:-1, precision:-1, scale:-1}, flag:0, raw_text_pos:-1, raw_text_len:-1, param_meta:{type:"NULL", collation:"binary", coercibility:"IGNORABLE"}}, {obj:{"LONGTEXT":"outrow", collation:"utf8mb4_bin", coercibility:"IMPLICIT"}, accuracy:{length:-1, precision:-1, scale:-1}, flag:0, raw_text_pos:-1, raw_text_len:-1, param_meta:{type:"LONGTEXT", collation:"utf8mb4_bin", coercibility:"IMPLICIT"}}])
/home/admin/oceanbase/log/observer.log.20250624194453892:[2025-06-24 19:36:48.652319] INFO [PL] compile_module (ob_llvm_helper.cpp:618) [14164][T1004_L0_G0][T1004][YB420A81037F-000634EDC6CB992A-0-0] [lt=16] ================Optimized LLVM Module================
/home/admin/oceanbase/log/observer.log.20250624194453892:[2025-06-24 19:36:48.654975] INFO [PL] dump_module (ob_llvm_helper.cpp:637) [14164][T1004_L0_G0][T1004][YB420A81037F-000634EDC6CB992A-0-0] [lt=16] Dump LLVM Compile Module!
/home/admin/oceanbase/log/observer.log.20250624194453892:[2025-06-24 19:36:48.668022] ERROR issue_dba_error (ob_log.cpp:1875) [14164][T1004_L0_G0][T1004][YB420A81037F-000634EDC6CB992A-0-0] [lt=25][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4013, file="ob_jit_allocator.cpp", line_no=122, info="allocate jit memory failed")
/home/admin/oceanbase/log/observer.log.20250624194453892:[2025-06-24 19:36:48.668040] EDIAG [SQL.CG] allocate_mapped_memory (ob_jit_allocator.cpp:122) [14164][T1004_L0_G0][T1004][YB420A81037F-000634EDC6CB992A-0-0] [lt=18][errcode=-4013] allocate jit memory failed(addr=0xfff6ffff0000, num_bytes=1005, page_size=65536, num_pages=1) BACKTRACE:0x11f13520 0xbb54190 0xba78a08 0xba78584 0xba784b4 0xba782e4 0x119dfa58 0x119df784 0x119e0c74 0x59b34e4 0x59c5494 0x59ae3f4 0x119de484 0x119df2d0 0x119dbcec 0x119dba3c 0x119bea48 0x8354100 0x83523fc 0x820e1c0 0x820a7d4 0x8210b1c 0xaf880dc 0xad582cc 0xba9d218 0xb8954f8 0xb8927c8 0xba467cc 0x93577e0 0x9358548 0x935e6b8 0x9361cd4 0xb83566c 0xb81fa38 0x9017ac0 0x121a98b4 0xffff3c2287ac 0xffff3c1660fc
/home/admin/oceanbase/log/observer.log.20250624194453892:[2025-06-24 19:36:48.768173] INFO [SQL.CG] allocate_mapped_memory (ob_jit_allocator.cpp:142) [14164][T1004_L0_G0][T1004][YB420A81037F-000634EDC6CB992A-0-0] [lt=67] aarch64 memory allocated not safe, try again(addr=0xfff6ffff0000, start=281440616841215, page_size=65536, num_pages=1)
/home/admin/oceanbase/log/observer.log.20250624194453892:[2025-06-24 19:36:48.768214] INFO [SQL.CG] reserve (ob_jit_allocator.cpp:317) [14164][T1004_L0_G0][T1004][YB420A81037F-000634EDC6CB992A-0-0] [lt=12] AARCH64: reserve ObJitMemoryGroup successed(header_=0xfff94f70e520, total_=65536, block_cnt_=1, *cur={addr:0xfff6ff380000, alloc_end:0xfff6ff380000, size:65536})
/home/admin/oceanbase/log/observer.log.20250624194453892:[2025-06-24 19:36:48.768470] INFO [PL] compile (ob_pl_compile.cpp:240) [14164][T1004_L0_G0][T1004][YB420A81037F-000634EDC6CB992A-0-0] [lt=13] >>>>>>>>Final Compile Anonymous Block Time: (stmt_id=4646, compile_end - compile_start=119811)
/home/admin/oceanbase/log/observer.log.20250624194453892:[2025-06-24 19:36:48.768489] INFO [PL] generate_pl_function (ob_pl.cpp:2527) [14164][T1004_L0_G0][T1004][YB420A81037F-000634EDC6CB992A-0-0] [lt=11] >>>>>>>>>>Compile Anonymous Time: (ret=0, params=[{obj:{"NULL":"NULL"}, accuracy:{length:-1, precision:-1, scale:-1}, flag:0, raw_text_pos:-1, raw_text_len:-1, param_meta:{type:"NULL", collation:"binary", coercibility:"IGNORABLE"}}, {obj:{"LONGTEXT":"outrow", collation:"utf8mb4_bin", coercibility:"IMPLICIT"}, accuracy:{length:-1, precision:-1, scale:-1}, flag:0, raw_text_pos:-1, raw_text_len:-1, param_meta:{type:"LONGTEXT", collation:"utf8mb4_bin", coercibility:"IMPLICIT"}}], anonymouse_sql=BEGIN ?:=DBMS_LOB.GETLENGTH( ?);END;, compile_end - compile_start=120104, routine={ns:4, ref_count:1, tenant_schema_version:1750762040046128, sys_schema_version:1744798710008576, object_id:149162, dependency_tables:[{table_id:310050, schema_version:1719370521927344, object_type:6, is_db_explicit:false, is_existed:true}], params_info:[{flag:{need_to_check_type:1, need_to_check_bool_value:0, expected_bool_value:0, is_pl_mock_default_param:0, need_to_check_extend_type:1, is_batch_parameter:0, ignore_scale_check:0}, scale:-1, type:"NULL", ext_real_type:"NULL", is_oracle_empty_string:false, col_type:"binary", pl_type:255, udt_id:18446744073709551615}, {flag:{need_to_check_type:1, need_to_check_bool_value:0, expected_bool_value:0, is_pl_mock_default_param:0, need_to_check_extend_type:1, is_batch_parameter:0, ignore_scale_check:0}, scale:-1, type:"LONGTEXT", ext_real_type:"NULL", is_oracle_empty_string:false, col_type:"utf8mb4_bin", pl_type:255, udt_id:18446744073709551615}], variables:[{type:0, type_from:0, not_null:false, pls_type:0, type_info:[], obj_type:{meta:{type:"NUMBER", collation:"binary", coercibility:"NUMERIC"}, accuracy:{length:32767, precision:-1, scale:-85}, charset:2, is_binary_collation:false, is_zero_fill:false}, {type:0, type_from:0, not_null:false, pls_type:0, 
type_info:[], obj_type:{meta:{type:"LONGTEXT", collation:"utf8mb4_bin", coercibility:"IMPLICIT"}, accuracy:{length:-1, precision:-1, scale:0}, charset:2, is_binary_collation:false, is_zero_fill:false}], default_idxs:[-1, -1], function_name:"", priv_user:""})
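To read the trace as a sequence of events, the matching lines can be reduced to a timeline of offsets from the first event. The sketch below assumes the line-header layout seen in the grep output above; the sample lines are the real prefixes of six of those lines with the long payloads cut off, and the names `HEAD` and `timeline` are hypothetical.

```python
import re
from datetime import datetime, timedelta

TRACE_ID = "YB420A81037F-000634EDC6CB992A-0-0"

# Prefixes of lines from the grep output above (payloads truncated).
SAMPLE = """\
[2025-06-24 19:36:48.648325] INFO [PL] get_pl_function (ob_pl.cpp:2190) [14164][T1004_L0_G0][T1004][YB420A81037F-000634EDC6CB992A-0-0] [lt=5] get pl function from plan cache failed
[2025-06-24 19:36:48.668022] ERROR issue_dba_error (ob_log.cpp:1875) [14164][T1004_L0_G0][T1004][YB420A81037F-000634EDC6CB992A-0-0] [lt=25][errcode=-4388] Unexpected internal error happen
[2025-06-24 19:36:48.668040] EDIAG [SQL.CG] allocate_mapped_memory (ob_jit_allocator.cpp:122) [14164][T1004_L0_G0][T1004][YB420A81037F-000634EDC6CB992A-0-0] [lt=18][errcode=-4013] allocate jit memory failed
[2025-06-24 19:36:48.768173] INFO [SQL.CG] allocate_mapped_memory (ob_jit_allocator.cpp:142) [14164][T1004_L0_G0][T1004][YB420A81037F-000634EDC6CB992A-0-0] [lt=67] aarch64 memory allocated not safe, try again
[2025-06-24 19:36:48.768214] INFO [SQL.CG] reserve (ob_jit_allocator.cpp:317) [14164][T1004_L0_G0][T1004][YB420A81037F-000634EDC6CB992A-0-0] [lt=12] AARCH64: reserve ObJitMemoryGroup successed
[2025-06-24 19:36:48.768470] INFO [PL] compile (ob_pl_compile.cpp:240) [14164][T1004_L0_G0][T1004][YB420A81037F-000634EDC6CB992A-0-0] [lt=13] >>>>>>>>Final Compile Anonymous Block Time
"""

# Timestamp, level, optional [MODULE] tag, function name and source location.
# search() is used so a leading "file:" prefix added by grep is tolerated.
HEAD = re.compile(
    r'\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6})\] '
    r'(?P<level>\w+) (?:\[(?P<module>[^\]]+)\] )?'
    r'(?P<func>\w+) \((?P<src>[^)]+)\)'
)


def timeline(lines, trace_id):
    """Return the events of one trace, each with its offset (in microseconds) from the first event."""
    events = []
    for line in lines:
        if trace_id not in line:
            continue
        match = HEAD.search(line)
        if match is None:
            continue
        event = match.groupdict()
        event["time"] = datetime.strptime(event.pop("ts"), "%Y-%m-%d %H:%M:%S.%f")
        events.append(event)
    events.sort(key=lambda e: e["time"])
    for event in events:
        event["offset_us"] = (event["time"] - events[0]["time"]) // timedelta(microseconds=1)
    return events


events = timeline(SAMPLE.splitlines(), TRACE_ID)
for event in events:
    print(f'{event["offset_us"]:>7} us  {event["level"]:<5} {event["func"]} ({event["src"]})')
```

On a real node the same function could be fed the lines of an observer.log file instead of `SAMPLE`.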
1.2.2 Suspected LLVM memory allocation problem; filter the logs again by keyword
[admin@xobs127 ~]$ grep 'allocate_mapped_memory' observer.log.2025062*
observer.log.20250624183139724:[2025-06-24 18:26:18.799544] EDIAG [SQL.CG] allocate_mapped_memory (ob_jit_allocator.cpp:122) [14162][T1004_L0_G0][T1004][YB420A81037F-000634EDC72BA1DD-0-0] [lt=21][errcode=-4013] allocate jit memory failed(addr=0xfff6ffff0000, num_bytes=4949, page_size=65536, num_pages=1) BACKTRACE:0x11f13520 0xbb54190 0xba78a08 0xba78584 0xba784b4 0xba782e4 0x119dfa58 0x119df784 0x119e0c74 0x59b34e4 0x59c5494 0x59ae3f4 0x119de484 0x119df2d0 0x119dbcec 0x119dba3c 0x119bea48 0x8354100 0x83523fc 0x820e1c0 0x820b1c4 0xaf882e4 0xad582cc 0xba9d218 0xb8954f8 0xb8927c8 0xba467cc 0xb8471dc 0xb83c720 0xb8310cc 0xb81fa38 0x9017ac0 0x121a98b4 0xffff3c2287ac 0xffff3c1660fc
observer.log.20250624183139724:[2025-06-24 18:26:18.899669] INFO [SQL.CG] allocate_mapped_memory (ob_jit_allocator.cpp:142) [14162][T1004_L0_G0][T1004][YB420A81037F-000634EDC72BA1DD-0-0] [lt=62] aarch64 memory allocated not safe, try again(addr=0xfff6ffff0000, start=281440616841215, page_size=65536, num_pages=1)
observer.log.20250624183139724:[2025-06-24 18:26:20.795104] INFO [SQL.CG] allocate_mapped_memory (ob_jit_allocator.cpp:142) [14162][T1004_L0_G0][T1004][YB420A81037F-000634EDC72BA1E0-0-0] [lt=557] aarch64 memory allocated not safe, try again(addr=0xfff6ffff0000, start=281440616841215, page_size=65536, num_pages=1)
observer.log.20250624194453892:[2025-06-24 19:36:48.668040] EDIAG [SQL.CG] allocate_mapped_memory (ob_jit_allocator.cpp:122) [14164][T1004_L0_G0][T1004][YB420A81037F-000634EDC6CB992A-0-0] [lt=18][errcode=-4013] allocate jit memory failed(addr=0xfff6ffff0000, num_bytes=1005, page_size=65536, num_pages=1) BACKTRACE:0x11f13520 0xbb54190 0xba78a08 0xba78584 0xba784b4 0xba782e4 0x119dfa58 0x119df784 0x119e0c74 0x59b34e4 0x59c5494 0x59ae3f4 0x119de484 0x119df2d0 0x119dbcec 0x119dba3c 0x119bea48 0x8354100 0x83523fc 0x820e1c0 0x820a7d4 0x8210b1c 0xaf880dc 0xad582cc 0xba9d218 0xb8954f8 0xb8927c8 0xba467cc 0x93577e0 0x9358548 0x935e6b8 0x9361cd4 0xb83566c 0xb81fa38 0x9017ac0 0x121a98b4 0xffff3c2287ac 0xffff3c1660fc
observer.log.20250624194453892:[2025-06-24 19:36:48.768173] INFO [SQL.CG] allocate_mapped_memory (ob_jit_allocator.cpp:142) [14164][T1004_L0_G0][T1004][YB420A81037F-000634EDC6CB992A-0-0] [lt=67] aarch64 memory allocated not safe, try again(addr=0xfff6ffff0000, start=281440616841215, page_size=65536, num_pages=1)
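A quick way to compare the occurrences is to measure, per trace, how long after the failure at ob_jit_allocator.cpp:122 the retry message at line 142 was logged. The sketch below uses timestamps transcribed from the grep output above; the helper name `retry_gaps_us` is hypothetical.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S.%f"

# (timestamp, trace_id, line number in ob_jit_allocator.cpp), transcribed from
# the grep output above. Line 122 logs "allocate jit memory failed" and
# line 142 logs "aarch64 memory allocated not safe, try again".
EVENTS = [
    ("2025-06-24 18:26:18.799544", "YB420A81037F-000634EDC72BA1DD-0-0", 122),
    ("2025-06-24 18:26:18.899669", "YB420A81037F-000634EDC72BA1DD-0-0", 142),
    ("2025-06-24 18:26:20.795104", "YB420A81037F-000634EDC72BA1E0-0-0", 142),
    ("2025-06-24 19:36:48.668040", "YB420A81037F-000634EDC6CB992A-0-0", 122),
    ("2025-06-24 19:36:48.768173", "YB420A81037F-000634EDC6CB992A-0-0", 142),
]


def retry_gaps_us(events):
    """Map trace_id -> microseconds between the line-122 failure and the next line-142 retry."""
    pending = {}
    gaps = {}
    for ts, trace_id, line_no in events:
        when = datetime.strptime(ts, FMT)
        if line_no == 122:
            pending[trace_id] = when
        elif line_no == 142 and trace_id in pending:
            gaps[trace_id] = (when - pending.pop(trace_id)) // timedelta(microseconds=1)
    return gaps


gaps = retry_gaps_us(EVENTS)
for trace_id, gap in gaps.items():
    print(trace_id, f"{gap / 1000:.3f} ms")
```

Both failures are followed by the retry message almost exactly 100 ms later, which would account for most of the roughly 120 ms compile time. That is consistent with a fixed wait before retrying, although the logs alone do not confirm it. The trace ending in 1E0 logged only the retry message, so it has no gap to report.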
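Summary:
1. JIT memory allocation failed
- The log reports the error explicitly: allocate jit memory failed (error code -4013).
- Location: ob_jit_allocator.cpp:122
- Parameters: the allocator tried to allocate 1005 bytes (num_bytes=1005) with a page size of 65536 bytes (page_size=65536) at target address 0xfff6ffff0000.
- Architecture hint: aarch64 memory allocated not safe, try again (the allocated address is treated as unsafe on ARM64, so the allocation is retried).
- First attempt: the JIT compiler tried to allocate memory at address 0xfff6ffff0000, and the attempt failed, apparently because of an address alignment or permission constraint on ARM64. This triggered the error log Unexpected internal error happen (error code -4388, internal error code -4013).
2. The initial PL function lookup failed
- Error code -5138 was returned while fetching the PL function from the plan cache (get pl function from plan cache failed).
- SQL text: BEGIN ?:=DBMS_LOB.GETLENGTH( ?);END; (an anonymous block that calls DBMS_LOB.GETLENGTH).
- The follow-up message get pl function by sql failed, will ignore this error states explicitly that the error is ignored: the system continues by looking the function up again or compiling it directly. This is an expected transient state and needs no attention.
3. Compilation eventually succeeded
- A new address was allocated successfully: addr=0xfff6ff380000 (log: reserve ObJitMemoryGroup successed).
- The last two INFO lines show that the anonymous block compiled successfully: Final Compile Anonymous Block Time reports compile_end - compile_start=119811, and Compile Anonymous Time reports ret=0 with compile_end - compile_start=120104, that is, about 120 ms. Despite the intermediate error, the whole execution finished normally, and the call to DBMS_LOB.GETLENGTH should have taken effect.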
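Root cause:
ARM64 memory-safety characteristics:
- W^X (Write XOR Execute) is a key security policy that modern operating systems, including Linux on ARM64, can enforce, depending on kernel and security-policy configuration.
W^X policy: the kernel ensures that each memory page carries only one of the following permission sets:
- No permissions
- Read (R)
- Read + write (RW)
- Read + execute (RX)
Read + write + execute (RWX) is normally prohibited.
- Key point: write (W) and execute (X) permissions may not be held at the same time. A page cannot be marked both writable and executable.
Where this policy is enforced, software can neither create a mapping that is both writable and executable nor obtain one by changing permissions afterwards.
- Memory allocation restrictions on ARM64: OceanBase's JIT compiler places strict requirements on memory addresses on the ARM64 architecture (the address must be page-aligned and lie in a safe region). The first attempted address, 0xfff6ffff0000, did not meet these requirements, so the allocation failed.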
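The W^X rule listed above can be expressed as a small predicate over permission flags. This is only an illustrative model: the `Perm` flags and both helper functions are hypothetical and do not call any operating-system or OceanBase API. It also shows the usual way a JIT stays within the rule, which is to write code into an RW page and then switch the page to RX.

```python
from enum import Flag


class Perm(Flag):
    """Page permission bits used by this model (not tied to any OS constants)."""
    NONE = 0
    R = 1
    W = 2
    X = 4


def violates_wx(perm):
    """True when a page would be writable and executable at the same time."""
    return bool(perm & Perm.W) and bool(perm & Perm.X)


def jit_page_lifecycle():
    """Permission sequence of a W^X-friendly JIT page: emit code as RW, then run it as RX."""
    return [Perm.R | Perm.W, Perm.R | Perm.X]


for perm in (Perm.NONE, Perm.R, Perm.R | Perm.W, Perm.R | Perm.X,
             Perm.R | Perm.W | Perm.X):
    print(perm, "violates W^X" if violates_wx(perm) else "allowed")
```

Only the RWX combination is rejected, matching the list of permitted permission sets.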
1.3 GV$OB_MEMORY information

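Judging from the query result, tenant memory allocation shows nothing abnormal.
1.4 Checking the official documentation
Links to related issues: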
- https://www.oceanbase.com/knowledge-base/oceanbase-database-1000000002401683?back=kb
- https://www.oceanbase.com/knowledge-base/oceanbase-database-1000000002398331?back=kb
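Summary:
- PL execution is very slow and gets stuck in allocate jit memory. An allocated address that crosses a 4G boundary is reallocated; when every allocation returns the same address, the allocation fails.
- The issue was reported on V4.2.1 GA.
- If the alarm appears and the observer nodes in the cluster are otherwise healthy, it can be ignored.
- According to the related documents, the issue should have been fixed in 4.2.1bp8. The current cluster runs 4.2.1bp10, which is newer than the fix version, yet the symptoms are identical, so we suspect that the issue has another trigger.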
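The 4G-boundary description can be checked against the addresses in the logs with simple arithmetic. The exact test OceanBase applies is not shown in the source, so the two helper functions below are assumptions that only illustrate the arithmetic: one checks whether a block spans two 4 GiB-aligned regions, the other whether it ends exactly on a 4 GiB-aligned address.

```python
FOUR_GIB = 1 << 32
PAGE_SIZE = 65536  # page_size from the log

FAILED_ADDR = 0xfff6ffff0000    # addr in "allocate jit memory failed"
RESERVED_ADDR = 0xfff6ff380000  # addr in "reserve ObJitMemoryGroup successed"


def crosses_4g_boundary(addr, size):
    """True when the first and last byte of the block lie in different 4 GiB-aligned regions."""
    return addr // FOUR_GIB != (addr + size - 1) // FOUR_GIB


def ends_on_4g_boundary(addr, size):
    """True when the block's exclusive end address is a multiple of 4 GiB."""
    return (addr + size) % FOUR_GIB == 0


for name, addr in (("failed", FAILED_ADDR), ("reserved", RESERVED_ADDR)):
    print(name, hex(addr), "end =", hex(addr + PAGE_SIZE),
          "crosses:", crosses_4g_boundary(addr, PAGE_SIZE),
          "ends on boundary:", ends_on_4g_boundary(addr, PAGE_SIZE))
```

The failing page does not span two regions, but it ends exactly at 0xfff700000000, a 4 GiB-aligned address, while the block that was eventually reserved sits well inside a region. Whether touching the boundary is what makes the allocator treat the address as unsafe is an assumption; the logs only show that the same address was returned on each failing attempt.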
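III. Conclusion
After confirmation, the issue is currently explained as follows:
- The allocate jit memory issue was fixed in V4.2.1bp8.
- If the alarm appears on a version earlier than V4.2.1bp8, assess the business impact and upgrade to a version that contains the fix. Alarms raised on V4.2.1bp8 or later are false alarms, and the false alarm is fixed in V4.2.5.
- When the alarm appears and there is no feedback from the business and no other related alarm, it can be ignored; the system retries automatically and the alarm clears.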
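The version rules above can be written down as a small triage helper. This is a sketch under stated assumptions: the version-string format (`V4.2.1bp10`, `4.2.5`), the function names, and the returned labels are all hypothetical, and versions outside the 4.2.x line, or earlier than 4.2.1, are reported as unknown because the source says nothing about them.

```python
import re

# Hypothetical version format: optional "V", three numbers, optional "bpN".
VERSION_RE = re.compile(r'^[Vv]?(\d+)\.(\d+)\.(\d+)(?:bp(\d+))?$', re.IGNORECASE)


def parse_version(text):
    """Turn 'V4.2.1bp10' into (4, 2, 1, 10); a missing bp suffix counts as 0."""
    match = VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    major, minor, patch, bp = match.groups()
    return (int(major), int(minor), int(patch), int(bp or 0))


def triage_jit_alarm(version_text):
    """Classify the 'allocate jit memory failed' alarm according to the conclusion above."""
    version = parse_version(version_text)
    if version[:2] != (4, 2) or version < (4, 2, 1, 0):
        return "unknown"       # not covered by the conclusion
    if version < (4, 2, 1, 8):
        return "upgrade"       # real issue: assess impact, upgrade to a fixed version
    if version < (4, 2, 5, 0):
        return "false_alarm"   # issue fixed, alarm still raised
    return "alarm_fixed"       # false alarm fixed as well


for v in ("V4.2.1", "4.2.1bp8", "4.2.1bp10", "4.2.5"):
    print(v, "->", triage_jit_alarm(v))
```

For the cluster in this case (4.2.1bp10) the helper returns `false_alarm`, which matches the conclusion that the alarm can be ignored when nothing else is abnormal.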
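IV. A side note
The analysis and conclusions here are limited by the actual scenario, the version involved, and personal experience, so the analysis may contain some inaccuracies. Discussion is very welcome so that we can all improve together!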