Crashes shortly after startup: how do I fix it?

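【Environment】Test environment
【OB or other component】
【Version】4.3.5
【Problem description】Describe the problem clearly
【Reproduction steps】Operations performed before and after the problem occurred
【Attachments and logs】We recommend collecting diagnostic information with obdiag, the OceanBase agile diagnostic tool; see the link for details:

【SOP Series 22】: First step of fault diagnosis (self-service diagnosis and diagnostic information collection)

【Note】A forum assistant built on an LLM plus RAG over the open-source documentation is open for testing. Type [@论坛小助手] (the forum assistant's handle) in a post to summon it. Give it a try!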

[2025-02-06 14:09:20.374481] INFO [RPC.OBRPC] on_disconnect (ob_rpc_net_handler.cpp:338) [1706][RpcIO][T0][Y0-0000000000000000-0-0] [lt=20] connection disconnect(easy_connection_str(c)=0.0.0.0_127.0.0.1:34710_132_0x1523fc5ef1b0 tp=0 t=1738822026822281-1738822159961840 s=0 r=0 io=94630/71074 sq=70774)
CRASH ERROR!!! IP=564a56f0b3a0, RBP=1523a4f45a70, sig=4, sig_code=2, sig_addr=0x564a56f0b3a0, RLIMIT_CORE=unlimited, timestamp=1738822160373819, tid=2709, tname=T1_L0_G28, trace_id=YB427F000001-00062D731187EFF6-0-0, lbt=0x1f96b218 0x1f1b698d 0x1524196984bf 0x8bb63a0 0x9be8a9c 0x9c0812c 0x9c08505 0x9be51fd 0x9a466c5 0xa5f92d9 0xa5fa810 0xa5f85ef 0x924c3cf 0x924cafc 0x9253edc 0x9253edc 0x924ed8d 0x92176d1 0x92177b1 0x9217ad3 0x9224c5c 0x9226bd3 0x9226edb 0x9237427 0x1ef0327a 0x1eee447d 0xebf34b2 0xebf0fc5 0xebe4b4b 0xec2577e 0xec5339b 0xec473c9 0xeaa14cd 0xea7362e 0x14a77faa 0x11bba464 0x7c4fe26 0x7923e9c 0x792151d 0x7c4d9c4 0x7c4cd09 0x7c482ee 0x7cf5030 0x7cf4929 0xf8dbc1a 0xf8f4a69 0x81cfe74 0x78b043b 0x789e118 0xfc77118, SQL_ID=E9E2014C8CE705871C555597A6A32456, SQL_STRING=CALL DBMS_STATS.ASYNC_GATHER_STATS_JOB_PROC(600000000);
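The CRASH ERROR line packs the key facts into key=value fields. sig=4 is SIGILL (illegal instruction), not SIGSEGV (11), and on x86 Linux an invalid-opcode fault is delivered as SIGILL with si_code 2 (ILL_ILLOPN), which is exactly the sig_code=2 seen here; sig_addr also equals IP, so the fault is at the instruction itself. A minimal Python sketch that decodes such a line is below. The field layout is inferred from the lines in this thread rather than from OceanBase documentation, and parse_crash_line and describe are hypothetical helper names.

```python
import re
import signal

# si_code values for SIGILL, as defined in Linux <asm-generic/siginfo.h>.
ILL_CODES = {
    1: "ILL_ILLOPC", 2: "ILL_ILLOPN", 3: "ILL_ILLADR", 4: "ILL_ILLTRP",
    5: "ILL_PRVOPC", 6: "ILL_PRVREG", 7: "ILL_COPROC", 8: "ILL_BADSTK",
}

# A value runs until the next ", key=" or the end of the line. This is a
# heuristic: an SQL_STRING that itself contains ", x=" would be cut short.
FIELD_RE = re.compile(r"(\w+)=(.*?)(?=, \w+=|$)")


def parse_crash_line(line: str) -> dict:
    """Split an observer 'CRASH ERROR!!!' line into a dict of string fields."""
    _, marker, body = line.partition("CRASH ERROR!!!")
    if not marker:
        raise ValueError("not a CRASH ERROR line")
    return dict(FIELD_RE.findall(body.strip()))


def describe(fields: dict) -> str:
    """Summarize the signal, the thread and the SQL that was running."""
    sig = signal.Signals(int(fields["sig"]))
    code = int(fields["sig_code"])
    detail = ILL_CODES.get(code, str(code)) if sig == signal.SIGILL else str(code)
    return (f"{sig.name} ({detail}) in thread {fields['tname']} "
            f"while running: {fields['SQL_STRING']}")
```

Feeding it each line printed by grep "CRASH ERROR" observer.log* gives a one-line summary per crash, which makes it easy to see whether every crash is the same signal in the same kind of statement.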


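We recommend collecting logs with the obdiag tool and sharing them with us so we can pin down the problem.

Log collection scenario: observer restarted unexpectedly
Log collection command: obdiag gather scene run --scene=observer.restart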

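To help you resolve this, we need some additional information:

  1. Could you provide more detailed error logs, especially the lines before and after the CRASH ERROR?
  2. Which version of OceanBase are you running?
  3. What is your cluster configuration (for example hardware and operating system version)?
  4. Did you perform any specific operation or run any specific application before the crash? If so, please describe it briefly.
  5. Is this the first occurrence of the crash, or does it happen often? If often, how frequently?

Attached: usage help link for the obdiag agile diagnostic tool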


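The log you posted is incomplete; it contains only INFO lines. As the assistant suggested, first use obdiag to collect the error information from the logs.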


Get local repositories and plugins ok
The tool oceanbase-diagnostic-tool is already installed the latest version 2.6.0
Open ssh connection ok
[ERROR] connect OB: 127.0.0.1:2881 with user root@sys failed, error:(2003, "Can't connect to MySQL server on '127.0.0.1' ([Errno 111] Connection refused)")
[ERROR] connect OB: 127.0.0.1:2881 with user root@sys failed, error:(2003, "Can't connect to MySQL server on '127.0.0.1' ([Errno 111] Connection refused)")
[ERROR] connect OB: 127.0.0.1:2881 with user root@sys failed, error:(2003, "Can't connect to MySQL server on '127.0.0.1' ([Errno 111] Connection refused)")
[ERROR] connect OB: 127.0.0.1:2881 with user root@sys failed, error:(2003, "Can't connect to MySQL server on '127.0.0.1' ([Errno 111] Connection refused)")
[ERROR] connect OB: 127.0.0.1:2881 with user root@sys failed, error:(2003, "Can't connect to MySQL server on '127.0.0.1' ([Errno 111] Connection refused)")
[ERROR] connect OB: 127.0.0.1:2881 with user root@sys failed, error:(2003, "Can't connect to MySQL server on '127.0.0.1' ([Errno 111] Connection refused)")
[ERROR] connect OB: 127.0.0.1:2881 with user root@sys failed, error:(2003, "Can't connect to MySQL server on '127.0.0.1' ([Errno 111] Connection refused)")
[ERROR] connect OB: 127.0.0.1:2881 with user root@sys failed, error:(2003, "Can't connect to MySQL server on '127.0.0.1' ([Errno 111] Connection refused)")
[ERROR] connect OB: 127.0.0.1:2881 with user root@sys failed, error:(2003, "Can't connect to MySQL server on '127.0.0.1' ([Errno 111] Connection refused)")
start execute_421
[ERROR] connect OB: 127.0.0.1:2881 with user root@sys failed, error:(2003, "Can't connect to MySQL server on '127.0.0.1' ([Errno 111] Connection refused)")
[ERROR] NoLeaderScene execute Exception:execute_421 execute error: 'NoneType' object has no attribute 'cursor'
rca finished. For more details, the result on '/data/obd/log/rca/obdiag_log_error_20250206151249'
You can get the suggest by 'cat /data/obd/log/rca/obdiag_log_error_20250206151249/record.table'
Trace ID: c5398af6-e459-11ef-9dc6-52549b9cd8f8
If you want to view detailed obdiag logs, please run: /root/oceanbase-diagnostic-tool/obdiag display-trace c5398af6-e459-11ef-9dc6-52549b9cd8f8

Trace ID: c32137be-e459-11ef-b9d1-52549b9cd8f8
If you want to view detailed obd logs, please run: obd display-trace c32137be-e459-11ef-b9d1-52549b9cd8f8

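Based on the log you provided, obdiag was refused when it tried to connect to the OceanBase database. The exact error is:

[ERROR] connect OB: 127.0.0.1:2881 with user root@sys failed, error:(2003, "Can't connect to MySQL server on '127.0.0.1' ([Errno 111] Connection refused)")

This means obdiag cannot open a connection to OceanBase on 127.0.0.1:2881. The possible causes and fixes are:

Possible causes and solutions

  1. The OceanBase database is not running

    • Check: make sure the OceanBase service is running, for example with:
      ps aux | grep observer
      
    • Fix: if the database is not running, start it. If you deployed with OBD (OceanBase Deployer), you can start it with:
      obd cluster start <cluster_name>
      
  2. Wrong port

    • Check: confirm that OceanBase listens on port 2881, either in the configuration file or with the following SQL query:
      SELECT * FROM GV$OB_SERVERS;
      
    • Fix: if the port does not match, connect using the correct port.
  3. Firewall blocking the connection

    • Check: confirm that the firewall does not block access to port 2881.
    • Fix: if the firewall is the cause, add a rule that allows access to port 2881. For example, on Linux:
      sudo firewall-cmd --zone=public --add-port=2881/tcp --permanent
      sudo firewall-cmd --reload
      
  4. Network configuration problem

    • Check: confirm that 127.0.0.1 points to the local host and that there are no other network configuration issues.
    • Fix: if there is a network configuration problem, correct the relevant settings.
  5. Privilege problem

    • Check: confirm that user root@sys has the required privileges and can connect from 127.0.0.1.
    • Fix: if privileges are insufficient, grant the necessary ones, for example with:
      GRANT ALL PRIVILEGES ON *.* TO 'root'@'localhost' IDENTIFIED BY 'your_password';
      FLUSH PRIVILEGES;
      
  6. Log analysis

    • Check: review /data/obd/log/rca/obdiag_log_error_20250206151249/record.table for more detailed clues.
    • Fix: follow up on the hints in that file.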
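Errno 111 is ECONNREFUSED on Linux. On the loopback interface it usually means no process is accepting connections on that port at all (a firewall REJECT rule can produce the same error, but that is uncommon for 127.0.0.1), which fits an observer that has just crashed. A wrong password or a missing privilege would instead surface as MySQL error 1045 (Access denied) after the TCP connection succeeds, not as 2003. A small Python sketch that tells "nothing listening" apart from other failures; probe is a hypothetical helper name and it only performs a plain TCP connect, no MySQL handshake:

```python
import socket


def probe(host: str = "127.0.0.1", port: int = 2881, timeout: float = 2.0) -> str:
    """Classify a plain TCP connect to host:port (no MySQL handshake is attempted)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "listening"
    except ConnectionRefusedError:  # ECONNREFUSED (111 on Linux): reset, typically nothing listens
        return "refused"
    except OSError as exc:  # timeout, unreachable network, and so on
        return f"error: {exc}"
```

While observer is down, probe() returns "refused"; once it is back up and listening on 2881, it returns "listening".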

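Overall recommendation

Work through the steps above one by one, and provide more log information so the problem can be located more precisely. If the problem persists, consider contacting OceanBase technical support. You can also keep using obdiag to collect more diagnostic information.

For details, see the following documents:

  1. Fast parameterization
  2. Tenant request queue backlog
  3. obcdc support for incremental direct load
  4. RESIGNAL

(The assistant's reply ends here. If it did not solve your problem, please keep asking and wait for replies from other members. Thanks!)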


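The problem is that OceanBase crashes frequently. The OS is Kylin V10 SP3 and the OceanBase version is 4.3.5.
I have already posted the log lines before and after the exception; obdiag doesn't help much here. You need to look at the CRASH itself. No specific application is running.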

%extract_arg86 = getelementptr inbounds %pl_condition_value, %pl_condition_value* %handler_condition, i64 0, i32 1
store i64 %load_error_code, i64* %extract_arg86, align 8
%extract_arg87 = getelementptr inbounds %pl_condition_value, %pl_condition_value* %handler_condition, i64 0, i32 2
store i8* %load_sql_state, i8** %extract_arg87, align 8
%extract_arg88 = getelementptr inbounds %pl_condition_value, %pl_condition_value* %handler_condition, i64 0, i32 3
store i64 %load_str_len, i64* %extract_arg88, align 8
%extract_arg89 = getelementptr inbounds %pl_condition_value, %pl_condition_value* %handler_condition, i64 0, i32 4
store i64 %line_number, i64* %extract_arg89, align 8
%extract_arg90 = getelementptr inbounds %pl_condition_value, %pl_condition_value* %handler_condition, i64 0, i32 5
store i8 0, i8* %extract_arg90, align 8
%create_exception = call %unwind_exception* @eh_create_exception(i64 %4, i64 %6, i64 %line_number, i64 %8, %pl_condition_value* nonnull %handler_condition)
switch i64 %load_error_code, label %normal_raise_block [
i64 1262, label %raise_exception
i64 1265, label %raise_exception
i64 1642, label %raise_exception
]

raise_exception: ; preds = %normal_raise_block, %ob_fail, %ob_fail, %ob_fail
%raise_exception91 = call i32 @_Unwind_RaiseException(%unwind_exception* %create_exception)
unreachable

normal_raise_block: ; preds = %ob_fail
%get_exception_class = call i64 @eh_classify_exception(i8* %load_sql_state)
%get_exception_class.off = add i64 %get_exception_class, -3
%switch = icmp ult i64 %get_exception_class.off, 2
br i1 %switch, label %reset_ret_block, label %raise_exception

reset_ret_block: ; preds = %normal_raise_block
store i32 0, i32* %int_alloca, align 4
br label %ob_success
}
")
[2025-02-06 15:24:20.375531] INFO [COMMON] compute_tenant_wash_size (ob_kvcache_store.cpp:1136) [12392][TimerWK3][T0][Y0-0000000000000000-0-0] [lt=36] Wash compute wash size(is_wash_valid=false, sys_total_wash_size=-70127616, global_cache_size=145653760, tenant_max_wash_size=0, tenant_min_wash_size=0, tenant_ids_=[cnt:4, 500, 508, 509, 1])
[2025-02-06 15:24:20.381440] INFO [RPC.OBRPC] do_server_loop (ob_net_keepalive.cpp:498) [12513][KeepAliveServer][T0][Y0-0000000000000000-0-0] [lt=23] socket need_disconn(n=0, errno=11)
[2025-02-06 15:24:20.381477] INFO [RPC.OBRPC] do_server_loop (ob_net_keepalive.cpp:528) [12513][KeepAliveServer][T0][Y0-0000000000000000-0-0] [lt=33] server connection closed, fd: 128, addr: "127.0.0.1:45224"
[2025-02-06 15:24:20.381509] INFO [RPC.OBRPC] on_disconnect (ob_rpc_net_handler.cpp:338) [12498][RpcIO][T0][Y0-0000000000000000-0-0] [lt=14] connection disconnect(easy_connection_str(c)=0.0.0.0_127.0.0.1:45222_127_0x14f431fedb40 tp=0 t=1738826423975496-1738826660242825 s=0 r=1 io=155129/107735 sq=107735)
[2025-02-06 15:24:20.381582] INFO [RPC.OBRPC] on_disconnect (ob_rpc_net_handler.cpp:338) [12499][RpcIO][T0][Y0-0000000000000000-0-0] [lt=42] connection disconnect(easy_connection_str(c)=0.0.0.0_127.0.0.1:45238_132_0x14f412c04820 tp=0 t=1738826425308186-1738826659530592 s=0 r=0 io=150676/108284 sq=108284)
[2025-02-06 15:24:20.381642] INFO [RPC.OBRPC] on_disconnect (ob_rpc_net_handler.cpp:338) [12501][RpcIO][T0][Y0-0000000000000000-0-0] [lt=19] connection disconnect(easy_connection_str(c)=127.0.0.1:2882_127.0.0.1:45254_136_0x14f412c04f10 tp=0 t=1738826425331776-1738826659974775 s=4 r=0 io=151248/108427 sq=108427)
[2025-02-06 15:24:20.381672] INFO [RPC.OBRPC] on_disconnect (ob_rpc_net_handler.cpp:338) [12500][RpcIO][T0][Y0-0000000000000000-0-0] [lt=30] connection disconnect(easy_connection_str(c)=0.0.0.0_127.0.0.1:45250_135_0x14f431ffb790 tp=0 t=1738826425312023-1738826659974659 s=0 r=0 io=153797/107392 sq=107392)
CRASH ERROR!!! IP=55d9e009f3a0, RBP=14f3da1c3a70, sig=4, sig_code=2, sig_addr=0x55d9e009f3a0, RLIMIT_CORE=unlimited, timestamp=1738826660381807, tid=13745, tname=T1_L0_G28, trace_id=YB427F000001-00062D7417B3E05A-0-0, lbt=0x1f96b218 0x1f1b698d 0x14f44e12e4bf 0x8bb63a0 0x9be8a9c 0x9c0812c 0x9c08505 0x9be51fd 0x9a466c5 0xa5f92d9 0xa5fa810 0xa5f85ef 0x924c3cf 0x924cafc 0x9253edc 0x9253edc 0x924ed8d 0x92176d1 0x92177b1 0x9217ad3 0x9224c5c 0x9226bd3 0x9226edb 0x9237427 0x1ef0327a 0x1eee447d 0xebf34b2 0xebf0fc5 0xebe4b4b 0xec2577e 0xec5339b 0xec473c9 0xeaa14cd 0xea7362e 0x14a77faa 0x11bba464 0x7c4fe26 0x7923e9c 0x792151d 0x7c4d9c4 0x7c4cd09 0x7c482ee 0x7cf5030 0x7cf4929 0xf8dbc1a 0xf8f4a69 0x81cfe74 0x78b043b 0x789e118 0xfc77118, SQL_ID=E9E2014C8CE705871C555597A6A32456, SQL_STRING=CALL DBMS_STATS.ASYNC_GATHER_STATS_JOB_PROC(600000000);
[2025-02-06 15:25:59.126215] INFO [SERVER] inner_main (main.cpp:568) [14125][observer][T0][Y0-0000000000000001-0-0] [lt=0] succ to init logger(default file="log/observer.log", rs file="log/rootservice.log", election file="log/election.log", trace file="log/trace.log", audit_file="audit/observer_14123_202502061525591941994112.aud", alert file="log/alert/alert.log", max_log_file_size=268435456, enable_async_log=true)
[2025-02-06 15:25:59.126379] INFO [SERVER] inner_main (main.cpp:572) [14125][observer][T0][Y0-0000000000000001-0-0] [lt=157] Virtual memory : 798,949,376 byte
[2025-02-06 15:25:59.126402] INFO [SERVER] inner_main (main.cpp:575) [14125][observer][T0][Y0-0000000000000001-0-0] [lt=22] Build basic information for each syslog file(info="address: , observer version: OceanBase_CE 4.3.5.0, revision: 100000202024123117-5d6cb5cbc3f7c1ab6eb22e40abec8e160a8764d5, sysname: Linux, os release: 4.19.90-52.15.v2207.ky10.x86_64, machine: x86_64, tz GMT offset: 08:00")
/data/obd/bin/observer -p 2881 -P 2882 -z zone1 -n water -c 1 -d /data/obd/store -I 127.0.0.1 -o __min_full_resource_pool_memory=2147483648,memory_limit=6G,system_memory=1G,datafile_size=2G,datafile_next=2G,datafile_maxsize=20G,log_disk_size=14G,cpu_count=16,enable_syslog_wf=False,max_syslog_file_count=4,large_query_threshold=600s,enable_record_trace_log=False,enable_syslog_recycle=1
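The observer command line above passes its startup parameters as a single comma-separated -o list. A small Python sketch (observer_options is a hypothetical helper, and it assumes no value contains a comma, which holds for this line) turns that list into a dictionary so individual settings are easier to compare with the host. For example, it yields cpu_count=16 and memory_limit=6G, while the lscpu output posted later in this thread reports 4 CPUs.

```python
import shlex


def observer_options(cmdline: str) -> dict:
    """Collect the key=value pairs given to observer via -o (hypothetical helper).

    Assumes no value contains a comma, which holds for the command line above.
    """
    args = shlex.split(cmdline)
    opts = {}
    for flag, value in zip(args, args[1:]):
        if flag == "-o":
            for item in value.split(","):
                key, _, val = item.partition("=")
                opts[key] = val
    return opts
```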
observer (OceanBase_CE 4.3.5.0)

REVISION: 100000202024123117-5d6cb5cbc3f7c1ab6eb22e40abec8e160a8764d5
BUILD_BRANCH: HEAD
BUILD_TIME: Dec 31 2024 17:35:01
BUILD_FLAGS: RelWithDebInfo
BUILD_INFO:

Copyright (c) 2011-present OceanBase Inc.

[2025-02-06 15:25:59.126534] INFO print_all_limits (main.cpp:367) [14125][observer][T0][Y0-0000000000000001-0-0] [lt=12] ============= *begin server limit report * =============
[2025-02-06 15:25:59.126547] INFO print_limit (main.cpp:355) [14125][observer][T0][Y0-0000000000000001-0-0] [lt=11] [operator()] RLIMIT_CORE = unlimited
[2025-02-06 15:25:59.126560] INFO print_limit (main.cpp:355) [14125][observer][T0][Y0-0000000000000001-0-0] [lt=11] [operator()] RLIMIT_CPU = unlimited
[2025-02-06 15:25:59.126571] INFO print_limit (main.cpp:355) [14125][observer][T0][Y0-0000000000000001-0-0] [lt=9] [operator()] RLIMIT_DATA = unlimited
[2025-02-06 15:25:59.126582] INFO print_limit (main.cpp:355) [14125][observer][T0][Y0-0000000000000001-0-0] [lt=10] [operator()] RLIMIT_FSIZE = unlimited
[2025-02-06 15:25:59.126593] INFO print_limit (main.cpp:355) [14125][observer][T0][Y0-0000000000000001-0-0] [lt=9] [operator()] RLIMIT_LOCKS = unlimited
[2025-02-06 15:25:59.126604] INFO print_limit (main.cpp:357) [14125][observer][T0][Y0-0000000000000001-0-0] [lt=9] [operator()] RLIMIT_MEMLOCK = 65536
[2025-02-06 15:25:59.126615] INFO print_limit (main.cpp:357) [14125][observer][T0][Y0-0000000000000001-0-0] [lt=9] [operator()] RLIMIT_NOFILE = 655350
[2025-02-06 15:25:59.126626] INFO print_limit (main.cpp:357) [14125][observer][T0][Y0-0000000000000001-0-0] [lt=9] [operator()] RLIMIT_NPROC = 655350
[2025-02-06 15:25:59.126636] INFO print_limit (main.cpp:355) [14125][observer][T0][Y0-0000000000000001-0-0] [lt=9] [operator()] RLIMIT_STACK = unlimited
[2025-02-06 15:25:59.126646] INFO print_all_limits (main.cpp:377) [14125][observer][T0][Y0-0000000000000001-0-0] [lt=10] ============= stop server limit report ===============
name= (11 segments)
header 0: address=0x55e562cd1040
header 1: address=0x55e562cd12a8
header 2: address=0x55e562cd1000
header 3: address=0x55e5836fa540
header 4: address=0x55e583d31700
header 5: address=0x55e5836fa540
header 6: address=0x55e583d1cf38
header 7: address=0x55e5836fa540
header 8: address=0x55e566a7b474
header 9: address=0x55e562cd1000
header 10: address=0x55e562cd12c4

Was the cluster deployed with obd? Is there an obd.log? Are there any core files?
https://www.oceanbase.com/knowledge-base/oceanbase-database-1000000000761374?back=kb


[root@i-lcygh0jd log]# ls -l
total 1420040
drwxr-xr-x 2 root root 4096 Feb  5 14:23 alert
-rw-r--r-- 1 root root 8204596 Feb  6 15:29 election.log
-rw-r--r-- 1 root root 0 Feb  5 14:23 election.log.wf
-rw-r--r-- 1 root root 121244023 Feb  6 15:29 observer.log
-rw-r--r-- 1 root root 268439994 Feb  6 13:27 observer.log.20250206132741278
-rw-r--r-- 1 root root 268440844 Feb  6 13:43 observer.log.20250206134300068
-rw-r--r-- 1 root root 268440441 Feb  6 14:03 observer.log.20250206140308622
-rw-r--r-- 1 root root 268440437 Feb  6 15:20 observer.log.20250206152024646
-rw-r--r-- 1 root root 2351 Feb  6 15:20 observer.log.wf
drwxr-xr-x 3 root root 4096 Feb  6 15:12 rca
-rw-r--r-- 1 root root 126604033 Feb  6 15:29 rootservice.log
-rw-r--r-- 1 root root 0 Feb  5 14:23 rootservice.log.wf
-rw-r--r-- 1 root root 124254048 Feb  6 15:27 trace.log
[root@i-lcygh0jd log]#

Installed with obd; no core file was found under the obd installation directory.
observer.log.20250206152024646:CRASH ERROR!!! IP=555d35a653a0, RBP=148d26449a70, sig=4, sig_code=2, sig_addr=0x555d35a653a0, RLIMIT_CORE=unlimited, timestamp=1738823060383527, tid=3873, tname=T1_L0_G28, trace_id=YB427F000001-00062D7327BB2215-0-0, lbt=0x1f96b218 0x1f1b698d 0x148d9d13d4bf 0x8bb63a0 0x9be8a9c 0x9c0812c 0x9c08505 0x9be51fd 0x9a466c5 0xa5f92d9 0xa5fa810 0xa5f85ef 0x924c3cf 0x924cafc 0x9253edc 0x9253edc 0x924ed8d 0x92176d1 0x92177b1 0x9217ad3 0x9224c5c 0x9226bd3 0x9226edb 0x9237427 0x1ef0327a 0x1eee447d 0xebf34b2 0xebf0fc5 0xebe4b4b 0xec2577e 0xec5339b 0xec473c9 0xeaa14cd 0xea7362e 0x14a77faa 0x11bba464 0x7c4fe26 0x7923e9c 0x792151d 0x7c4d9c4 0x7c4cd09 0x7c482ee 0x7cf5030 0x7cf4929 0xf8dbc1a 0xf8f4a69 0x81cfe74 0x78b043b 0x789e118 0xfc77118, SQL_ID=2D33A95D66306851E3C623743C77BA01, SQL_STRING=CALL DBMS_STATS.PURGE_STATS(NULL);
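All three CRASH ERROR lines in this thread (14:09:20, 14:24:20 and 15:24:20) carry the same lbt backtrace; only the third frame differs (0x1524196984bf, 0x148d9d13d4bf, 0x14f44e12e4bf), and those values share the low bits ...4bf, so they look like a per-process shared-library address under ASLR. In each line, IP minus the fourth frame (0x8bb63a0) is also page-aligned (0x564a56f0b3a0 minus 0x8bb63a0 is 0x564a4e355000), which fits the reading that lbt frames are offsets into the observer binary and that the faulting instruction sits at offset 0x8bb63a0 in every crash. The Python sketch below checks both; treating lbt frames as binary-relative offsets and frame index 3 as the faulting frame are assumptions, and frames, same_stack and implied_load_base are hypothetical names. If that reading holds, symbolizing the offsets against the same observer build (for instance with binutils addr2line -pCfe /data/obd/bin/observer 0x8bb63a0, possibly after installing the matching debuginfo) would name the function that holds the rejected instruction.

```python
import re

LBT_RE = re.compile(r"lbt=([0-9a-fx ]+)")
IP_RE = re.compile(r"\bIP=([0-9a-f]+)")


def frames(line: str) -> list:
    """Return the lbt= backtrace of a CRASH ERROR line as integers."""
    m = LBT_RE.search(line)
    if m is None:
        raise ValueError("no lbt= field")
    return [int(tok, 16) for tok in m.group(1).split()]


def same_stack(a: str, b: str, skip=frozenset({2})) -> bool:
    """True if two crashes have the same backtrace, ignoring the frames in skip.

    Frame 2 is skipped by default because its value changes from process to
    process (it looks like a shared-library address under ASLR).
    """
    fa, fb = frames(a), frames(b)
    return len(fa) == len(fb) and all(
        x == y for i, (x, y) in enumerate(zip(fa, fb)) if i not in skip)


def implied_load_base(line: str, fault_frame: int = 3) -> int:
    """IP minus the given lbt frame. Assumes lbt frames are offsets into observer."""
    m = IP_RE.search(line)
    if m is None:
        raise ValueError("no IP= field")
    return int(m.group(1), 16) - frames(line)[fault_frame]
```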
observer.log.20250206152024646:[2025-02-06 15:20:21.897388] INFO print_limit (main.cpp:355) [12383][observer][T0][Y0-0000000000000001-0-0] [lt=11] [operator()] RLIMIT_CORE = unlimited
observer.log.20250206152024646:[2025-02-06 15:20:21.897399] INFO print_limit (main.cpp:355) [12383][observer][T0][Y0-0000000000000001-0-0] [lt=9] [operator()] RLIMIT_CPU = unlimited
observer.log.20250206152024646:[2025-02-06 15:20:21.897408] INFO print_limit (main.cpp:355) [12383][observer][T0][Y0-0000000000000001-0-0] [lt=8] [operator()] RLIMIT_DATA = unlimited
observer.log.20250206152024646:[2025-02-06 15:20:21.897418] INFO print_limit (main.cpp:355) [12383][observer][T0][Y0-0000000000000001-0-0] [lt=8] [operator()] RLIMIT_FSIZE = unlimited
observer.log.20250206152024646:[2025-02-06 15:20:21.897427] INFO print_limit (main.cpp:355) [12383][observer][T0][Y0-0000000000000001-0-0] [lt=8] [operator()] RLIMIT_LOCKS = unlimited
observer.log.20250206152024646:[2025-02-06 15:20:21.897436] INFO print_limit (main.cpp:357) [12383][observer][T0][Y0-0000000000000001-0-0] [lt=8] [operator()] RLIMIT_MEMLOCK = 65536
observer.log.20250206152024646:[2025-02-06 15:20:21.897446] INFO print_limit (main.cpp:357) [12383][observer][T0][Y0-0000000000000001-0-0] [lt=8] [operator()] RLIMIT_NOFILE = 655350
observer.log.20250206152024646:[2025-02-06 15:20:21.897465] INFO print_limit (main.cpp:357) [12383][observer][T0][Y0-0000000000000001-0-0] [lt=18] [operator()] RLIMIT_NPROC = 655350
observer.log.20250206152024646:[2025-02-06 15:20:21.897475] INFO print_limit (main.cpp:355) [12383][observer][T0][Y0-0000000000000001-0-0] [lt=8] [operator()] RLIMIT_STACK = unlimited

Where is obd.log located? find didn't turn it up.

PROCEDURE async_gather_stats_job_proc (duration BIGINT DEFAULT NULL);
PRAGMA INTERFACE(C, ASYNC_GATHER_STATS_JOB_PROC);
END dbms_stats", comment:"", route_sql:""}])
[2025-02-06 15:41:08.864583] INFO [SHARE.SCHEMA] get_batch_packages (ob_schema_service_sql_impl.cpp:4068) [4096][T1_L0_G28][T1][YB427F000001-00062D745D081430-0-0] [lt=16] get batch package info finish(schema_version=1738736665582384, ret=0)
[2025-02-06 15:41:08.864667] INFO [PL] get_package_from_plan_cache (ob_pl_package_manager.cpp:1789) [4096][T1_L0_G28][T1][YB427F000001-00062D745D081430-0-0] [lt=17] get pl package from plan cache failed(ret=-5138, package_id=310001)
[2025-02-06 15:41:08.869277] INFO [PL] compile_module (ob_llvm_helper.cpp:625) [4096][T1_L0_G28][T1][YB427F000001-00062D745D081430-0-0] [lt=18] ================Optimized LLVM Module================
[2025-02-06 15:41:08.869692] INFO [PL] dump_module (ob_llvm_helper.cpp:644) [4096][T1_L0_G28][T1][YB427F000001-00062D745D081430-0-0] [lt=27] Dump LLVM Compile Module!
(s.str().c_str()="; ModuleID = 'PL/SQL'
source_filename = "PL/SQL"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"

%pl_exec_context = type { i64, i64, i64, %seg_param_store*, %obj*, i32*, i64, i8, i64 }
%seg_param_store = type { i64, i32, %wrapper_allocator, %seg_pointer_array, i64, i64 }
%wrapper_allocator = type { i64, i64 }
%seg_pointer_array = type { i64, [1 x i64], i64, [1 x i64], i64, i64, i64, i32, %wrapper_allocator, i8, %memory_context }
%memory_context = type { i64, i64, i64 }
%obj = type { %obj_meta, i32, i64 }
%obj_meta = type { i8, i8, i8, i8 }
%objparam = type { %obj, i64, i32, %obj.0, i32, i32, %obj_meta }
%obj.0 = type { i64, i8 }
%data_type = type { %obj_meta, i64, i32, i8, i8 }
%unwind_exception = type { i64 }
%pl_condition_value = type { i64, i64, i8*, i64, i64, i8 }

declare i32 @spi_calc_expr_at_idx(%pl_exec_context*, i64, i64, %objparam*)

declare i32 @spi_calc_package_expr(%pl_exec_context*, i64, i64, %objparam*)

declare i32 @spi_convert_objparam(%pl_exec_context*, %objparam*, i64, %objparam*, i8)

declare i32 @spi_set_variable_to_expr(%pl_exec_context*, i64, %objparam*, i8, i8)

declare i32 @spi_query_into_expr_idx(%pl_exec_context*, i8*, i64, i64*, i64, %data_type*, i64, i8*, i64*, i8, i8, i8)

declare i32 @spi_end_trans(%pl_exec_context*, i8*, i8)

declare i32 @spi_update_location(%pl_exec_context*, i64)

declare i32 @spi_execute_with_expr_idx(%pl_exec_context*, i8*, i64, i64*, i64, i64*, i64, %data_type*, i64, i8*, i64*, i8, i8, i8, i8)

declare i32 @spi_execute_immediate(%pl_exec_context*, i64, i64, i64*, i64, i64*, i64, %data_type*, i64, i8*, i64*, i8, i8, i8)

declare i32 @spi_alloc_complex_var(%pl_exec_context*, i8, i64, i64, i32, i64*, i64)

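1. obd logs: stored by default under the home directory of the user who installed obd: cd ~/.obd/log/
2. Run obd cluster list to find the cluster name, then obd cluster edit-config {cluster name}; save the output to a text file and share it.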


Check the CPU instruction set: lscpu


There are core files under /var/lib/systemd/coredump/. Are they related to this problem? Which one should I upload?

[root@i-lcygh0jd coredump]# ls -lrth
total 2.5G
-rw-r----- 1 root root 335K Feb  5 14:18 core.udevadm.0.ba5be4fd6ebc449c95e175cf5eb1c507.636.1738736306000000000000.lz4
-rw-r----- 1 root root 334K Feb  5 14:18 core.udevadm.0.ba5be4fd6ebc449c95e175cf5eb1c507.953.1738736307000000000000.lz4
-rw-r----- 1 root root 335K Feb  5 14:21 core.udevadm.0.d7061597b4d0470399ed1c5b7a0263e8.627.1738736513000000000000.lz4
-rw-r----- 1 root root 334K Feb  5 14:21 core.udevadm.0.d7061597b4d0470399ed1c5b7a0263e8.919.1738736514000000000000.lz4
-rw-r----- 1 root root 243M Feb  5 14:39 core.observer.0.d7061597b4d0470399ed1c5b7a0263e8.1916.1738737560000000000000.lz4
-rw-r----- 1 root root 334K Feb  5 14:55 core.udevadm.0.9ff8ae4bed784b97aabbc783488c8e65.660.1738738507000000000000.lz4
-rw-r----- 1 root root 334K Feb  5 14:55 core.udevadm.0.9ff8ae4bed784b97aabbc783488c8e65.1022.1738738509000000000000.lz4
-rw-r----- 1 root root 130M Feb  5 14:57 core.observer.0.9ff8ae4bed784b97aabbc783488c8e65.2871.1738738670000000000000.lz4
-rw-r----- 1 root root 131M Feb  5 15:09 core.observer.0.9ff8ae4bed784b97aabbc783488c8e65.4109.1738739360000000000000.lz4
-rw-r----- 1 root root 138M Feb  5 17:09 core.observer.0.9ff8ae4bed784b97aabbc783488c8e65.20095.1738746560000000000000.lz4
-rw-r----- 1 root root 140M Feb  5 17:24 core.observer.0.9ff8ae4bed784b97aabbc783488c8e65.22762.1738747460000000000000.lz4
-rw-r----- 1 root root 142M Feb  5 17:39 core.observer.0.9ff8ae4bed784b97aabbc783488c8e65.25083.1738748360000000000000.lz4
-rw-r----- 1 root root 146M Feb  5 19:09 core.observer.0.9ff8ae4bed784b97aabbc783488c8e65.35673.1738753760000000000000.lz4
-rw-r----- 1 root root 136M Feb  5 19:24 core.observer.0.9ff8ae4bed784b97aabbc783488c8e65.39745.1738754660000000000000.lz4
-rw-r----- 1 root root 142M Feb  5 19:39 core.observer.0.9ff8ae4bed784b97aabbc783488c8e65.41508.1738755560000000000000.lz4
-rw-r----- 1 root root 136M Feb  5 20:24 core.observer.0.9ff8ae4bed784b97aabbc783488c8e65.47445.1738758260000000000000.lz4
-rw-r----- 1 root root 140M Feb  6 11:39 core.observer.0.9ff8ae4bed784b97aabbc783488c8e65.153064.1738813160000000000000.lz4
-rw-r----- 1 root root 142M Feb  6 13:39 core.observer.0.9ff8ae4bed784b97aabbc783488c8e65.169268.1738820360000000000000.lz4
-rw-r----- 1 root root 143M Feb  6 13:54 core.observer.0.9ff8ae4bed784b97aabbc783488c8e65.175568.1738821266000000000000.lz4
-rw-r----- 1 root root 334K Feb  6 13:55 core.udevadm.0.2c1935f5f74c4aaa95e0f6dbb5c5be32.662.1738821302000000000000.lz4
-rw-r----- 1 root root 334K Feb  6 13:55 core.udevadm.0.2c1935f5f74c4aaa95e0f6dbb5c5be32.1134.1738821304000000000000.lz4
-rw-r----- 1 root root 334K Feb  6 14:02 core.udevadm.0.62c9d56b86c047b2be55251aa423b4b7.668.1738821744000000000000.lz4
-rw-r----- 1 root root 334K Feb  6 14:02 core.udevadm.0.62c9d56b86c047b2be55251aa423b4b7.1150.1738821746000000000000.lz4
-rw-r----- 1 root root 334K Feb  6 14:06 core.udevadm.0.1859b2017d6a4fd48c72e7d458839317.671.1738822013000000000000.lz4
-rw-r----- 1 root root 335K Feb  6 14:06 core.udevadm.0.1859b2017d6a4fd48c72e7d458839317.1076.1738822015000000000000.lz4
-rw-r----- 1 root root 138M Feb  6 14:09 core.observer.0.1859b2017d6a4fd48c72e7d458839317.1590.1738822160000000000000.lz4
-rw-r----- 1 root root 334K Feb  6 14:13 core.udevadm.0.574702895df74125a5c786cf0450d9a0.679.1738822385000000000000.lz4
-rw-r----- 1 root root 334K Feb  6 14:13 core.udevadm.0.574702895df74125a5c786cf0450d9a0.1062.1738822387000000000000.lz4
-rw-r----- 1 root root 148M Feb  6 14:24 core.observer.0.574702895df74125a5c786cf0450d9a0.1527.1738823060000000000000.lz4
-rw-r----- 1 root root 142M Feb  6 15:24 core.observer.0.574702895df74125a5c786cf0450d9a0.12383.1738826660000000000000.lz4
-rw-r----- 1 root root 334K Feb  6 15:31 core.udevadm.0.4e5f2e0faea44f4e8935fc8f4754c063.699.1738827097000000000000.lz4
-rw-r----- 1 root root 334K Feb  6 15:31 core.udevadm.0.4e5f2e0faea44f4e8935fc8f4754c063.1138.1738827098000000000000.lz4
-rw-r----- 1 root root 147M Feb  6 15:41 core.observer.0.4e5f2e0faea44f4e8935fc8f4754c063.3294.1738827668000000000000.lz4
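These core files line up with the crash history. Each name follows the pattern core.comm.uid.boot_id.pid.timestamp, and the leading 10 digits of the timestamp field decode to the crash time: 1738826660 is 2025-02-06 15:24:20 (UTC+8), matching the CRASH ERROR at 15:24:20 and the observer PID 12383 seen in the startup log. Decoding all 17 observer cores puts 15 of them at hh:09, hh:24, hh:39 or hh:54 (mostly 20 seconds past the minute), i.e. on a 15-minute grid; the exceptions are 14:57:50 on Feb 5 and 15:41:08 on Feb 6. A regular cadence like that could fit a periodically scheduled task such as the DBMS_STATS procedures named in the SQL_STRING fields, but that link is an inference. The Python sketch below does the decoding; the name layout, the 10-digit seconds prefix and the helper names are assumptions based on the listing above.

```python
from datetime import datetime, timedelta, timezone

# The observer logs in this thread are in UTC+8 ("tz GMT offset: 08:00").
LOCAL_TZ = timezone(timedelta(hours=8))


def decode_core_name(name: str) -> tuple:
    """Split a systemd-coredump file name into (command, pid, crash time).

    Assumes core.<comm>.<uid>.<boot_id>.<pid>.<timestamp>.<ext>, and that the
    first 10 digits of <timestamp> are Unix seconds (true for the names above).
    """
    parts = name.split(".")
    comm, pid, stamp = parts[1], int(parts[4]), parts[5]
    return comm, pid, datetime.fromtimestamp(int(stamp[:10]), tz=LOCAL_TZ)


def observer_crash_times(names: list) -> list:
    """Crash times of observer cores only (udevadm cores are dropped), oldest first."""
    return sorted(t for comm, _, t in map(decode_core_name, names) if comm == "observer")
```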


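sysctl -a | grep pattern
Use this command to check where core dumps are configured to go; if nothing is configured, no core file is produced.
Also check the instruction set with lscpu.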


[root@i-lcygh0jd coredump]# sysctl -a | grep pattern
kernel.core_pattern = |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h
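This setting explains why no core file showed up in the obd directory: a core_pattern that starts with | pipes the dump to a program, here systemd-coredump, which by default stores it under /var/lib/systemd/coredump/ (where the files were found) and indexes it for coredumpctl, for example coredumpctl list observer. The Python sketch applies the rules from the core(5) man page to a pattern string; core_destination is a hypothetical helper name.

```python
def core_destination(pattern: str) -> str:
    """Describe where kernel.core_pattern sends dumps (rules from core(5))."""
    pattern = pattern.strip()
    if pattern.startswith("|"):
        # A leading pipe hands the dump to a program on its stdin.
        handler = pattern[1:].split()[0]
        if handler.endswith("/systemd-coredump"):
            return "systemd-coredump (stored under /var/lib/systemd/coredump by default, see coredumpctl)"
        return f"program {handler}"
    if pattern.startswith("/"):
        return f"file {pattern}"
    # A relative pattern is resolved against the crashing process's working directory.
    return f"file {pattern} relative to the process's working directory"
```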

[root@i-lcygh0jd systemd]# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 40 bits physical, 48 bits virtual
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 6
Model name: QEMU Virtual CPU
Stepping: 3
CPU MHz: 2299.998
BogoMIPS: 4599.99
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 128 KiB
L1i cache: 128 KiB
L2 cache: 16 MiB
L3 cache: 16 MiB
NUMA node0 CPU(s): 0-3
Vulnerability Itlb multihit: KVM: Vulnerable
Vulnerability L1tf: Mitigation; PTE Inversion
Vulnerability Mds: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Meltdown: Mitigation; PTI
Vulnerability Mmio stale data: Not affected
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pse36 clflush mmx fxsr sse sse2 ht syscall nx lm rep_good nopl xtopology cpuid tsc_known_freq pni cx16 x2apic hypervisor lahf_lm cpuid_fault pti
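The Flags line is the telling part. The vCPU is presented as the generic QEMU Virtual CPU model and advertises sse, sse2, pni (SSE3) and cx16, but none of ssse3, sse4_1, sse4_2, popcnt, avx or avx2. Together with sig=4/sig_code=2 (an invalid opcode) in every CRASH ERROR line, this is consistent with observer executing an instruction this virtual CPU does not expose, although nothing in the thread confirms which instruction. A Python sketch for checking a flags line against a list of extensions; the candidate list and the helper names are assumptions, not a documented OceanBase requirement.

```python
# Instruction-set extensions to check for. Which ones the OceanBase 4.3.5 build
# actually requires is an assumption here, not something stated in this thread.
CANDIDATE_FLAGS = ("ssse3", "sse4_1", "sse4_2", "popcnt", "avx", "avx2")


def missing_flags(flags_line: str, wanted=CANDIDATE_FLAGS) -> list:
    """Return the wanted flags absent from an lscpu 'Flags:' or /proc/cpuinfo 'flags' line."""
    present = set(flags_line.split(":", 1)[-1].split())
    return [flag for flag in wanted if flag not in present]


def cpuinfo_flags(path: str = "/proc/cpuinfo") -> str:
    """First 'flags' line of /proc/cpuinfo (Linux only), or '' if none is found."""
    with open(path) as fh:
        for line in fh:
            if line.startswith("flags"):
                return line
    return ""
```

On the affected host, missing_flags(cpuinfo_flags()) would list every candidate, since the posted Flags line contains none of them.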


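My mistake: in your case the process itself has died, so you should use the offline analysis command on a specific log file:
obdiag analyze log --files observer.log.20230831142211247

But since you've already found the core files, you can analyze them to find the SQL that triggered the problem. Also, has this cluster ever been deployed successfully?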

https://www.oceanbase.com/knowledge-base/oceanbase-database-1000000000340650?back=kb


systemd-coredump.zip (22.1 KB)


Yes, it deployed successfully; it just crashes after running for a while (about 10 minutes).
