【 Environment 】Test environment
【 OB or other component 】
【 Version 】V4.3.5 (LTS)
【 Problem description 】The cluster frequently hits CRASH ERROR
CRASH ERROR!!! IP=55b03beaf3a0, RBP=14b037f45a70, sig=4, sig_code=2, sig_addr=0x55b03beaf3a0, RLIMIT_CORE=unlimited, timestamp=1741776621735314, tid=941952, tname=T1_L0_G28, trace_id=YB42C0A80007-00063022A141E51A-0-0, lbt=0x1f96b218 0x1f1b698d 0x14b0c5f6323f 0x8bb63a0 0x9be8a9c 0x9c0812c 0x9c08505 0x9be51fd 0x9a466c5 0xa5f92d9 0xa5fa810 0xa5f85ef 0x924c3cf 0x924cafc 0x9253edc 0x9253edc 0x924ed8d 0x92176d1 0x92177b1 0x9217ad3 0x9224c5c 0x9226bd3 0x9226edb 0x9237427 0x1ef0327a 0x1eee447d 0xebf34b2 0xebf0fc5 0xebe4b4b 0xec2577e 0xec5339b 0xec473c9 0xeaa14cd 0xea7362e 0x14a77faa 0x11bba464 0x7c4fe26 0x7923e9c 0x792151d 0x7c4d9c4 0x7c4cd09 0x7c482ee 0x7cf5030 0x7cf4929 0xf8dbc1a 0xf8f4a69 0x81cfe74 0x78b043b 0x789e118 0xfc77118, SQL_ID=E9E2014C8CE705871C555597A6A32456, SQL_STRING=CALL DBMS_STATS.ASYNC_GATHER_STATS_JOB_PROC(600000000);
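The crash line above packs its diagnostics into comma-separated key=value pairs. The following is a minimal shell sketch for pulling out individual fields; the helper names `crash_field` and `crash_signal_name` are hypothetical, and the parsing assumes the field layout seen in this thread (values that contain commas, such as SQL_STRING, would be cut short). On Linux, signal 4 is SIGILL (illegal instruction), which `kill -l` can confirm.

```shell
# Sketch only: helper names are made up for illustration.
# Assumes fields look like "key=value" separated by ", " as in the crash line above.

crash_field() {
  # $1 = field name (e.g. sig, tid, trace_id, lbt), $2 = the full crash line
  printf '%s\n' "$2" \
    | grep -oE "(^|[ ,])$1=[^,]+" \
    | head -n 1 \
    | sed -E "s/^[ ,]*$1=//"
}

crash_signal_name() {
  # Translate the numeric sig= field into a signal name (e.g. 4 -> ILL).
  kill -l "$(crash_field sig "$1")"
}

# Usage (line holds the CRASH ERROR text copied from observer.log):
#   crash_field trace_id "$line"
#   crash_signal_name "$line"
#   addr2line -pCfe ./bin/observer $(crash_field lbt "$line")
```

The last usage line leaves the lbt value unquoted on purpose so that each address becomes a separate argument to addr2line.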
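Hello,
1. What is the cluster architecture?
2. Log in with the system administrator account or the business tenant administrator account and run select * from gv$ob_sql_audit where sql_id='E9E2014C8CE705871C555597A6A32456'
to see which SQL statement is involved.
3. grep 'YB42C0A80007-00063022A141E51A-0-0' /home/admin/oceanbase/log/observer.log.20250313* -A 2 -C 2
to see the detailed logs.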
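Most lines that share a trace ID are INFO noise, so it can help to narrow the grep to warning and error levels. The sketch below assumes the observer log layout visible in this thread (a bracketed timestamp followed by the level) and assumes the level names WDIAG, EDIAG, WARN, and ERROR; the function name `trace_warnings` is hypothetical.

```shell
# Sketch only: print warning/error lines for one trace ID across rotated logs.
# Assumes each log line starts with "[timestamp] LEVEL " as in the excerpts here.

trace_warnings() {
  # $1 = trace ID, $2 = directory that holds observer.log* files
  grep -h -F -- "$1" "$2"/observer.log* \
    | grep -E '^\[[^]]+\] (WDIAG|EDIAG|WARN|ERROR) '
}

# Usage:
#   trace_warnings 'YB42C0A80007-00063022A141E51A-0-0' /home/admin/oceanbase/log
```

The first grep uses -F so the trace ID is matched as a literal string, and -h drops the file-name prefix so the level filter can anchor at the start of the line.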
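ASYNC_GATHER_STATS_JOB_PROC: I can only see this procedure gathering statistics; the rest of the information does not seem to be of much value.
Please collect the logs, and grab a core dump if you can.
Judging from the logs, did you gather statistics manually? DBMS_STATS.ASYNC_GATHER_STATS_JOB_PROC gathers statistics asynchronously, and the crash happened while it was running. I suggest gathering statistics directly with dbms_stats.gather_schema_stats instead.
Also, Enterprise Edition users can get full support through the ticket system on the official website.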
1. Check the process information: ps -ef | grep observer | grep -v grep
2. Check which user started the process, then switch to that user: su - admin
3. ulimit -a
4. sudo sysctl -p | grep core
5. sudo sysctl -p | grep core_pattern
6. Find the directory that contains the observer executable, then run from there: addr2line -pCfe ./bin/observer 0x1f96b218 0x1f1b698d 0x14b0c5f6323f 0x8bb63a0 0x9be8a9c 0x9c0812c 0x9c08505 0x9be51fd 0x9a466c5 0xa5f92d9 0xa5fa810 0xa5f85ef 0x924c3cf 0x924cafc 0x9253edc 0x9253edc 0x924ed8d 0x92176d1 0x92177b1 0x9217ad3 0x9224c5c 0x9226bd3 0x9226edb 0x9237427 0x1ef0327a 0x1eee447d 0xebf34b2 0xebf0fc5 0xebe4b4b 0xec2577e 0xec5339b 0xec473c9 0xeaa14cd 0xea7362e 0x14a77faa 0x11bba464 0x7c4fe26 0x7923e9c 0x792151d 0x7c4d9c4 0x7c4cd09 0x7c482ee 0x7cf5030 0x7cf4929 0xf8dbc1a 0xf8f4a69 0x81cfe74 0x78b043b 0x789e118 0xfc77118
7. Post the observer logs
8. Check the OS version: cat /etc/issue
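Note that `sysctl -p` reloads values from the sysctl configuration file as well as printing them. For a read-only look at the core dump settings that apply to the running process, a sketch such as the following may be enough. It assumes a Linux host with /proc mounted, and the function name `core_settings` is hypothetical.

```shell
# Sketch only: read-only view of core dump settings, assuming Linux /proc.

core_settings() {
  # $1 = PID of the process to inspect (for example the observer PID)
  printf 'core_pattern: %s\n' "$(cat /proc/sys/kernel/core_pattern)"
  # Soft and hard core-size limits in effect for that process
  grep 'Max core file size' "/proc/$1/limits"
}

# Usage (assumes a single observer process on the host):
#   core_settings "$(pgrep -x observer | head -n 1)"
```

Reading /proc/PID/limits shows the limits of the already-running observer, which can differ from what `ulimit -a` reports in a fresh login shell.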
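Hello:
1. Cluster architecture
2. Logged in with the system administrator account or the business tenant administrator account and ran select * from gv$ob_sql_audit where sql_id='E9E2014C8CE705871C555597A6A32456'; the cluster service has been restarted since the crash.
3. grep 'YB42C0A80007-00063022A141E51A-0-0' /oceanbase/myob/oceanbase/log/observer.log.20250313090647462 -A 2 -C 2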
[2025-03-12 18:50:21.656624] INFO [SQL.SESSION] init_system_variables (ob_basic_session_info.cpp:1155) [941952][T1_L0_G28][T1][YB42C0A80007-00063022A141E51A-0-0] [lt=4] load default system variable(delayed_insert_limit=“100”)
[2025-03-12 18:50:21.656629] INFO [SQL.SESSION] init_system_variables (ob_basic_session_info.cpp:1155) [941952][T1_L0_G28][T1][YB42C0A80007-00063022A141E51A-0-0] [lt=4] load default system variable(ndb_version="")
[2025-03-12 18:50:21.656634] INFO [SQL.SESSION] init_system_variables (ob_basic_session_info.cpp:1155) [941952][T1_L0_G28][T1][YB42C0A80007-00063022A141E51A-0-0] [lt=3] load default system variable(auto_generate_certs=“1”)
[2025-03-12 18:50:21.656639] INFO [SQL.SESSION] init_system_variables (ob_basic_session_info.cpp:1155) [941952][T1_L0_G28][T1][YB42C0A80007-00063022A141E51A-0-0] [lt=3] load default system variable(optimizer_cost_based_transformation=“1”)
[2025-03-12 18:50:21.656644] INFO [SQL.SESSION] init_system_variables (ob_basic_session_info.cpp:1155) [941952][T1_L0_G28][T1][YB42C0A80007-00063022A141E51A-0-0] [lt=3] load default system variable(range_index_dive_limit=“10”)
[2025-03-12 18:50:21.656649] INFO [SQL.SESSION] init_system_variables (ob_basic_session_info.cpp:1155) [941952][T1_L0_G28][T1][YB42C0A80007-00063022A141E51A-0-0] [lt=3] load default system variable(partition_index_dive_limit=“10”)
[2025-03-12 18:50:21.656653] INFO [SQL.SESSION] init_system_variables (ob_basic_session_info.cpp:1155) [941952][T1_L0_G28][T1][YB42C0A80007-00063022A141E51A-0-0] [lt=3] load default system variable(ob_table_access_policy=“2”)
[2025-03-12 18:50:21.656746] INFO [SQL.OPT] assign_with_only_readable_replica (ob_phy_table_location_info.cpp:90) [634642][T1_L0_G0][T1][YB42C0A80007-00063022A0C1F5F2-0-0] [lt=50] the replica location is invalid(bl_key={server:“192.xx.xx.09:2882”, tenant_id:1, ls_id:{id:1}}, replica_loc={server:“192.xx.xx.09:2882”, role:2, sql_port:2881, replica_type:0, property:{memstore_percent_:100}, restore_status:{status:0}, proposal_id:0})
[2025-03-12 18:50:21.657402] INFO [COMMON] ObBaseResourcePool (ob_resource_pool.h:122) [941952][T1_L0_G28][T1][YB42C0A80007-00063022A141E51A-0-0] [lt=4] Construction ObResourcePool this=0x55b055728880 type=N9oceanbase8observer20ObInnerSQLConnectionE allocator=0x55b055728a80 free_list=0x55b055728900 ret=0 bt=0x810107f 0x104a3bc0 0x104a2edb 0x1028b48c 0x1028b312 0xf8dbbbc 0xf8f4a6a 0x81cfe75 0x78b043c 0x789e119 0xfc77119 0x1f95655e 0x14b0c5f58f2b 0x14b0c5e8e6bf
[2025-03-12 18:50:21.657439] INFO [COMMON] ObResourcePool (ob_resource_pool.h:316) [941952][T1_L0_G28][T1][YB42C0A80007-00063022A141E51A-0-0] [lt=21] Construction ObDefaultResourcePool this=0x55b055728880 type=N9oceanbase8observer20ObInnerSQLConnectionE bt=0x810107f 0x104a3547 0x104a30c3 0x1028b48c 0x1028b312 0xf8dbbbc 0xf8f4a6a 0x81cfe75 0x78b043c 0x789e119 0xfc77119 0x1f95655e 0x14b0c5f58f2b 0x14b0c5e8e6bf
[2025-03-12 18:50:21.657448] INFO [COMMON] get_resource_pool (ob_resource_pool.h:332) [941952][T1_L0_G28][T1][YB42C0A80007-00063022A141E51A-0-0] [lt=6] get_resource_pool ptr=0x55b055728880 name=N9oceanbase8observer20ObInnerSQLConnectionE label=RPInnerSqlConn
[2025-03-12 18:50:21.658112] WDIAG [SQL.PC] common_free (ob_lib_cache_object_manager.cpp:141) [634642][T1_L0_G0][T1][YB42C0A80007-00063022A0C1F5F2-0-0] [lt=32][errcode=0] set logical del time(cache_obj->get_logical_del_time()=4648564014, cache_obj->added_lc()=false, cache_obj->get_object_id()=3503, cache_obj->get_tenant_id()=1, lbt()=“0x810107f 0x10bac60e 0x78d12ff 0x78c2d42 0x78b88e4 0x78b1ae0 0x78aefed 0x789e119 0xfc77119 0x1f95655e 0x14b0c5f58f2b 0x14b0c5e8e6bf”)
[2025-03-12 18:50:21.662271] INFO [SQL.OPT] assign_with_only_readable_replica (ob_phy_table_location_info.cpp:90) [941952][T1_L0_G28][T1][YB42C0A80007-00063022A141E51A-0-0] [lt=30] the replica location is invalid(bl_key={server:“192.xx.xx.09:2882”, tenant_id:1, ls_id:{id:1}}, replica_loc={server:“192.xx.xx.09:2882”, role:2, sql_port:2881, replica_type:0, property:{memstore_percent_:100}, restore_status:{status:0}, proposal_id:0})
[2025-03-12 18:50:21.663260] INFO [SHARE.SCHEMA] retrieve_routine_schema (ob_schema_retrieve_utils.ipp:3392) [941952][T1_L0_G28][T1][YB42C0A80007-00063022A141E51A-0-0] [lt=39] retrieve routine schema succeed(routine_info={tenant_id:1, database_id:201001, package_id:310001, owner_id:200001, routine_id:310051, routine_name:“async_gather_stats_job_proc”, overload:0, subprogram_id:50, schema_version:1741773934868464, routine_type:3, flag:16, priv_user:"", comp_flag:0, exec_env:“4194304,45,45,45,”, routine_body:"", comment:"", route_sql:"", type_id:-1, routine_params:[]})
[2025-03-12 18:50:21.663299] INFO [SHARE.SCHEMA] retrieve_routine_schema (ob_schema_retrieve_utils.ipp:3402) [941952][T1_L0_G28][T1][YB42C0A80007-00063022A141E51A-0-0] [lt=33] retrieve routine infos succeed(tenant_id=1)
[2025-03-12 18:50:21.665436] INFO [SQL.OPT] assign_with_only_readable_replica (ob_phy_table_location_info.cpp:90) [941952][T1_L0_G28][T1][YB42C0A80007-00063022A141E51A-0-0] [lt=15] the replica location is invalid(bl_key={server:“192.xx.xx.09:2882”, tenant_id:1, ls_id:{id:1}}, replica_loc={server:“192.xx.xx.09:2882”, role:2, sql_port:2881, replica_type:0, property:{memstore_percent_:100}, restore_status:{status:0}, proposal_id:0})
[2025-03-12 18:50:21.666528] INFO [SHARE.SCHEMA] get_batch_routines (ob_schema_service_sql_impl.cpp:3883) [941952][T1_L0_G28][T1][YB42C0A80007-00063022A141E51A-0-0] [lt=34] get batch routine info finish(schema_version=1741773934868464, ret=0)
[2025-03-12 18:50:21.670609] INFO [SQL.OPT] assign_with_only_readable_replica (ob_phy_table_location_info.cpp:90) [941952][T1_L0_G28][T1][YB42C0A80007-00063022A141E51A-0-0] [lt=25] the replica location is invalid(bl_key={server:“192.xx.xx.09:2882”, tenant_id:1, ls_id:{id:1}}, replica_loc={server:“192.xx.xx.09:2882”, role:2, sql_port:2881, replica_type:0, property:{memstore_percent_:100}, restore_status:{status:0}, proposal_id:0})
[2025-03-12 18:50:21.671361] WDIAG [STORAGE] check_ls_offline (ob_tx_table_interface.cpp:181) [941952][T1_L0_G28][T1][YB42C0A80007-00063022A141E51A-0-0] [lt=35][errcode=-4006] tx table is nullptr(ret=-4006, discover_ls_offline=false)
[2025-03-12 18:50:21.671436] INFO [SHARE.SCHEMA] retrieve_package_schema (ob_schema_retrieve_utils.ipp:3360) [941952][T1_L0_G28][T1][YB42C0A80007-00063022A141E51A-0-0] [lt=29] retrieve package schema succeed(schema={tenant_id:1, database_id:201001, owner_id:200001, package_id:310001, package_name:“dbms_stats”, schema_version:1741773934138752, type:1, flag:4, comp_flag:1, exec_env:“4194304,45,45,45,”, source:"PACKAGE dbms_stats AUTHID CURRENT_USER
DECLARE DEFAULT_METHOD_OPT VARCHAR(1) DEFAULT ‘Z’;
DECLARE DEFAULT_GRANULARITY VARCHAR(1) DEFAULT ‘Z’;
PROCEDURE async_gather_stats_job_proc (duration BIGINT DEFAULT NULL);
END dbms_stats", comment:"", route_sql:""})
[2025-03-12 18:50:21.671484] INFO [SHARE.SCHEMA] retrieve_package_schema (ob_schema_retrieve_utils.ipp:3360) [941952][T1_L0_G28][T1][YB42C0A80007-00063022A141E51A-0-0] [lt=43] retrieve all package schemas succeed(schema_array=[{tenant_id:1, database_id:201001, owner_id:200001, package_id:310001, package_name:“dbms_stats”, schema_version:1741773934138752, type:1, flag:4, comp_flag:1, exec_env:“4194304,45,45,45,”, source:"PACKAGE dbms_stats AUTHID CURRENT_USER
DECLARE DEFAULT_METHOD_OPT VARCHAR(1) DEFAULT ‘Z’;
DECLARE DEFAULT_GRANULARITY VARCHAR(1) DEFAULT ‘Z’;
PROCEDURE async_gather_stats_job_proc (duration BIGINT DEFAULT NULL);
END dbms_stats", comment:"", route_sql:""}])
[2025-03-12 18:50:21.671552] INFO [SHARE.SCHEMA] get_batch_packages (ob_schema_service_sql_impl.cpp:4068) [941952][T1_L0_G28][T1][YB42C0A80007-00063022A141E51A-0-0] [lt=14] get batch package info finish(schema_version=1741773934138752, ret=0)
[2025-03-12 18:50:21.674309] WDIAG [STORAGE] check_ls_offline (ob_tx_table_interface.cpp:181) [941952][T1_L0_G28][T1][YB42C0A80007-00063022A141E51A-0-0] [lt=16][errcode=-4006] tx table is nullptr(ret=-4006, discover_ls_offline=false)
[2025-03-12 18:50:21.674367] INFO [SHARE.SCHEMA] retrieve_package_schema (ob_schema_retrieve_utils.ipp:3360) [941952][T1_L0_G28][T1][YB42C0A80007-00063022A141E51A-0-0] [lt=25] retrieve package schema succeed(schema={tenant_id:1, database_id:201001, owner_id:200001, package_id:310052, package_name:“dbms_stats”, schema_version:1741773934829512, type:2, flag:4, comp_flag:1, exec_env:“4194304,45,45,45,”, source:"PACKAGE BODY dbms_stats
PROCEDURE gather_table_stats (
ownname VARCHAR(65535),
PRAGMA INTERFACE(C, ASYNC_GATHER_STATS_JOB_PROC);
END dbms_stats", comment:"", route_sql:""})
[2025-03-12 18:50:21.674397] INFO [SHARE.SCHEMA] retrieve_package_schema (ob_schema_retrieve_utils.ipp:3360) [941952][T1_L0_G28][T1][YB42C0A80007-00063022A141E51A-0-0] [lt=26] retrieve all package schemas succeed(schema_array=[{tenant_id:1, database_id:201001, owner_id:200001, package_id:310052, package_name:“dbms_stats”, schema_version:1741773934829512, type:2, flag:4, comp_flag:1, exec_env:“4194304,45,45,45,”, source:"PACKAGE BODY dbms_stats
PROCEDURE gather_table_stats (
ownname VARCHAR(65535),
PRAGMA INTERFACE(C, ASYNC_GATHER_STATS_JOB_PROC);
END dbms_stats", comment:"", route_sql:""}])
[2025-03-12 18:50:21.674447] INFO [SHARE.SCHEMA] get_batch_packages (ob_schema_service_sql_impl.cpp:4068) [941952][T1_L0_G28][T1][YB42C0A80007-00063022A141E51A-0-0] [lt=12] get batch package info finish(schema_version=1741773934829512, ret=0)
[2025-03-12 18:50:21.674510] INFO [PL] get_package_from_plan_cache (ob_pl_package_manager.cpp:1789) [941952][T1_L0_G28][T1][YB42C0A80007-00063022A141E51A-0-0] [lt=14] get pl package from plan cache failed(ret=-5138, package_id=310001)
[2025-03-12 18:50:21.678068] INFO [PL] compile_module (ob_llvm_helper.cpp:625) [941952][T1_L0_G28][T1][YB42C0A80007-00063022A141E51A-0-0] [lt=14] ================Optimized LLVM Module================
[2025-03-12 18:50:21.678405] INFO [PL] dump_module (ob_llvm_helper.cpp:644) [941952][T1_L0_G28][T1][YB42C0A80007-00063022A141E51A-0-0] [lt=23] Dump LLVM Compile Module!
(s.str().c_str()="; ModuleID = ‘PL/SQL’
source_filename = “PL/SQL”
declare i32 @unset_implicit_cursor_in_forall(%pl_exec_context*)
“)
[2025-03-12 18:50:21.680697] INFO [SQL.OPT] assign_with_only_readable_replica (ob_phy_table_location_info.cpp:90) [941952][T1_L0_G28][T1][YB42C0A80007-00063022A141E51A-0-0] [lt=19] the replica location is invalid(bl_key={server:“192.xx.xx.09:2882”, tenant_id:1, ls_id:{id:1}}, replica_loc={server:“192.xx.xx.09:2882”, role:2, sql_port:2881, replica_type:0, property:{memstore_percent_:100}, restore_status:{status:0}, proposal_id:0})
[2025-03-12 18:50:21.681311] INFO [PL.STORAGEROUTINE] read_dll_from_disk (ob_pl_persistent.cpp:601) [941952][T1_L0_G28][T1][YB42C0A80007-00063022A141E51A-0-0] [lt=36] succ decode dll from disk(ret=0, key_id_=310001, merge_version=1741774558824312)
[2025-03-12 18:50:21.682924] INFO [SQL.OPT] assign_with_only_readable_replica (ob_phy_table_location_info.cpp:90) [941952][T1_L0_G28][T1][YB42C0A80007-00063022A141E51A-0-0] [lt=24] the replica location is invalid(bl_key={server:“192.xx.xx.09:2882”, tenant_id:1, ls_id:{id:1}}, replica_loc={server:“192.xx.xx.09:2882”, role:2, sql_port:2881, replica_type:0, property:{memstore_percent_:100}, restore_status:{status:0}, proposal_id:0})
[2025-03-12 18:50:21.683497] INFO [STORAGE] check_read_snapshot_for_normal_or_split_dst (ob_tablet_create_delete_helper.cpp:344) [941952][T1_L0_G28][T1][YB42C0A80007-00063022A141E51A-0-0] [lt=35] tablet create transaction is committed, currently in finish transfer in transacation(ret=0, ls_id={id:1}, tablet_id={id:101063}, snapshot_version=9223372036854775807, trans_state=3, user_data={tablet_status:{val:1, str:“NORMAL”}, transfer_scn:{val:18446744073709551615, v:3}, transfer_ls_id:{id:-1}, data_type:1, create_commit_scn:{val:1741773912499448001, v:0}, create_commit_version:1741773912499448001, delete_commit_scn:{val:18446744073709551615, v:3}, delete_commit_version:-1, start_transfer_commit_version:-1, start_split_commit_version:-1})
[2025-03-12 18:50:21.683538] INFO [STORAGE] check_status_for_new_mds (ob_tablet_create_delete_helper.cpp:264) [941952][T1_L0_G28][T1][YB42C0A80007-00063022A141E51A-0-0] [lt=40] refresh tablet status cache(ret=0, ls_id={id:1}, tablet_id={id:101063}, tablet_status_cache={tablet_status:{val:1, str:“NORMAL”}, create_commit_version:1741773912499448001, delete_commit_version:-1}, snapshot_version=9223372036854775807)
[2025-03-12 18:50:21.683676] INFO [SQL] get_cache_obj (ob_sql_stat_record.cpp:621) [941952][T1_L0_G28][T1][YB42C0A80007-00063022A141E51A-0-0] [lt=11] sql stat record not found(ret=-5138, key={sql_id:“1D0BA376E273B9D622641124D8C59264”, plan_hash:13784039704594560790, source_addr:“0.0.0.0:0”})
[2025-03-12 18:50:21.685272] INFO [SQL.OPT] assign_with_only_readable_replica (ob_phy_table_location_info.cpp:90) [941952][T1_L0_G28][T1][YB42C0A80007-00063022A141E51A-0-0] [lt=24] the replica location is invalid(bl_key={server:“192.xx.xx.09:2882”, tenant_id:1, ls_id:{id:1}}, replica_loc={server:“192.xx.xx.09:2882”, role:2, sql_port:2881, replica_type:0, property:{memstore_percent_:100}, restore_status:{status:0}, proposal_id:0})
[2025-03-12 18:50:21.685810] INFO [SQL] get_cache_obj (ob_sql_stat_record.cpp:621) [941952][T1_L0_G28][T1][YB42C0A80007-00063022A141E51A-0-0] [lt=33] sql stat record not found(ret=-5138, key={sql_id:“1D0BA376E273B9D622641124D8C59264”, plan_hash:8885405656945226197, source_addr:“0.0.0.0:0”})
[2025-03-12 18:50:21.685894] INFO [PL] compile_package (ob_pl_compile.cpp:974) [941952][T1_L0_G28][T1][YB42C0A80007-00063022A141E51A-0-0] [lt=29] >>>>>>>>Final Compile Package Time: (package.get_id()=310001, package.get_name()=dbms_stats, compile_end - compile_start=10756)
[2025-03-12 18:50:21.685976] INFO [PL] add_package_to_plan_cache (ob_pl_package_manager.cpp:1736) [941952][T1_L0_G28][T1][YB42C0A80007-00063022A141E51A-0-0] [lt=19] add pl package to plan cache success(ret=0, package_id=310001, package->get_dependency_table()=[cnt:1, {table_id:310001, schema_version:1741773934138752, object_type:7, is_db_explicit:false, is_existed:true}], pc_ctx.key_={db_id:201001, key_id:310001, namespace:6, name:”"})
[2025-03-12 18:50:21.686062] INFO [PL] get_package_from_plan_cache (ob_pl_package_manager.cpp:1789) [941952][T1_L0_G28][T1][YB42C0A80007-00063022A141E51A-0-0] [lt=24] get pl package from plan cache failed(ret=-5138, package_id=310052)
[2025-03-12 18:50:21.716144] INFO [STORAGE] diagnose_for_suggestion (ob_compaction_suggestion.cpp:458) [634465][T1_DagScheduler][T1][Y0-0000000000000000-0-0] [lt=14] [COMPACTION DAG STATUS] (start_time=1741776611714518, end_time=1741776621716142, dag_status={})
[2025-03-12 18:50:21.716213] INFO [COMMON] dump_dag_status (ob_tenant_dag_scheduler.cpp:2861) [634465][T1_DagScheduler][T1][Y0-0000000000000000-0-0] [lt=67] dump_dag_status(priority=“PRIO_COMPACTION_HIGH”, limits=6, running_task=0, adaptive_task_limit=6, ready_dag_count=0, waiting_dag_count=0, rank_dag_count=0)
[2025-03-12 18:50:21.717180] INFO [COMMON] dump_dag_status (ob_tenant_dag_scheduler.cpp:3638) [634465][T1_DagScheduler][T1][Y0-0000000000000000-0-0] [lt=5] dump_dag_status[DAG_NET](type=“DAG_NET_TRANSFER_BACKFILL_TX”, dag_count=0)
[2025-03-12 18:50:21.717193] INFO [COMMON] dump_dag_status (ob_tenant_dag_scheduler.cpp:4342) [634465][T1_DagScheduler][T1][Y0-0000000000000000-0-0] [lt=12] dump_dag_status(dag_cnt=0, total_worker_cnt_=0, total_running_task_cnt=0, work_thread_num_=43, scheduled_task_cnt=0)
[2025-03-12 18:50:21.719216] INFO [PL] compile_module (ob_llvm_helper.cpp:625) [941952][T1_L0_G28][T1][YB42C0A80007-00063022A141E51A-0-0] [lt=15] ================Optimized LLVM Module================
[2025-03-12 18:50:21.719540] INFO [PL] dump_module (ob_llvm_helper.cpp:644) [941952][T1_L0_G28][T1][YB42C0A80007-00063022A141E51A-0-0] [lt=25] Dump LLVM Compile Module!
(s.str().c_str()="; ModuleID = ‘PL/SQL’
source_filename = “PL/SQL”
declare i32 @unset_implicit_cursor_in_forall(%pl_exec_context*)
“)
[2025-03-12 18:50:21.725391] INFO [PL] compile_module (ob_llvm_helper.cpp:625) [941952][T1_L0_G28][T1][YB42C0A80007-00063022A141E51A-0-0] [lt=23] ================Optimized LLVM Module================
[2025-03-12 18:50:21.726194] INFO [PL] dump_module (ob_llvm_helper.cpp:644) [941952][T1_L0_G28][T1][YB42C0A80007-00063022A141E51A-0-0] [lt=23] Dump LLVM Compile Module!
(s.str().c_str()=”; ModuleID = ‘PL/SQL’
source_filename = “PL/SQL”
[2025-03-12 18:50:21.735198] WDIAG sock_destroy (eloop.c:86) [634201][pnio1][T0][Y0-0000000000000000-0-0] [lt=5][errcode=0] PNIO close sock fd faild, s=0x14b06e4bd4e8, s->fd=150, errno=9
[2025-03-12 18:50:21.735207] INFO pkts_sk_delete (pkts_sk_factory.h:57) [634201][pnio1][T0][Y0-0000000000000000-0-0] [lt=8] PNIO sk_destroy: s=0x14b06e4bd4e8 io=0x14b09f004448
CRASH ERROR!!! IP=55b03beaf3a0, RBP=14b037f45a70, sig=4, sig_code=2, sig_addr=0x55b03beaf3a0, RLIMIT_CORE=unlimited, timestamp=1741776621735314, tid=941952, tname=T1_L0_G28, trace_id=YB42C0A80007-00063022A141E51A-0-0, lbt=0x1f96b218 0x1f1b698d 0x14b0c5f6323f 0x8bb63a0 0x9be8a9c 0x9c0812c 0x9c08505 0x9be51fd 0x9a466c5 0xa5f92d9 0xa5fa810 0xa5f85ef 0x924c3cf 0x924cafc 0x9253edc 0x9253edc 0x924ed8d 0x92176d1 0x92177b1 0x9217ad3 0x9224c5c 0x9226bd3 0x9226edb 0x9237427 0x1ef0327a 0x1eee447d 0xebf34b2 0xebf0fc5 0xebe4b4b 0xec2577e 0xec5339b 0xec473c9 0xeaa14cd 0xea7362e 0x14a77faa 0x11bba464 0x7c4fe26 0x7923e9c 0x792151d 0x7c4d9c4 0x7c4cd09 0x7c482ee 0x7cf5030 0x7cf4929 0xf8dbc1a 0xf8f4a69 0x81cfe74 0x78b043b 0x789e118 0xfc77118, SQL_ID=E9E2014C8CE705871C555597A6A32456, SQL_STRING=CALL DBMS_STATS.ASYNC_GATHER_STATS_JOB_PROC(600000000);
[2025-03-13 08:26:43.132947] INFO [SERVER] inner_main (main.cpp:568) [2174895][observer][T0][Y0-0000000000000001-0-0] [lt=0] succ to init logger(default file=“log/observer.log”, rs file=“log/rootservice.log”, election file=“log/election.log”, trace file=“log/trace.log”, audit_file=“audit/observer_2174893_20250313082643-948020608.aud”, alert file=“log/alert/alert.log”, max_log_file_size=268435456, enable_async_log=true)
[2025-03-13 08:26:43.133055] INFO [SERVER] inner_main (main.cpp:572) [2174895][observer][T0][Y0-0000000000000001-0-0] [lt=103] Virtual memory : 798,941,184 byte
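No, I never gathered statistics manually; the call came from the system's automatic statistics collection.
OK, thanks. I will test that first.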
1. (ob_phy_table_location_info.cpp:90) [634642][T1_L0_G0][T1][YB42C0A80007-00063022A0C1F5F2-0-0] [lt=50] the replica location is invalid(bl_key={server:“192.xx.xx.09:2882”, tenant_id:1, ls_id:{id:1}}, replica_loc={server:“192.xx.xx.09:2882”, role:2, sql_port:2881, replica_type:0, property:{memstore_percent_:100}, restore_status:{status:0}, proposal_id:0})
- The replica on 09 has role 2, which means it is a follower. memstore has reached 100%, which suggests memory is exhausted, leaving the replica unavailable so that the system can neither query nor write.
2. [2025-03-12 18:50:21.671361] WDIAG [STORAGE] check_ls_offline (ob_tx_table_interface.cpp:181) [941952][T1_L0_G28][T1][YB42C0A80007-00063022A141E51A-0-0] [lt=35][errcode=-4006] tx table is nullptr(ret=-4006, discover_ls_offline=false)
- Error 4006 means the transaction table is not initialized; something is wrong with the log stream, and it affects storage-layer operations.
Combined with the inspection results, is insufficient memory the more likely cause? Is my understanding correct?
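OK, thanks!
Please follow the steps I posted above as closely as you can and provide that information; it helps with troubleshooting. Insufficient memory is only a suspicion at this point.
You also need to collect the stack information on the machine where the core file was generated.
The steps below cover installation and collection: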
Installation steps:
- If gdb is already installed and its version is old, uninstall it first
- Install the dependencies
yum install gcc-c++ -y
yum install texinfo -y
- Download gdb
wget https://ftp.gnu.org/gnu/gdb/gdb-9.1.tar.gz
- Extract and install
tar -zxvf gdb-9.1.tar.gz
cd gdb-9.1
mkdir build
cd build
../configure
make
make install
cp gdb/gdb /usr/bin
- Confirm that the installation succeeded
gdb --version
Collect the stack
gdb $observer $core_file
Example: gdb bin/observer core.28031
-- After entering the interactive prompt
set logging file /home/admin/oceanbase/gdb-output
set logging on
set pagination off
set print pretty on
set print elements 0
bt
bt full
info r
info thread
set logging off
quit
The gdb output is recorded in /home/admin/oceanbase/gdb-output
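The interactive session above can also be run non-interactively with gdb's `-batch` and `-ex` options, which is convenient when the same collection has to be repeated for several core files. This is only a sketch: the function name `collect_stack` and the `DRY_RUN` switch are hypothetical, and the paths in the usage line are the ones from this thread.

```shell
# Sketch only: run the same gdb commands in batch mode and save the output.

collect_stack() {
  # $1 = observer binary, $2 = core file, $3 = output file
  out=$3
  set -- gdb -batch \
    -ex 'set pagination off' \
    -ex 'set print pretty on' \
    -ex 'set print elements 0' \
    -ex 'bt' -ex 'bt full' -ex 'info registers' -ex 'info threads' \
    "$1" "$2"
  if [ -n "${DRY_RUN:-}" ]; then
    # Print the command instead of running it
    printf '%s\n' "$*"
  else
    "$@" > "$out" 2>&1
  fi
}

# Usage:
#   collect_stack bin/observer core.28031 /home/admin/oceanbase/gdb-output
```

Redirecting the output to a file replaces the `set logging` commands, so the result lands in the same place as in the interactive steps.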
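1. Does memstore_percent_ indicate the ratio of the replica's memstore usage to its memory limit? No. The value 100 is an attribute that marks this as a normal replica.
2. [2025-03-12 18:50:21.671361] WDIAG [STORAGE] check_ls_offline (ob_tx_table_interface.cpp:181) [941952][T1_L0_G28][T1][YB42C0A80007-00063022A141E51A-0-0] [lt=35][errcode=-4006] tx table is nullptr(ret=-4006, discover_ls_offline=false)
This log stream replica is probably in the offline state, so the structure has not been initialized.
This does not prove that insufficient memory is the cause. Further analysis is needed; for now insufficient memory is only a suspicion.
OK, understood.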
Use the --from/--to parameters to specify the time range in which the cluster outage occurred
obdiag gather scene run --scene=observer.cluster_down --from "2022-06-30 16:25:00" --to "2022-06-30 18:30:00"
https://www.oceanbase.com/docs/common-obdiag-cn-1000000002200538
Check the CPU instruction set: lscpu | grep avx
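Because the crash line reports sig=4 (SIGILL on Linux, an illegal instruction), it is worth checking exactly which instruction-set extensions the CPU exposes. `lscpu | grep avx` prints the whole flags line, so avx, avx2, and avx512 variants all match at once. The sketch below tests for one exact flag name; it assumes an x86 host where /proc/cpuinfo has a "flags" line, and the function name `has_cpu_flag` is hypothetical.

```shell
# Sketch only: exact-match check for a single CPU flag, assuming x86 Linux.

has_cpu_flag() {
  # $1 = flag name (e.g. avx, avx2), $2 = cpuinfo file (defaults to /proc/cpuinfo)
  grep -m1 -E '^flags[[:space:]]*:' "${2:-/proc/cpuinfo}" \
    | tr ' ' '\n' \
    | grep -qx -- "$1"
}

# Usage:
#   for f in sse4_2 avx avx2; do
#     if has_cpu_flag "$f"; then echo "$f: yes"; else echo "$f: no"; fi
#   done
```

Which flags the observer binary actually requires depends on how it was built, so the flag names in the usage loop are examples rather than a confirmed requirement list.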