死锁日志分析

【 使用环境 】生产环境
【 OB or 其他组件 】observer
【 使用版本 】4.2.1.8
【问题描述】observer中死锁检测日志,无法根据traceid 找到sql语句
【复现路径】问题出现前后相关操作
【附件及日志】推荐使用OceanBase敏捷诊断工具obdiag收集诊断信息,详情参见链接(右键跳转查看):

【SOP系列 22 】——故障诊断第一步(自助诊断和诊断信息收集)

2 个赞

你这个ob的版本是企业版还是社区版

3 个赞

observer.log.20250506141059716:[2025-05-06 14:09:57.458338] INFO [DETECT] maintain_deadlock_info_when_end_stmt (ob_trans_deadlock_adapter.cpp:670) [161324][T1008_L0_G0][T1008][YB42AC132334-000625103078B3BF-0-0] [lt=0] maintain deadlock info(step=3, ret=0, ret=“OB_SUCCESS”, exec_ctx.get_errcode()=-6005, exec_ctx.get_errcode()=“OB_TRY_LOCK_ROW_CONFLICT”, session={this:0x7f407815ece8, id:3222011776, deser:false, tenant:“ZABBIX”, tenant_id:1008, effective_tenant:“ZABBIX”, effective_tenant_id:1008, database:“nd_zabbix”, user:“a_zabbix@%”, consistency_level:3, session_state:2, autocommit:true, tx:0x7f3db8f940c0}, desc=NULL, is_rollback=true, conflict_txs=[])
observer.log.20250506141059716-[2025-05-06 14:09:57.458447] INFO [STORAGE.TRANS] on_wlock_retry (ob_memtable_context.cpp:303) [200279][T1008_L0_G0][T1008][YB42AC132334-0006251038746C44-0-0] [lt=3] mvcc_write conflict(key={“BIGINT UNSIGNED”:35}, tx_id={txid:31338022418}, conflict_tx_id={txid:31338022360}, this={ObIMvccCtx={alloc_type=0 ctx_descriptor=0 min_table_version=1708482676537232 max_table_version=1708482676537232 trans_version={val:4611686018427387903, v:0} commit_version={val:0, v:0} lock_wait_start_ts=0 replay_compact_version={val:0, v:0}} end_code=0 tx_status=0 is_readonly=false ref=1 trans_id={txid:31338022418} ls_id=1001 callback_alloc_count=0 callback_free_count=0 checksum=0 tmp_checksum=0 checksum_scn={val:0, v:0} redo_filled_count=0 redo_sync_succ_count=0 redo_sync_fail_count=0 main_list_length=1 unsynced_cnt=0 unsubmitted_cnt_=0 cb_statistics:[main=1, slave=0, merge=0, tx_end=0, rollback_to=0, fast_commit=0, remove_memtable=0]})
observer.log.20250506141059716-[2025-05-06 14:09:57.458523] INFO [LIB] stat (utility.h:1167) [41787][T1086_IOWorker][T1086][Y0-0000000000000000-0-0] [lt=16] [PALF STAT WRITE LOG COST TIME](cur_stat_count=12, stat_interval=1000000, avg (count/cost)=160, this=0x7f489c1b9a30, extra_info="")
observer.log.20250506141059716-[2025-05-06 14:09:57.458678] INFO [LIB] stat (utility.h:1167) [41890][T1086_ApplySrv0][T1086][YB42AC132334-0006250DE51F7440-0-0] [lt=0] [APPLY STAT CB APPEND COST TIME](cur_stat_count=3, stat_interval=5000000, avg (count/cost)=76, this=0x7f489a1fe280, extra_info="")
observer.log.20250506141059716-[2025-05-06 14:09:57.458699] INFO [LIB] stat (utility.h:1167) [41890][T1086_ApplySrv0][T1086][YB42AC132334-0006250DE51F7440-0-0] [lt=21] REACH SYSLOG RATE LIMIT [bandwidth]

observer.log.20250506141059716:[2025-05-06 14:09:57.467833] INFO [DETECT] maintain_deadlock_info_when_end_stmt (ob_trans_deadlock_adapter.cpp:670) [124192][T1008_L0_G0][T1008][YB42AC132334-000625102986EA7A-0-0] [lt=0] maintain deadlock info(step=3, ret=0, ret=“OB_SUCCESS”, exec_ctx.get_errcode()=-6005, exec_ctx.get_errcode()=“OB_TRY_LOCK_ROW_CONFLICT”, session={this:0x7f2ef82841f8, id:3221821675, deser:false, tenant:“ZABBIX”, tenant_id:1008, effective_tenant:“ZABBIX”, effective_tenant_id:1008, database:“nd_zabbix”, user:“a_zabbix@%”, consistency_level:3, session_state:2, autocommit:true, tx:0x7f39ca9b7100}, desc=NULL, is_rollback=true, conflict_txs=[])
observer.log.20250506141059716-[2025-05-06 14:09:57.468016] INFO [STORAGE.TRANS] do_tx_gc_ (ob_tx_loop_worker.cpp:235) [56971][T1257_TxLoopWor][T1257][Y0-0000000000000000-0-0] [lt=18] [Tx Loop Worker] check tx scheduler success(MTL_ID()=1257, *ls_ptr={ls_meta:{tenant_id:1257, ls_id:{id:1}, ls_create_status:1, clog_checkpoint_scn:{val:1746487280354258364, v:0}, clog_base_lsn:{lsn:241510060032}, rebuild_seq:0, migration_status:0, gc_state_:1, offline_scn_:{val:18446744073709551615, v:3}, restore_status:{status:0}, replayable_point:{val:18446744073709551615, v:3}, tablet_change_checkpoint_scn:{val:1746496830743073497, v:0}, all_id_meta:{id_meta:[{limited_id:1746511811262659045, latest_log_ts:{val:1746511794604745176, v:0}}, {limited_id:506000001, latest_log_ts:{val:1746501514903553571, v:0}}, {limited_id:3110000001, latest_log_ts:{val:1746487280354258364, v:0}}]}, transfer_scn:{val:0, v:0}, rebuild_info:{status:{status:0}, type:{type:0}}}, switch_epoch:0, log_handler:{role:2, proposal_id:4, palf_env_:0x7f415ebfc030, is_in_stop_state_:false, is_inited_:true, id_:1}, restore_handler:{is_inited:true, is_in_stop_state:false, id:1, proposal_id:9223372036854775807, role:2, parent:null, context:{issue_task_num:0, issue_version:-1, last_fetch_ts:-1, max_submit_lsn:{lsn:18446744073709551615}, max_fetch_lsn:{lsn:18446744073709551615}, max_fetch_scn:{val:18446744073709551615, v:3}, error_context:{ret_code:0, trace_id:Y0-0000000000000000-0-0, error_type:0, err_lsn:{lsn:18446744073709551615}}, task_count:0}, restore_context:{seek_done:false, lsn:{lsn:18446744073709551615}}}, is_inited:true, tablet_gc_handler:{tablet_persist_trigger:0, is_inited:true}, startup_transfer_info:{ls_id:{id:0}, transfer_start_scn:{val:18446744073709551615, v:3}}})
observer.log.20250506141059716-[2025-05-06 14:09:57.468033] WDIAG [STORAGE.TRANS] mvcc_write (ob_mvcc_row.cpp:1005) [190955][T1008_L0_G0][T1008][YB42AC132334-000625103584C601-0-0] [lt=0][errcode=-6005] REACH SYSLOG RATE LIMIT [bandwidth]
observer.log.20250506141059716-[2025-05-06 14:09:57.468090] INFO [STORAGE.TRANS] on_wlock_retry (ob_memtable_context.cpp:303) [190955][T1008_L0_G0][T1008][YB42AC132334-000625103584C601-0-0] [lt=55] mvcc_write conflict(key={“BIGINT UNSIGNED”:35}, tx_id={txid:31338022427}, conflict_tx_id={txid:31338022360}, this={ObIMvccCtx={alloc_type=0 ctx_descriptor=0 min_table_version=1708482676537232 max_table_version=1708482676537232 trans_version={val:4611686018427387903, v:0} commit_version={val:0, v:0} lock_wait_start_ts=0 replay_compact_version={val:0, v:0}} end_code=0 tx_status=0 is_readonly=false ref=1 trans_id={txid:31338022427} ls_id=1001 callback_alloc_count=0 callback_free_count=0 checksum=0 tmp_checksum=0 checksum_scn={val:0, v:0} redo_filled_count=0 redo_sync_succ_count=0 redo_sync_fail_count=0 main_list_length=1 unsynced_cnt=0 unsubmitted_cnt_=0 cb_statistics:[main=1, slave=0, merge=0, tx_end=0, rollback_to=0, fast_commit=0, remove_memtable=0]})
observer.log.20250506141059716-[2025-05-06 14:09:57.468165] INFO [SERVER] execute_write_inner (ob_inner_sql_connection.cpp:1541) [57550][SvrMetaCh][T0][Y0-0000000000000000-0-0] [lt=27] execute write sql(ret=0, tenant_id=1, affected_rows=0, sql=DELETE FROM __all_ls_meta_table WHERE tenant_id = 1125 AND svr_ip = ‘172.19.35.52’ AND svr_port = 2882)

observer.log.20250506141059716:[2025-05-06 14:09:57.528303] INFO [SERVER] process (obmp_connect.cpp:480) [33798][T1008_MysqlQueu][T1008][Y0-000624E3575F1023-0-0] [lt=32] MySQL LOGIN(direct_client_ip=“172.19.35.51”, client_ip=172.26.20.130, tenant_name=ZABBIX, tenant_id=1008, user_name=f_zabbix, host_name=%, sessid=3221844706, proxy_sessid=12399292901220373402, sess_create_time=0, from_proxy=true, from_java_client=false, from_oci_client=false, from_jdbc_client=false, capability=10199693, proxy_capability=526919, use_ssl=false, c/s protocol=“OB_MYSQL_CS_TYPE”, autocommit=true, proc_ret=0, ret=0, conn->client_type_=3, conn->client_version_=0)
observer.log.20250506141059716:[2025-05-06 14:09:57.528383] INFO [SERVER] process (obmp_connect.cpp:480) [33790][T1008_MysqlQueu][T1008][Y0-000624E318DF3F08-0-0] [lt=2] MySQL LOGIN(direct_client_ip=“172.19.35.51”, client_ip=172.26.20.130, tenant_name=ZABBIX, tenant_id=1008, user_name=f_zabbix, host_name=%, sessid=3221906431, proxy_sessid=12399292901220373401, sess_create_time=0, from_proxy=true, from_java_client=false, from_oci_client=false, from_jdbc_client=false, capability=10199693, proxy_capability=526919, use_ssl=false, c/s protocol=“OB_MYSQL_CS_TYPE”, autocommit=true, proc_ret=0, ret=0, conn->client_type_=3, conn->client_version_=0)
observer.log.20250506141059716-[2025-05-06 14:09:57.528582] INFO [STORAGE] operator() (ob_tenant_freezer.cpp:131) [48790][T1160_Occam][T1160][Y0-0000000000000000-0-0] [lt=23] ====== tenant freeze timer task ======
observer.log.20250506141059716-[2025-05-06 14:09:57.528739] INFO [RPC.OBMYSQL] sm_conn_build_handshake (obsm_conn_callback.cpp:121) [32918][sql_nio16][T0][Y0-0000000000000000-0-0] [lt=22] new mysql sessid created(conn.sessid_=3221942824, support_ssl=false)
observer.log.20250506141059716-[2025-05-06 14:09:57.528781] INFO [RPC.OBMYSQL] init (obsm_conn_callback.cpp:141) [32918][sql_nio16][T0][Y0-0000000000000000-0-0] [lt=20] sm conn init succ(conn.sessid_=3221942824, sess.client_addr_=“172.19.35.53:12582”)
observer.log.20250506141059716-[2025-05-06 14:09:57.528915] INFO [SQL.ENG] broadcast_dfo (ob_px_interruption.cpp:76) [142773][T1008_L0_G0][T1008][YB42AC132334-000625102D064A97-0-0] [lt=2] success to send interrupt message(int_code={code:-4662, info:{msg:“tid:142773,from:“172.19.35.52:2882”,PX ABORT DFO”}}, addr=“172.19.35.52:2882”, interrupt_id=task(1727747509470953,8594130434))

observer.log.20250506141059716:[2025-05-06 14:09:58.034609] INFO [DETECT] maintain_deadlock_info_when_end_stmt (ob_trans_deadlock_adapter.cpp:670) [193789][T1008_L0_G0][T1008][YB42AC132334-00062510364525D4-0-0] [lt=0] maintain deadlock info(step=3, ret=0, ret=“OB_SUCCESS”, exec_ctx.get_errcode()=-6005, exec_ctx.get_errcode()=“OB_TRY_LOCK_ROW_CONFLICT”, session={this:0x7f32185d37d8, id:3221947228, deser:false, tenant:“ZABBIX”, tenant_id:1008, effective_tenant:“ZABBIX”, effective_tenant_id:1008, database:“nd_zabbix”, user:“a_zabbix@%”, consistency_level:3, session_state:2, autocommit:true, tx:0x7f3719bde8e0}, desc=NULL, is_rollback=true, conflict_txs=[])
observer.log.20250506141059716-[2025-05-06 14:09:58.034919] INFO [ARCHIVE] submit_residual_log_fetch_task_ (ob_archive_fetcher.cpp:986) [46964][T1108_ArcFetche][T1108][YB42AC132334-00062360917453CB-0-0] [lt=0] submit residual log fetch task succ(&task=0x7f46101125c0)
observer.log.20250506141059716-[2025-05-06 14:09:58.034934] INFO [ARCHIVE] handle_single_task_ (ob_archive_fetcher.cpp:339) [46964][T1108_ArcFetche][T1108][YB42AC132334-00062360917453CB-0-0] [lt=14] handle task succ(id={id:1001})
observer.log.20250506141059716-[2025-05-06 14:09:58.035272] INFO [STORAGE.TRANS] rollback (ob_memtable_context.cpp:847) [193789][T1008_L0_G0][T1008][YB42AC132334-00062510364525D4-0-0] [lt=0] memtable handle rollback to successfuly(from_seq_no=1746511798034773, to_seq_no=1746511798034457, *this={ObIMvccCtx={alloc_type=0 ctx_descriptor=0 min_table_version=1708482613682432 max_table_version=1708482613682432 trans_version={val:4611686018427387903, v:0} commit_version={val:0, v:0} lock_wait_start_ts=1746511798033490 replay_compact_version={val:0, v:0}} end_code=0 tx_status=0 is_readonly=false ref=0 trans_id={txid:31338022547} ls_id=1001 callback_alloc_count=0 callback_free_count=0 checksum=0 tmp_checksum=0 checksum_scn={val:0, v:0} redo_filled_count=0 redo_sync_succ_count=0 redo_sync_fail_count=0 main_list_length=0 unsynced_cnt=0 unsubmitted_cnt_=0 cb_statistics:[main=2, slave=0, merge=0, tx_end=0, rollback_to=2, fast_commit=0, remove_memtable=0]})
observer.log.20250506141059716-[2025-05-06 14:09:58.035372] INFO [MDS] for_each_ls_in_tenant (mds_tenant_service.cpp:325) [40128][T1079_Occam][T1079][YB42AC132334-00062500A3BB9E12-0-0] [lt=114] for each ls(succ_num=1, ret=0, ret=“OB_SUCCESS”)

3 个赞

社区版

3 个赞

日志连接私发你了

3 个赞

–根据时间和执行语句查询trace_id
select query_sql,svr_ip,TRACE_ID,client_ip,TENANT_NAME,user_name,DB_NAME,ELAPSED_TIME,RET_CODE,FROM_UNIXTIME(ROUND(REQUEST_TIME/1000/1000),’%Y-%m-%d %H:%i:%S’) from GV$OB_SQL_AUDIT
WHERE REQUEST_TIME>=‘2024-04-05 14:34:00’ and lower(query_sql) like ‘%select%’;

3 个赞

你直接日志文件私发给我 我们没有办法下载网盘或者其他的连接的文件

3 个赞

日志无法上传,你从那个网盘连接里面下载也是一样的,日志名称:zabbix

3 个赞

主要我们没有办法从外部下载 所以才让你发的 如果日志过大 可以压缩一下 目前社区可以上传50M

3 个赞

之前序风说开了安全网关就可以了

3 个赞

我们安全管控,传不上去

3 个赞

这个查了

3 个赞

可以通过下面的语句查一下 试试
select
/+DYNAMIC_SAMPLING(1)/
concat(‘kill ‘,c.svr_ip,’ /’,c.svr_ip,’:2881/’) as kill_sid,
b.database_name,
b.table_name,
a.svr_ip,
case when block = 0 then ‘持有锁会话’ ELSE ‘等待锁会话’ END AS block,
c.id,
a.type,
a.trans_id,
a.id2,
a.lmode,
a.CTIME / 1000000 as CTIME,
c.info,
c.command,
c.time,
c.total_time,
c.user_client_ip,
c.retry_cnt,
c.retry_info,
c.thread_id,
c.trace_id
from
gv$ob_locks a,
oceanbase.cdb_ob_table_locations b,
gv$ob_processlist c
where
a.id1 = b.tablet_id
and a.tenant_id = b.tenant_id
and c.trans_id = a.trans_id
and b.tenant_id = 租户id
and b.role = ‘LEADER’
and id2 is not null;

select * from __all_virtual_processlist where info like ‘%语句信息%’ and total_time > 100 order by total_time limit 1\G

3 个赞


现在要回查当时的语句,这个是现在的情况吧

1 个赞

看着这些信息都是行锁

1 个赞

你可以查询一下这个信息 看看有没有 一般情况下 有死锁会有6005的错误

select * from __all_virtual_processlist where retry_info like ‘%-6005%’ and total_time > 100 order by total_time limit 10\G

1 个赞

单从我的后台日志看有6005的报错,确实是有死锁的信息是吧, 但是根据gv$ob_sql_audit查不到具体的sql 这个是为什么? __all_virtual_processlist 只能实时看

1 个赞

应该是有死锁 如果是历史的 你可以通过下面的信息 查看一下 一个是当前的业务租户 一个是sys租户下
select * from oceanbase.DBA_OB_DEADLOCK_EVENT_HISTORY;

select * from oceanbase.CDB_OB_DEADLOCK_EVENT_HISTORY;


这个只有0502日的,没有昨天的,昨天下午14点多后台有不少6005的信息

你sys租户信息 查一下这个信息
select * from __all_virtual_sys_parameter_stat where name like ‘_lcl_op_interval’;