Primary-key query timeout on a partitioned table in an OB cluster

【Environment】Test environment
【OB or other component】oceanbase
【Version】4.2.5.5
【Problem description】

3-node cluster, ob_query_timeout at its 10s default value. The cluster's clog and data disks share the same SATA SSD.

Primary-key query on a business partitioned table: select id, p_name, … from t1 where id = ?

When ob_trx_lock_timeout is small, e.g. below 500000 (under 500 ms), the query fails with ERROR 4012 (HY000) statement query timeout, yet going through the logs the overall transaction execution time never appears to exceed 10s.

When ob_trx_lock_timeout is set above 10000000 (over 10s), everything works. This is odd: a snapshot read should, in theory, be unaffected by this parameter.

Meanwhile, the observer log contains the following entries:
[2025-10-13 14:57:04.508831] WDIAG [STORAGE.TRANS] run1 (ob_standby_timestamp_service.cpp:141) [2452124][T1006_STSWorker][T1006][Y0-0000000000000000-0-0] [lt=29][errcode=-4076] query and update last id fail(ret=-4076, ret="OB_NEED_WAIT")
[2025-10-13 14:57:04.512678] INFO [STORAGE.TRANS] check_all_readonly_tx_clean_up (ob_ls_tx_service.cpp:370) [2451405][T1004_TxLoopWor][T1004][Y0-0000000000000000-0-0] [lt=2] wait_all_readonly_tx_cleaned_up cleaned up success(ls_id={id:1})
[2025-10-13 14:57:04.512706] INFO [STORAGE.TRANS] check_all_readonly_tx_clean_up (ob_ls_tx_service.cpp:370) [2451405][T1004_TxLoopWor][T1004][Y0-0000000000000000-0-0] [lt=25] wait_all_readonly_tx_cleaned_up cleaned up success(ls_id={id:1001})
[2025-10-13 14:57:04.512712] INFO [STORAGE.TRANS] check_all_readonly_tx_clean_up (ob_ls_tx_service.cpp:370) [2451405][T1004_TxLoopWor][T1004][Y0-0000000000000000-0-0] [lt=3] wait_all_readonly_tx_cleaned_up cleaned up success(ls_id={id:1002})
[2025-10-13 14:57:04.512718] INFO [STORAGE.TRANS] check_all_readonly_tx_clean_up (ob_ls_tx_service.cpp:370) [2451405][T1004_TxLoopWor][T1004][Y0-0000000000000000-0-0] [lt=2] wait_all_readonly_tx_cleaned_up cleaned up success(ls_id={id:1003})
[2025-10-13 14:57:04.513022] WDIAG [SQL.DAS] process_remote_task_resp (ob_das_ref.cpp:594) [2451438][T1004_L0_G0][T1004][YB420A0A06BF-000640F0BF896B2D-0-0] [lt=8][errcode=0] das async execution failed(task_resp={has_more:false, ctrl_svr:"192.168.1.101:2882", runner_svr:"192.168.1.103:2882", op_results:[{ObIDASTaskResult:{task_id:5145780107}, datum_store:{tenant_id:1004, label:"DASScanResult", ctx_id:0, mem_limit:-1, row_cnt:0, file_size:0, enable_dump:false}, output_exprs:NULL, io_read_bytes:0, ssstore_read_bytes:0, ssstore_read_row_cnt:0, memstore_read_row_cnt:0}], rcode:{code:-6004, msg:"", warnings:[]}, trans_result:{incomplete:false, parts:[], touched_ls_list:[], conflict_txs:[], conflict_info_array:[]}})
[2025-10-13 14:57:04.513084] WDIAG log_user_error_and_warn (ob_das_utils.cpp:36) [2451438][T1004_L0_G0][T1004][YB420A0A06BF-000640F0BF896B2D-0-0] [lt=58][errcode=-6004]
[2025-10-13 14:57:04.513092] WDIAG [SQL.DAS] process_task_resp (ob_data_access_service.cpp:714) [2451438][T1004_L0_G0][T1004][YB420A0A06BF-000640F0BF896B2D-0-0] [lt=7][errcode=-6004] error occurring in remote das task, please use the current TRACE_ID to grep the original error message on the remote_addr.(ret=-6004, remote_addr="192.168.1.103:2882")
[2025-10-13 14:57:04.513102] WDIAG [SQL.DAS] process_remote_task_resp (ob_das_ref.cpp:601) [2451438][T1004_L0_G0][T1004][YB420A0A06BF-000640F0BF896B2D-0-0] [lt=7][errcode=-6004] failed to process das async task resp(ret=-6004, task_resp={has_more:false, ctrl_svr:"192.168.1.101:2882", runner_svr:"192.168.1.103:2882", op_results:[{ObIDASTaskResult:{task_id:5145780107}, datum_store:{tenant_id:1004, label:"DASScanResult", ctx_id:0, mem_limit:-1, row_cnt:0, file_size:0, enable_dump:false}, output_exprs:NULL, io_read_bytes:0, ssstore_read_bytes:0, ssstore_read_row_cnt:0, memstore_read_row_cnt:0}], rcode:{code:-6004, msg:"", warnings:[]}, trans_result:{incomplete:false, parts:[], touched_ls_list:[], conflict_txs:[], conflict_info_array:[]}})
[2025-10-13 14:57:04.513112] WDIAG [SQL.DAS] wait_tasks_and_process_response (ob_das_ref.cpp:537) [2451438][T1004_L0_G0][T1004][YB420A0A06BF-000640F0BF896B2D-0-0] [lt=9][errcode=0] failed to process remote task resp(tmp_ret=-6004)
[2025-10-13 14:57:04.513116] WDIAG [SQL.DAS] execute_all_task (ob_das_ref.cpp:404) [2451438][T1004_L0_G0][T1004][YB420A0A06BF-000640F0BF896B2D-0-0] [lt=4][errcode=0] failed to process all async remote tasks(ret=0)
[2025-10-13 14:57:04.513127] WDIAG [SQL.DAS] execute_all_task (ob_das_ref.cpp:322) [2451438][T1004_L0_G0][T1004][YB420A0A06BF-000640F0BF896B2D-0-0] [lt=8][errcode=-6004] fail to execute all agg_tasks(ret=-6004)
[2025-10-13 14:57:04.513136] WDIAG [SQL.DAS] do_table_scan (ob_das_merge_iter.cpp:251) [2451438][T1004_L0_G0][T1004][YB420A0A06BF-000640F0BF896B2D-0-0] [lt=9][errcode=-6004] failed to execute all das task(ret=-6004)
[2025-10-13 14:57:04.513140] WDIAG [SQL.ENG] do_table_scan (ob_table_scan_op.cpp:2259) [2451438][T1004_L0_G0][T1004][YB420A0A06BF-000640F0BF896B2D-0-0] [lt=4][errcode=-6004] execute all das scan task failed(ret=-6004)
[2025-10-13 14:57:04.513144] WDIAG [SQL.ENG] do_init_before_get_row (ob_table_scan_op.cpp:1443) [2451438][T1004_L0_G0][T1004][YB420A0A06BF-000640F0BF896B2D-0-0] [lt=3][errcode=-6004] fail to do table scan(ret=-6004)
[2025-10-13 14:57:04.513148] WDIAG [SQL.ENG] inner_get_next_batch_for_tsc (ob_table_scan_op.cpp:2150) [2451438][T1004_L0_G0][T1004][YB420A0A06BF-000640F0BF896B2D-0-0] [lt=2][errcode=-6004] failed to init before get row(ret=-6004)
The DAS execution above failed, and the data lives on a remote node. Judging from the release notes, would upgrading to 4.2.5 BP6 resolve this?
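
Following the hint in the WDIAG line ("use the current TRACE_ID to grep the original error message on the remote_addr"): besides grepping observer.log on 192.168.1.103, the same trace can be pulled from the SQL audit view. A minimal sketch, assuming access to GV$OB_SQL_AUDIT and that the audit records have not yet been flushed:

```sql
-- Hedged sketch: look up the failing statement by the TRACE_ID from the WDIAG lines.
-- GV$OB_SQL_AUDIT is the cluster-wide SQL audit view in OceanBase 4.x;
-- records age out quickly, so run this soon after the failure.
SELECT svr_ip, request_time, ret_code, retry_cnt,
       elapsed_time, queue_time, execute_time, query_sql
FROM oceanbase.GV$OB_SQL_AUDIT
WHERE trace_id = 'YB420A0A06BF-000640F0BF896B2D-0-0'
ORDER BY request_time;
```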


Is the connection going through port 2881 or port 2883?

Cluster-mode access, through port 2883.

Error 6004 is usually caused by another transaction operating on the same table.
Could you share the detailed steps to reproduce?
We'll try to reproduce it on our side first.

The ob_trx_lock_timeout variable does not take effect by default, and changing it is not recommended. The timeout most likely occurred while fetching the partition from node 103, which is expected behavior.
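
For reference, a quick way to confirm the current setting and restore the default (the unit is microseconds; -1 means the lock wait falls back to ob_query_timeout):

```sql
-- Check the effective values in the current session.
SHOW VARIABLES LIKE 'ob_query_timeout';     -- default 10000000 (10 s, in microseconds)
SHOW VARIABLES LIKE 'ob_trx_lock_timeout';  -- default -1 (falls back to ob_query_timeout)

-- Restore the default globally (applies to new sessions only).
SET GLOBAL ob_trx_lock_timeout = -1;
```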

The scenario is concurrent transactions updating the same id, while the other SELECT transactions should be snapshot reads (no FOR UPDATE / LOCK IN SHARE MODE); under MVCC they should not be blocked. A sketch of the expected behavior follows.
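
A minimal two-session sketch of that expectation (table and values hypothetical, matching the t1 query above):

```sql
-- Session A: hold an uncommitted write lock on the row.
BEGIN;
UPDATE t1 SET p_name = 'new' WHERE id = 1;
-- (not committed yet)

-- Session B: a plain SELECT is a snapshot read under MVCC; it should return
-- the pre-update version immediately instead of waiting on A's row lock.
SELECT id, p_name FROM t1 WHERE id = 1;
```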

Because this is a shared environment that someone else had modified (the default should be -1), we ended up seeing the problem above. Very strange :joy:

For an explicit transaction, while waiting for a lock to be released, the wait is bounded by min(ob_query_timeout, ob_trx_lock_timeout).
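
A sketch of how that bound bites for a locking read (the exact error returned depends on which timeout fires first, so it is left unspecified here):

```sql
-- Session A: take the row lock and hold it.
BEGIN;
SELECT id FROM t1 WHERE id = 1 FOR UPDATE;

-- Session B: a locking read on the same row now has to wait for A.
SET SESSION ob_trx_lock_timeout = 500000;  -- 500 ms, in microseconds
BEGIN;
SELECT id FROM t1 WHERE id = 1 FOR UPDATE;
-- The wait is capped at min(ob_query_timeout, ob_trx_lock_timeout) = 500 ms,
-- after which the statement fails instead of waiting the full 10 s.
```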

Had QA rerun the load test and analyzed it with obdiag: the transaction was not a pure snapshot read after all. The business logic is flawed, and there were write-lock waits (on_wlock_retry).
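
Such lock-wait retries also leave traces in SQL audit. A hedged sketch for spotting them (the tenant_id filter is taken from the T1004 prefix in the logs above and may need adjusting):

```sql
-- Statements that were internally retried (lock conflicts are a common cause)
-- show up with retry_cnt > 0 in the SQL audit view.
SELECT svr_ip, request_time, trace_id, ret_code, retry_cnt, elapsed_time, query_sql
FROM oceanbase.GV$OB_SQL_AUDIT
WHERE tenant_id = 1004
  AND retry_cnt > 0
ORDER BY retry_cnt DESC
LIMIT 20;
```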


Thanks for sharing.