Connection dropped during data ingestion, and the subsequent restart failed

Data ingestion hung partway through, and attempts to restart failed:
[2025-11-25 17:38:19.212] [ERROR] Traceback (most recent call last):
[2025-11-25 17:38:19.212] [ERROR] File “core.py”, line 2262, in start_cluster
[2025-11-25 17:38:19.212] [ERROR] File “core.py”, line 2333, in _start_cluster
[2025-11-25 17:38:19.212] [ERROR] File “core.py”, line 256, in run_workflow
[2025-11-25 17:38:19.212] [ERROR] File “core.py”, line 298, in run_plugin_template
[2025-11-25 17:38:19.212] [ERROR] File “core.py”, line 347, in call_plugin
[2025-11-25 17:38:19.212] [ERROR] File “_plugin.py”, line 348, in call
[2025-11-25 17:38:19.212] [ERROR] File “_plugin.py”, line 304, in _new_func
[2025-11-25 17:38:19.212] [ERROR] File “/root/.obd/plugins/oceanbase-ce/3.1.0/connect.py”, line 68, in connect
[2025-11-25 17:38:19.212] [ERROR] cursor = Cursor(ip=server.ip, port=server_config.get(‘mysql_port’, 2881), tenant=’’, password=****** if password is not None else ‘’, stdio=stdio)
[2025-11-25 17:38:19.212] [ERROR] File “_stdio.py”, line 1153, in wrapper
[2025-11-25 17:38:19.212] [ERROR] File “_stdio.py”, line 1114, in func_wrapper
[2025-11-25 17:38:19.212] [ERROR] File “tool.py”, line 787, in init
[2025-11-25 17:38:19.213] [ERROR] File “tool.py”, line 821, in _connect
[2025-11-25 17:38:19.213] [ERROR] File “pymysql/connections.py”, line 353, in init
[2025-11-25 17:38:19.213] [ERROR] File “pymysql/connections.py”, line 633, in connect
[2025-11-25 17:38:19.213] [ERROR] File “pymysql/connections.py”, line 907, in _request_authentication
[2025-11-25 17:38:19.213] [ERROR] File “pymysql/connections.py”, line 725, in _read_packet
[2025-11-25 17:38:19.213] [ERROR] File “pymysql/protocol.py”, line 221, in raise_for_error
[2025-11-25 17:38:19.213] [ERROR] File “pymysql/err.py”, line 143, in raise_mysql_exception
[2025-11-25 17:38:19.213] [ERROR] pymysql.err.OperationalError: (8001, ‘Server is initializing’)
[2025-11-25 17:38:19.213] [ERROR]
[2025-11-25 17:38:22.264] [ERROR] [ERROR] OBD-1006: Failed to connect to oceanbase-ce
[2025-11-25 17:38:22.264] [DEBUG] - sub connect ref count to 0
[2025-11-25 17:38:22.264] [DEBUG] - export connect
[2025-11-25 17:38:22.264] [DEBUG] - plugin oceanbase-ce-py_script_connect-3.1.0 result: False
[2025-11-25 17:38:22.271] [DEBUG] - share lock /root/.obd/lock/mirror_and_repo release, count 11
[2025-11-25 17:38:22.272] [DEBUG] - share lock /root/.obd/lock/mirror_and_repo release, count 10
[2025-11-25 17:38:22.272] [DEBUG] - share lock /root/.obd/lock/mirror_and_repo release, count 9
[2025-11-25 17:38:22.272] [DEBUG] - share lock /root/.obd/lock/mirror_and_repo release, count 8
[2025-11-25 17:38:22.272] [DEBUG] - share lock /root/.obd/lock/mirror_and_repo release, count 7
[2025-11-25 17:38:22.272] [DEBUG] - share lock /root/.obd/lock/mirror_and_repo release, count 6
[2025-11-25 17:38:22.272] [DEBUG] - share lock /root/.obd/lock/mirror_and_repo release, count 5
[2025-11-25 17:38:22.272] [DEBUG] - share lock /root/.obd/lock/mirror_and_repo release, count 4
[2025-11-25 17:38:22.272] [DEBUG] - share lock /root/.obd/lock/mirror_and_repo release, count 3
[2025-11-25 17:38:22.272] [DEBUG] - share lock /root/.obd/lock/mirror_and_repo release, count 2
[2025-11-25 17:38:22.272] [DEBUG] - share lock /root/.obd/lock/mirror_and_repo release, count 1
[2025-11-25 17:38:22.272] [DEBUG] - share lock /root/.obd/lock/mirror_and_repo release, count 0
[2025-11-25 17:38:22.273] [DEBUG] - unlock /root/.obd/lock/mirror_and_repo
[2025-11-25 17:38:22.273] [DEBUG] - exclusive lock /root/.obd/lock/deploy_obtest release, count 0
[2025-11-25 17:38:22.273] [DEBUG] - unlock /root/.obd/lock/deploy_obtest
[2025-11-25 17:38:22.273] [DEBUG] - share lock /root/.obd/lock/global release, count 0
[2025-11-25 17:38:22.273] [DEBUG] - unlock /root/.obd/lock/global
[2025-11-25 17:38:22.273] [INFO] See https://www.oceanbase.com/product/ob-deployer/error-codes .
[2025-11-25 17:38:22.273] [INFO] Trace ID: bfffdd2e-c9e1-11f0-afcd-000c29da1479
[2025-11-25 17:38:22.273] [INFO] If you want to view detailed obd logs, please run: obd display-trace bfffdd2e-c9e1-11f0-afcd-000c29da1479

observer.log is attached:

[2025-11-25 18:10:41.393713] INFO [COMMON] record_io_error (ob_io_struct.cpp:3672) [2035136][IO_SYNC_CH0][T0][Y0-0000000000000000-0-0] [lt=0] ignore fault detect for sync io(req={is_inited_:true, tenant_id_:1, control_block_:0x7fb6a350a310, ref_cnt_:2, raw_buf_:0x7fb66c79c3c0, fd_:{first_id:-1, second_id:112, third_id:-1, fd_id:-1, slot_version:-1, device_handle:0x7fb6a5910080}, is_object_device_req():false, trace_id_:Y0-0000000000000000-0-0, retry_count_:0, tenant_io_mgr_:{ptr:0x7fb68f7f4030}, storage_accesser:{ptr:null}, io_result_:{is_inited_:true, is_finished_:true, is_canceled_:false, has_estimated_:false, complete_size_:0, offset_:0, size_:4096, timeout_us_:300000000, result_ref_cnt_:2, out_ref_cnt_:1, flag_:{mode:“WRITE”, group_id_:0, func_type_:12, wait_event_id_:16, is_sync_:true, is_unlimited_:false, is_detect_:false, is_write_through_:false, is_sealed_:true, is_time_detect_:false, need_close_dev_and_fd_:false, reserved_:0}, ret_code_:{io_ret_:-4009, fs_errno_:0}, tenant_id_:1, tenant_io_mgr_:{ptr:0x7fb68f7f4030}, user_data_buf_:null, buf_:0x7fb67e405000, io_callback_:null, time_log:{begin_ts:1764065441393614, enqueue_used:2, dequeue_used:12, submit_used:25, return_used:-1, callback_enqueue_used:-1, callback_dequeue_used:-1, callback_finish_used:-1, end_used:1764065441393713}}, part_id:-1})
[2025-11-25 18:10:41.393797] EDIAG [PALF] inner_write_impl_ (log_block_handler.cpp:444) [2035402][T1_IOWorker][T1][Y0-0000000000000000-0-0] [lt=0][errcode=-4009] io_adapter pwrite failed(ret=-4009, io_fd={first_id:-1, second_id:112, third_id:-1, fd_id:-1, slot_version:-1, device_handle:0x7fb6a5910080}, offset=0, count=4096, write_size=0) BACKTRACE:0xa928418 0xa63f135 0xa63e145 0xa63dbc4 0xa63dafb 0xa63d910 0x10066713 0xa636c17 0x100693b5 0x101d6ea5 0xa635f6b 0xa635a31 0xa6354ff 0xa633bbf 0x100f0adc 0x25aff205 0x25afd352 0x7fb6bf4f0d2b 0x7fb6bf430b70
[2025-11-25 18:10:41.393857] ERROR [USING_LOG_PREFIX] inner_write_impl_ (log_block_handler.cpp:445) [2035402][T1_IOWorker][T1][Y0-0000000000000000-0-0] [lt=11][errcode=-4009] io_adapter pwrite failed, please check the output of dmesg
[2025-11-25 18:10:41.394282] INFO [COMMON] replace_fragment_node (ob_kvcache_map.cpp:763) [2035113][TimerWK3_KVCacheRep][T0][Y0-0000000000000000-0-0] [lt=25] Cache replace map node details(ret=0, replace_node_count=0, replace_time=0, replace_start_pos=1132416, replace_num=15728)
[2025-11-25 18:10:41.394317] INFO [COMMON] replace_map (ob_kv_storecache.cpp:694) [2035113][TimerWK3_KVCacheRep][T0][Y0-0000000000000000-0-0] [lt=34] replace map num details(ret=0, replace_node_count=0, map_once_replace_num_=15728, map_replace_skip_count_=10)
[2025-11-25 18:10:41.403452] WDIAG [STORAGE.TRANS] gen_trans_id (ob_trans_service_v4.cpp:2568) [2035368][T1_DRService][T1][YB42C0A80A63-00064466155F3938-0-0] [lt=16][errcode=-4383] get trans id not ready(ret=-4383, retry_times=52, this={is_inited_:true, tenant_id_:1, this:0x7fb68be04030})
[2025-11-25 18:10:41.403479] WDIAG [STORAGE.TRANS] add (ob_trans_define_v4.cpp:1753) [2035368][T1_DRService][T1][YB42C0A80A63-00064466155F3938-0-0] [lt=26][errcode=-4383] fail to exec tx_id_allocator_(tx_id)(ret=-4383)
[2025-11-25 18:10:41.403489] WDIAG [STORAGE.TRANS] start_tx (ob_tx_api.cpp:298) [2035368][T1_DRService][T1][YB42C0A80A63-00064466155F3938-0-0] [lt=8][errcode=-4383] add tx to txMgr fail(ret=-4383, tx={this:0x7fb66efd8040, tx_id:{txid:0}, state:1, addr:“192.168.10.99:2882”, tenant_id:1, session_id:3221586692, assoc_session_id:3221586692, client_sid:3221586692, xid:NULL, xa_mode:"", xa_start_addr:“0.0.0.0:0”, access_mode:-1, tx_consistency_type:0, isolation:1, snapshot_version:{val:18446744073709551615, v:3}, snapshot_scn:0, active_scn:0, op_sn:2, alloc_ts:1764065441348686, active_ts:-1, commit_ts:-1, finish_ts:-1, timeout_us:-1, lock_timeout_us:-1, expire_ts:9223372036854775807, coord_id:{id:-1}, parts:[], exec_info_reap_ts:0, commit_version:{val:18446744073709551615, v:3}, commit_times:0, commit_cb:null, cluster_id:-1, cluster_version:17180067075, seq_base:1764065441246420, flags_.SHADOW:false, flags_.INTERRUPTED:false, flags_.BLOCK:false, flags_.REPLICA:false, conflict_txs:[], abort_cause:0, commit_expire_ts:0, commit_task_.is_registered():false, modified_tables:[], last_rc_snapshot_version:{val:0, v:0}, ref:1})
[2025-11-25 18:10:41.403531] WDIAG [STORAGE.TRANS] start_tx (ob_tx_api.cpp:322) [2035368][T1_DRService][T1][YB42C0A80A63-00064466155F3938-0-0] [lt=41][errcode=-4383] start tx failed(ret=-4383, tx={this:0x7fb66efd8040, tx_id:{txid:0}, state:1, addr:“192.168.10.99:2882”, tenant_id:1, session_id:3221586692, assoc_session_id:3221586692, client_sid:3221586692, xid:NULL, xa_mode:"", xa_start_addr:“0.0.0.0:0”, access_mode:-1, tx_consistency_type:0, isolation:1, snapshot_version:{val:18446744073709551615, v:3}, snapshot_scn:0, active_scn:0, op_sn:2, alloc_ts:1764065441348686, active_ts:-1, commit_ts:-1, finish_ts:-1, timeout_us:-1, lock_timeout_us:-1, expire_ts:9223372036854775807, coord_id:{id:-1}, parts:[], exec_info_reap_ts:0, commit_version:{val:18446744073709551615, v:3}, commit_times:0, commit_cb:null, cluster_id:-1, cluster_version:17180067075, seq_base:1764065441246420, flags_.SHADOW:false, flags_.INTERRUPTED:false, flags_.BLOCK:false, flags_.REPLICA:false, conflict_txs:[], abort_cause:0, commit_expire_ts:0, commit_task_.is_registered():false, modified_tables:[], last_rc_snapshot_version:{val:0, v:0}, ref:1})
[2025-11-25 18:10:41.403563] WDIAG [SQL.EXE] explicit_start_trans (ob_sql_trans_control.cpp:173) [2035368][T1_DRService][T1][YB42C0A80A63-00064466155F3938-0-0] [lt=32][errcode=-4383] fail to exec txs->start_tx(*session->get_tx_desc(), tx_param)(ret=-4383, tx_param={cluster_id:1754276247, timeout_us:29999966, lock_timeout_us:-1, access_mode:0, isolation:1})
[2025-11-25 18:10:41.403577] WDIAG [SQL.ENG] start_trans (ob_tcl_executor.cpp:112) [2035368][T1_DRService][T1][YB42C0A80A63-00064466155F3938-0-0] [lt=10][errcode=-4383] fail start trans(ret=-4383)
[2025-11-25 18:10:41.403598] WDIAG [SQL] open_cmd (ob_result_set.cpp:86) [2035368][T1_DRService][T1][YB42C0A80A63-00064466155F3938-0-0] [lt=11][errcode=-4383] execute cmd failed(ret=-4383)
[2025-11-25 18:10:41.403606] WDIAG [SQL] open (ob_result_set.cpp:152) [2035368][T1_DRService][T1][YB42C0A80A63-00064466155F3938-0-0] [lt=7][errcode=-4383] execute plan failed(ret=-4383)
[2025-11-25 18:10:41.403612] WDIAG [SERVER] open (ob_inner_sql_result.cpp:159) [2035368][T1_DRService][T1][YB42C0A80A63-00064466155F3938-0-0] [lt=5][errcode=-4383] open result set failed(ret=-4383)
[2025-11-25 18:10:41.403619] WDIAG [SERVER] do_query (ob_inner_sql_connection.cpp:801) [2035368][T1_DRService][T1][YB42C0A80A63-00064466155F3938-0-0] [lt=4][errcode=-4383] result set open failed(ret=-4383, executor={ObIExecutor:, sql:“START TRANSACTION”})
[2025-11-25 18:10:41.403632] WDIAG [SERVER] query (ob_inner_sql_connection.cpp:946) [2035368][T1_DRService][T1][YB42C0A80A63-00064466155F3938-0-0] [lt=11][errcode=-4383] execute failed(ret=-4383, tenant_id=1, executor={ObIExecutor:, sql:“START TRANSACTION”}, retry_cnt=83, local_sys_schema_version=1, local_tenant_schema_version=1)
[2025-11-25 18:10:41.403646] INFO [SERVER] sleep_before_local_retry (ob_query_retry_ctrl.cpp:88) [2035368][T1_DRService][T1][YB42C0A80A63-00064466155F3938-0-0] [lt=12] will sleep(sleep_us=83000, remain_us=21991765, base_sleep_us=1000, retry_sleep_type=1, v.stmt_retry_times_=83, v.err_=-4383, timeout_timestamp=1764065463395410)
[2025-11-25 18:10:41.404055] INFO [COMMON] record_io_error (ob_io_struct.cpp:3672) [2035136][IO_SYNC_CH0][T0][Y0-0000000000000000-0-0] [lt=0] ignore fault detect for sync io(req={is_inited_:true, tenant_id_:1, control_block_:0x7fb6a350a358, ref_cnt_:2, raw_buf_:0x7fb66c79c3c0, fd_:{first_id:-1, second_id:112, third_id:-1, fd_id:-1, slot_version:-1, device_handle:0x7fb6a5910080}, is_object_device_req():false, trace_id_:Y0-0000000000000000-0-0, retry_count_:0, tenant_io_mgr_:{ptr:0x7fb68f7f4030}, storage_accesser:{ptr:null}, io_result_:{is_inited_:true, is_finished_:true, is_canceled_:false, has_estimated_:false, complete_size_:0, offset_:0, size_:4096, timeout_us_:300000000, result_ref_cnt_:2, out_ref_cnt_:1, flag_:{mode:“WRITE”, group_id_:0, func_type_:12, wait_event_id_:16, is_sync_:true, is_unlimited_:false, is_detect_:false, is_write_through_:false, is_sealed_:true, is_time_detect_:false, need_close_dev_and_fd_:false, reserved_:0}, ret_code_:{io_ret_:-4009, fs_errno_:0}, tenant_id_:1, tenant_io_mgr_:{ptr:0x7fb68f7f4030}, user_data_buf_:null, buf_:0x7fb67e405000, io_callback_:null, time_log:{begin_ts:1764065441403954, enqueue_used:2, dequeue_used:19, submit_used:16, return_used:-1, callback_enqueue_used:-1, callback_dequeue_used:-1, callback_finish_used:-1, end_used:1764065441404055}}, part_id:-1})
[2025-11-25 18:10:41.405130] INFO [COMMON] try_inc_thread_count (ob_dynamic_thread_pool.cpp:510) [2035389][T1_TimerWK2][T1][Y0-0000000000000000-0-0] [lt=23] try inc thread count(*this={name:TimerWK, this:0x7fb6a35fe3b0, min_thread_cnt:4, max_thread_cnt:128, running_thread_cnt:4, threads_idle_time:31121923449, tenant_id:1}, cur_thread_count=4, cnt=1, new_thread_count=5)
[2025-11-25 18:10:41.405346] INFO [COMMON] try_inc_thread_count (ob_dynamic_thread_pool.cpp:515) [2035389][T1_TimerWK2][T1][Y0-0000000000000000-0-0] [lt=27] inc thread count(*this={name:TimerWK, this:0x7fb6a35fe3b0, min_thread_cnt:4, max_thread_cnt:128, running_thread_cnt:2, threads_idle_time:31121923596, tenant_id:1}, cur_thread_count=4, cnt=1, new_thread_count=5)
[2025-11-25 18:10:41.405346] INFO [SHARE] pre_run (ob_tenant_base.cpp:375) [2075251][][T1][Y0-0000000000000000-0-0] [lt=0] tenant thread pre_run(ret=0, thread_count_=155, id_=1, GET_GROUP_ID()=0)

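This may be an I/O error: the observer.log shows pwrite failing with ret=-4009 and asks you to check the output of dmesg. I'd suggest first looking at the host's message log to see whether there is a bad disk.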
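
As a rough way to do that check, here is a minimal Python sketch that filters a system log for lines that look like disk or filesystem errors. The pattern list and the helper names (`find_io_errors`, `scan_log_file`) are my own assumptions and not part of obd or OceanBase; kernel message wording varies by driver and distribution, and the default path `/var/log/messages` applies to RHEL/CentOS-style hosts (other systems may use `/var/log/syslog`, and reading it may need root).

```python
import re

# Assumed patterns only: kernel wording differs between drivers and distros,
# so treat an empty result as "nothing matched", not as proof the disk is fine.
IO_ERROR_PATTERNS = [
    r"I/O error",
    r"EXT4-fs error",
    r"XFS .*error",
    r"Remounting filesystem read-only",
    r"No space left on device",
]
_IO_ERROR_RE = re.compile("|".join(IO_ERROR_PATTERNS), re.IGNORECASE)


def find_io_errors(lines):
    """Return the lines that match any of the assumed disk-error patterns."""
    return [line.rstrip("\n") for line in lines if _IO_ERROR_RE.search(line)]


def scan_log_file(path="/var/log/messages"):
    """Scan one log file; the default path is an assumption about the host OS."""
    with open(path, errors="replace") as f:
        return find_io_errors(f)


if __name__ == "__main__":
    for hit in scan_log_file():
        print(hit)
```

The same filter can be applied to saved `dmesg` output by writing it to a file and passing that path to `scan_log_file`.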


This is a virtual machine node, so it shouldn't have that problem. Are there any other options?