并发很多时的连接异常问题分析,需要从 OBProxy.log(可能还要包含历史文件)分析出一个完整的会话从创建到断开的日志,就能找到报错的线索。如果异常类型有多个,可能会找出多个特点不同的会话日志。假设这里碰到的只有一种异常类型的。
找出一个完整的会话日志需要了解 OBProxy记录日志的几个关键的 ID : cs_id, ss_id,server_sessid,sm_id 等。
从这个日志里我提炼了一个完整会话的日志。
[2023-10-25 14:37:42.403170] INFO [PROXY.NET] ob_unix_net_vconnection.cpp:1255 [183779][Y0-00007F3EC3F20FA0] [lt=20] [dc=0] vip connect(protocol=6, fd=1018, vid=0, vaddr={10.129.180.91:2883}, caddr={10.129.180.91:52366}, daddr={10.129.180.91:2883})
[2023-10-25 14:37:42.403190] INFO [PROXY.CS] ob_mysql_client_session.cpp:370 [183779][Y0-00007F3EC3F20FA0] [lt=16] [dc=0] client session born(cs_id=1510386, proxy_sessid=0, is_local_connection=false, client_vc=0x7f3ea740f260, client_fd=1018, client_addr="10.129.180.91:52366")
[2023-10-25 14:37:42.403222] INFO [PROXY.CS] ob_mysql_client_session.cpp:241 [183779][Y0-00007F3EC3F20FA0] [lt=21] [dc=0] Starting new transaction using sm(cs_id=1510386, get_transact_count()=0, sm_id=591757)
[2023-10-25 14:37:42.405610] INFO [PROXY.TXN] ob_mysql_transact.cpp:5761 [183779][Y0-00007F3EC3F20FA0] [lt=108] [dc=0] [ObMysqlTransact::handle_server_connection_break](client_ip={10.129.180.91:52366}, server_ip={10.129.180.93:2881}, cs_id=1510386, proxy_sessid=0, ss_id=0, server_sessid=0, sm_id=591757, proxy_user_name=root@tpcc_tenant#obtest, database_name=, server_state="CONNECT_ERROR", request_cmd="Sleep", sql_cmd="Handshake", sql=OB_MYSQL_COM_LOGIN)
[2023-10-25 14:37:42.405660] WARN [PROXY.SM] setup_error_transfer (ob_mysql_sm.cpp:9009) [183779][Y0-00007F3EC3F20FA0] [lt=11] [dc=0] [setup_error_transfer] Now closing connection(sm_id=591757, client_ip={10.129.180.91:52366}, request_cmd="Sleep", sql_cmd="Handshake", sql=OB_MYSQL_COM_LOGIN)
[2023-10-25 14:37:42.405675] WARN [PROXY.NET] connect_internal (ob_unix_net_processor.cpp:326) [183779][Y0-00007F3EC3F20FA0] [lt=14] [dc=0] fail to connect_up, vc will be free(ret=-10024)
[2023-10-25 14:37:42.405859] INFO [PROXY.SS] ob_mysql_client_session.cpp:652 [183779][Y0-00007F3EC3F20FA0] [lt=32] [dc=0] client session do_io_close((*this={this:0x7f3ebdb102b0, is_proxy_mysql_client:false, is_waiting_trans_first_request:false, need_delete_cluster:false, is_first_dml_sql_got:false, vc_ready_killed:false, active:true, magic:19132429, conn_decrease:true, current_tid:183779, cs_id:1510386, proxy_sessid:0, session_info:{is_inited:true, priv_info:{has_all_privilege:false, cs_id:1510386, user_priv_set:-1, cluster_name:"obtest", tenant_name:"tpcc_tenant", user_name:"root"}, version:{common_hot_sys_var_version:0, common_sys_var_version:0, mysql_hot_sys_var_version:0, mysql_sys_var_version:0, hot_sys_var_version:0, sys_var_version:0, user_var_version:0, db_name_version:0, last_insert_id_version:0, sess_info_version:0}, hash_version:{common_hot_sys_var_version:0, common_sys_var_version:0, mysql_hot_sys_var_version:0, mysql_sys_var_version:0, hot_sys_var_version:0, sys_var_version:0, user_var_version:0, db_name_version:0, last_insert_id_version:0, sess_info_version:0}, val_hash:{common_hot_sys_var_hash:0, common_cold_sys_var_hash:0, mysql_hot_sys_var_hash:0, mysql_cold_sys_var_hash:0, hot_sys_var_hash:0, cold_sys_var_hash:0, user_var_hash:0}, global_vars_version:-1, is_global_vars_changed:false, is_trans_specified:false, is_user_idc_name_set:false, is_read_consistency_set:false, idc_name:"", cluster_id:0, real_meta_cluster_name:"", safe_read_snapshot:0, syncing_safe_read_snapshot:0, route_policy:1, proxy_route_policy:3, user_identity:0, global_vars_version:-1, is_read_only_user:false, is_request_follower_user:false, obproxy_force_parallel_query_dop:1, ob20_request:{remain_payload_len:0, ob20_request_received_done:false, ob20_header:{ob 20 protocol header:{compressed_len:0, seq:0, non_compressed_len:0}, magic_num:0, header_checksum:0, connection_id:0, request_id:0, pkt_seq:0, payload_len:0, version:0, flag_.flags:0, reserved:0}}, client_cap:0, server_cap:0, last_server_addr:{*Not IP address [0]*:0}, last_server_sess_id:0, init_sql:""}, dummy_ldc:{use_ldc:false, idc_name:"", item_count:3, site_start_index_array:[[0]0, [1]3, [2]3, [3]3], item_array:[[0]{idc_type:"SAME_IDC", zone_type:"ReadWrite", replica:{server:"10.129.180.93:2881", is_dup_replica:false, role:"FOLLOWER", type:"FULL"}, is_merging:false, is_partition_server:false, is_force_congested:false, is_used:false}, [1]{idc_type:"SAME_IDC", zone_type:"ReadWrite", replica:{server:"10.129.180.92:2881", is_dup_replica:false, role:"FOLLOWER", type:"FULL"}, is_merging:false, is_partition_server:false, is_force_congested:false, is_used:false}, [2]{idc_type:"SAME_IDC", zone_type:"ReadWrite", replica:{server:"10.129.180.94:2881", is_dup_replica:false, role:"FOLLOWER", type:"FULL"}, is_merging:false, is_partition_server:false, is_force_congested:false, is_used:false}], pl:null, ts:0x7f3ee57960f0, readonly_exist_status:"READONLY_ZONE_NOT_EXIST"}, dummy_entry:0x7f3ee582b8e0, server_state_version:2, cur_ss:null, bound_ss:null, lii_ss:null, cluster_resource:{this:0x7f3ec0482080, ref_count:638, is_inited:true, cluster_info_key:{cluster_name:{config_string:"obtest"}, cluster_id:0}, cr_state:"CR_AVAIL", version:2, last_access_time_ns:1698215862403790061, deleting_completed_thread_num:0, fetch_rslist_task_count:0, fetch_idc_list_task_count:0, last_idc_list_refresh_time_ns:1698144071872256908, last_rslist_refresh_time_ns:1698196611731834752, server_state_version:2}, client_vc:0x7f3ea740f260, using_ldg:false, trace_stats:NULL}, client_vc_=0x7f3ea740f260, this=0x7f3ebdb102b0)
[2023-10-25 14:37:42.406044] INFO [PROXY.SM] ob_mysql_sm.cpp:9653 [183779][Y0-00007F3EC3F20FA0] [lt=8] [dc=0] deallocating sm(sm_id=591757)
这段日志明确的提示了是 obproxy 后端会话连接(login)失败。具体原因还需要到后端 OBServer节点里去查看日志。推测是压力大了,后端处理不过来。
可以继续看日志,也可以看看OB 负载。
整个日志分析的关键就是在一个或多个 obproxy.log文件里找到一条完整的会话日志。 详细方法可以参考: