OceanBase GUI (obd web) cluster deployment hangs at "Initialize oceanbase-ce" and keeps re-executing the same SQL statement

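[Environment] CentOS 7, three observers
[OB or other component]
[Version] OBD 2.7.0
[Problem description]
The deployment stays stuck at "Initialize oceanbase-ce". observer.log grows very quickly: in the ten minutes it was stuck, nearly 200 MB was written, and there are no ERROR-level entries.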

An excerpt of observer.log follows:

[2024-05-04 18:05:02.356840] WDIAG [SQL.RESV] resolve_basic_table (ob_dml_resolver.cpp:15087) [13946][T1_LSMetaCh][T1][YB42C0A8086F-0006179DBD4BA601-0-0] [lt=2][errcode=-5019] fail to resolve basic table with cte(ret=-5019)
[2024-05-04 18:05:02.356843] WDIAG [SQL.RESV] resolve_table (ob_dml_resolver.cpp:3675) [13946][T1_LSMetaCh][T1][YB42C0A8086F-0006179DBD4BA601-0-0] [lt=2][errcode=-5019] resolve basic table failed(ret=-5019)
[2024-05-04 18:05:02.356845] WDIAG [SQL.RESV] resolve_from_clause (ob_select_resolver.cpp:3614) [13946][T1_LSMetaCh][T1][YB42C0A8086F-0006179DBD4BA601-0-0] [lt=1][errcode=-5019] fail to exec resolve_table(*table_node, table_item)(ret=-5019)
[2024-05-04 18:05:02.356848] WDIAG [SQL.RESV] resolve_normal_query (ob_select_resolver.cpp:1076) [13946][T1_LSMetaCh][T1][YB42C0A8086F-0006179DBD4BA601-0-0] [lt=2][errcode=-5019] fail to exec resolve_from_clause(parse_tree.children_[PARSE_SELECT_FROM])(ret=-5019)
[2024-05-04 18:05:02.356850] WDIAG [SQL.RESV] resolve (ob_select_resolver.cpp:1278) [13946][T1_LSMetaCh][T1][YB42C0A8086F-0006179DBD4BA601-0-0] [lt=2][errcode=-5019] resolve normal query failed(ret=-5019)
[2024-05-04 18:05:02.356853] WDIAG [SQL.RESV] select_stmt_resolver_func (ob_resolver.cpp:187) [13946][T1_LSMetaCh][T1][YB42C0A8086F-0006179DBD4BA601-0-0] [lt=2][errcode=-5019] execute stmt_resolver failed(ret=-5019, parse_tree.type_=3299)
[2024-05-04 18:05:02.356860] WDIAG [SQL] generate_stmt (ob_sql.cpp:2840) [13946][T1_LSMetaCh][T1][YB42C0A8086F-0006179DBD4BA601-0-0] [lt=3][errcode=-5019] failed to resolve(ret=-5019)
[2024-05-04 18:05:02.356865] WDIAG [SQL] generate_physical_plan (ob_sql.cpp:2961) [13946][T1_LSMetaCh][T1][YB42C0A8086F-0006179DBD4BA601-0-0] [lt=3][errcode=-5019] Failed to generate stmt(ret=-5019, result.get_exec_context().need_disconnect()=false)
[2024-05-04 18:05:02.356869] WDIAG [SQL] handle_physical_plan (ob_sql.cpp:4811) [13946][T1_LSMetaCh][T1][YB42C0A8086F-0006179DBD4BA601-0-0] [lt=3][errcode=-5019] Failed to generate plan(ret=-5019, result.get_exec_context().need_disconnect()=false)
[2024-05-04 18:05:02.356872] WDIAG [SQL] handle_text_query (ob_sql.cpp:2560) [13946][T1_LSMetaCh][T1][YB42C0A8086F-0006179DBD4BA601-0-0] [lt=2][errcode=-5019] fail to handle physical plan(ret=-5019)
[2024-05-04 18:05:02.356875] WDIAG [SQL] stmt_query (ob_sql.cpp:227) [13946][T1_LSMetaCh][T1][YB42C0A8086F-0006179DBD4BA601-0-0] [lt=1][errcode=-5019] fail to handle text query(stmt=SELECT * FROM all_ls_meta_table WHERE tenant_id = 1 ORDER BY tenant_id, ls_id, svr_ip, svr_port, ret=-5019)
[2024-05-04 18:05:02.356878] WDIAG [SERVER] do_query (ob_inner_sql_connection.cpp:686) [13946][T1_LSMetaCh][T1][YB42C0A8086F-0006179DBD4BA601-0-0] [lt=3][errcode=-5019] executor execute failed(ret=-5019)
[2024-05-04 18:05:02.356881] WDIAG [SERVER] query (ob_inner_sql_connection.cpp:834) [13946][T1_LSMetaCh][T1][YB42C0A8086F-0006179DBD4BA601-0-0] [lt=2][errcode=-5019] execute failed(ret=-5019, tenant_id=1, executor={ObIExecutor:, sql:"SELECT * FROM all_ls_meta_table WHERE tenant_id = 1 ORDER BY tenant_id, ls_id, svr_ip, svr_port"}, retry_cnt=0, local_sys_schema_version=1, local_tenant_schema_version=1)
[2024-05-04 18:05:02.356889] WDIAG [SERVER] after_func (ob_query_retry_ctrl.cpp:979) [13946][T1_LSMetaCh][T1][YB42C0A8086F-0006179DBD4BA601-0-0] [lt=5][errcode=-5019] [RETRY] check if need retry(v={force_local_retry:true, stmt_retry_times:0, local_retry_times:0, err_:-5019, err_:“OB_TABLE_NOT_EXIST”, retry_type:0, client_ret:-5019}, need_retry=false)
[2024-05-04 18:05:02.356897] WDIAG [SERVER] inner_close (ob_inner_sql_result.cpp:218) [13946][T1_LSMetaCh][T1][YB42C0A8086F-0006179DBD4BA601-0-0] [lt=6][errcode=-5019] result set close failed(ret=-5019)
[2024-05-04 18:05:02.356899] WDIAG [SERVER] force_close (ob_inner_sql_result.cpp:198) [13946][T1_LSMetaCh][T1][YB42C0A8086F-0006179DBD4BA601-0-0] [lt=2][errcode=-5019] result set close failed(ret=-5019)
[2024-05-04 18:05:02.356901] WDIAG [SERVER] query (ob_inner_sql_connection.cpp:839) [13946][T1_LSMetaCh][T1][YB42C0A8086F-0006179DBD4BA601-0-0] [lt=2][errcode=-5019] failed to close result(close_ret=-5019, ret=-5019)
[2024-05-04 18:05:02.356906] WDIAG [SERVER] query (ob_inner_sql_connection.cpp:869) [13946][T1_LSMetaCh][T1][YB42C0A8086F-0006179DBD4BA601-0-0] [lt=2][errcode=-5019] failed to process record(executor={ObIExecutor:, sql:"SELECT * FROM all_ls_meta_table WHERE tenant_id = 1 ORDER BY tenant_id, ls_id, svr_ip, svr_port"}, record_ret=-5019, ret=-5019)
[2024-05-04 18:05:02.356909] WDIAG [SERVER] query (ob_inner_sql_connection.cpp:895) [13946][T1_LSMetaCh][T1][YB42C0A8086F-0006179DBD4BA601-0-0] [lt=3][errcode=-5019] failed to process final(executor={ObIExecutor:, sql:"SELECT * FROM all_ls_meta_table WHERE tenant_id = 1 ORDER BY tenant_id, ls_id, svr_ip, svr_port"}, aret=-5019, ret=-5019)
[2024-05-04 18:05:02.356912] WDIAG [SERVER] execute_read_inner (ob_inner_sql_connection.cpp:1682) [13946][T1_LSMetaCh][T1][YB42C0A8086F-0006179DBD4BA601-0-0] [lt=2][errcode=-5019] execute sql failed(ret=-5019, tenant_id=1, sql=SELECT * FROM all_ls_meta_table WHERE tenant_id = 1 ORDER BY tenant_id, ls_id, svr_ip, svr_port)
[2024-05-04 18:05:02.356915] WDIAG [SERVER] retry_while_no_tenant_resource (ob_inner_sql_connection.cpp:952) [13946][T1_LSMetaCh][T1][YB42C0A8086F-0006179DBD4BA601-0-0] [lt=2][errcode=-5019] retry_while_no_tenant_resource failed(ret=-5019, tenant_id=1)
[2024-05-04 18:05:02.356917] WDIAG [SERVER] execute_read (ob_inner_sql_connection.cpp:1622) [13946][T1_LSMetaCh][T1][YB42C0A8086F-0006179DBD4BA601-0-0] [lt=2][errcode=-5019] execute_read failed(ret=-5019, cluster_id=1714844540, tenant_id=1)
[2024-05-04 18:05:02.356919] WDIAG [COMMON.MYSQLP] read (ob_mysql_proxy.cpp:131) [13946][T1_LSMetaCh][T1][YB42C0A8086F-0006179DBD4BA601-0-0] [lt=2][errcode=-5019] query failed(ret=-5019, conn=0x2abd71796050, start=1714817102356628, sql=SELECT * FROM all_ls_meta_table WHERE tenant_id = 1 ORDER BY tenant_id, ls_id, svr_ip, svr_port)
[2024-05-04 18:05:02.356924] WDIAG [COMMON.MYSQLP] read (ob_mysql_proxy.cpp:66) [13946][T1_LSMetaCh][T1][YB42C0A8086F-0006179DBD4BA601-0-0] [lt=4][errcode=-5019] read failed(ret=-5019)
[2024-05-04 18:05:02.356926] WDIAG [SHARE.PT] get_by_tenant (ob_persistent_ls_table.cpp:639) [13946][T1_LSMetaCh][T1][YB42C0A8086F-0006179DBD4BA601-0-0] [lt=1][errcode=-5019] execute sql failed(ret=-5019, ret=“OB_TABLE_NOT_EXIST”, tenant_id=1, sql=SELECT * FROM all_ls_meta_table WHERE tenant_id = 1 ORDER BY tenant_id, ls_id, svr_ip, svr_port)
[2024-05-04 18:05:02.356962] WDIAG [SHARE.PT] get_by_tenant (ob_ls_table_operator.cpp:252) [13946][T1_LSMetaCh][T1][YB42C0A8086F-0006179DBD4BA601-0-0] [lt=3][errcode=-5019] get all ls info by persistent_ls_ failed(ret=-5019, ret=“OB_TABLE_NOT_EXIST”, tenant_id=1)
[2024-05-04 18:05:02.356967] WDIAG [SHARE] inner_open_ (ob_ls_table_iterator.cpp:104) [13946][T1_LSMetaCh][T1][YB42C0A8086F-0006179DBD4BA601-0-0] [lt=3][errcode=-5019] fail to get ls infos by tenant(ret=-5019, ret=“OB_TABLE_NOT_EXIST”, tenant_id=1, inner_table_only=true)
[2024-05-04 18:05:02.356969] WDIAG [SHARE] next (ob_ls_table_iterator.cpp:71) [13946][T1_LSMetaCh][T1][YB42C0A8086F-0006179DBD4BA601-0-0] [lt=2][errcode=-5019] fail to open iterator(ret=-5019, ret=“OB_TABLE_NOT_EXIST”)
[2024-05-04 18:05:02.356972] WDIAG [SERVER] build_replica_map_ (ob_tenant_meta_checker.cpp:334) [13946][T1_LSMetaCh][T1][YB42C0A8086F-0006179DBD4BA601-0-0] [lt=2][errcode=-5019] ls table iterator next failed(ret=-5019, ret=“OB_TABLE_NOT_EXIST”)
[2024-05-04 18:05:02.356975] WDIAG [SERVER] check_ls_table_ (ob_tenant_meta_checker.cpp:214) [13946][T1_LSMetaCh][T1][YB42C0A8086F-0006179DBD4BA601-0-0] [lt=2][errcode=-5019] build replica map from ls table failed(ret=-5019, ret=“OB_TABLE_NOT_EXIST”, mode=1)
[2024-05-04 18:05:02.356979] WDIAG [SERVER] check_ls_table (ob_tenant_meta_checker.cpp:194) [13946][T1_LSMetaCh][T1][YB42C0A8086F-0006179DBD4BA601-0-0] [lt=3][errcode=-5019] check ls table failed(ret=-5019, ret=“OB_TABLE_NOT_EXIST”, mode=1)
[2024-05-04 18:05:02.356982] WDIAG [SERVER] runTimerTask (ob_tenant_meta_checker.cpp:44) [13946][T1_LSMetaCh][T1][YB42C0A8086F-0006179DBD4BA601-0-0] [lt=3][errcode=-5019] fail to check ls meta table(ret=-5019, ret=“OB_TABLE_NOT_EXIST”)
[2024-05-04 18:05:02.357167] INFO [COMMON] compute_tenant_wash_size (ob_kvcache_store.cpp:1127) [13571][KVCacheWash][T0][Y0-0000000000000000-0-0] [lt=57] Wash compute wash size(is_wash_valid=false, sys_total_wash_size=-17786437632, global_cache_size=0, tenant_max_wash_size=0, tenant_min_wash_size=0, tenant_ids_=[cnt:4, 500, 508, 509, 1])
[2024-05-04 18:05:02.443489] WDIAG [STORAGE.TRANS] process_cluster_heartbeat_rpc_cb (ob_tenant_weak_read_service.cpp:451) [13960][T1_TenantWeakRe][T1][Y0-0000000000000000-0-0] [lt=3][errcode=-4076] tenant weak read service cluster heartbeat RPC fail(ret=-4076, rcode={code:-4076, msg:“post cluster heartbeat rpc failed, tenant_id=1”, warnings:[]}, tenant_id_=1, dst=“192.168.8.111:2882”, cluster_service_tablet_id={id:226})
[2024-05-04 18:05:02.443521] WDIAG [STORAGE.TRANS] do_cluster_heartbeat_ (ob_tenant_weak_read_service.cpp:867) [13960][T1_TenantWeakRe][T1][Y0-0000000000000000-0-0] [lt=30][errcode=-4076] post cluster heartbeat rpc fail(ret=-4076, ret=“OB_NEED_WAIT”, tenant_id_=1, local_server_version={val:18446744073709551615, v:3}, valid_part_count=0, total_part_count=0, generate_timestamp=1714817102443480)
[2024-05-04 18:05:02.443532] WDIAG [STORAGE.TRANS] do_cluster_heartbeat_ (ob_tenant_weak_read_service.cpp:877) [13960][T1_TenantWeakRe][T1][Y0-0000000000000000-0-0] [lt=10][errcode=-4076] tenant weak read service do cluster heartbeat fail(ret=-4076, ret=“OB_NEED_WAIT”, tenant_id_=1, last_post_cluster_heartbeat_tstamp_=1714817102343409, cluster_heartbeat_interval_=1000000, cluster_service_tablet_id={id:226}, cluster_service_master=“0.0.0.0:0”)
[2024-05-04 18:05:02.443549] WDIAG [STORAGE.TRANS] generate_min_weak_read_version (ob_weak_read_util.cpp:83) [13960][T1_TenantWeakRe][T1][Y0-0000000000000000-0-0] [lt=4][errcode=-4023] get gts cache error(ret=-4023, tenant_id=1)
[2024-05-04 18:05:02.443555] WDIAG [STORAGE.TRANS] generate_server_version (ob_tenant_weak_read_service.cpp:316) [13960][T1_TenantWeakRe][T1][Y0-0000000000000000-0-0] [lt=6][errcode=-4023] generate min weak read version error(ret=-4023, tenant_id=1)
[2024-05-04 18:05:02.443559] WDIAG [STORAGE.TRANS] generate_tenant_weak_read_timestamp_ (ob_tenant_weak_read_service.cpp:596) [13960][T1_TenantWeakRe][T1][Y0-0000000000000000-0-0] [lt=3][errcode=-4023] generate server version for tenant fail(ret=-4023, ret=“OB_EAGAIN”, tenant_id=1, index=0x2abd499bff10, server_version_epoch_tstamp_=1714817102443537)
[2024-05-04 18:05:02.477256] INFO [COORDINATOR] detect_recover (ob_failure_detector.cpp:139) [13823][T1_Occam][T1][Y0-0000000000000000-0-0] [lt=29] doing detect recover operation(events_with_ops=[{event:{type:SCHEMA NOT REFRESHED, module:SCHEMA, info:schema not refreshed, level:SERIOUS}}])
[2024-05-04 18:05:02.489348] INFO [MDS] for_each_ls_in_tenant (mds_tenant_service.cpp:237) [13819][T1_Occam][T1][YB42C0A8086F-0006179DBD5BA53C-0-0] [lt=3] for each ls(succ_num=0, ret=0, ret=“OB_SUCCESS”)
[2024-05-04 18:05:02.517843] INFO [COMMON] replace_map (ob_kv_storecache.cpp:746) [13572][KVCacheRep][T0][Y0-0000000000000000-0-0] [lt=27] replace map num details(ret=0, replace_node_count=0, map_once_replace_num_=62914, map_replace_skip_count_=2)
[2024-05-04 18:05:02.523766] INFO [SQL.PC] update_memory_conf (ob_plan_cache.cpp:1918) [13853][T1_PlanCacheEvi][T1][Y0-0000000000000000-0-0] [lt=3] update plan cache memory config(ob_plan_cache_percentage=5, ob_plan_cache_evict_high_percentage=90, ob_plan_cache_evict_low_percentage=50, tenant_id=1)
[2024-05-04 18:05:02.523793] INFO [SQL.PC] cache_evict (ob_plan_cache.cpp:1459) [13853][T1_PlanCacheEvi][T1][Y0-0000000000000000-0-0] [lt=24] start lib cache evict(tenant_id=1, mem_hold=2097152, mem_limit=107374180, cache_obj_num=0, cache_node_num=0)
[2024-05-04 18:05:02.523799] INFO [SQL.PC] cache_evict (ob_plan_cache.cpp:1476) [13853][T1_PlanCacheEvi][T1][Y0-0000000000000000-0-0] [lt=5] end lib cache evict(tenant_id=1, cache_evict_num=0, mem_hold=2097152, mem_limit=107374180, cache_obj_num=0, cache_node_num=0)
[2024-05-04 18:05:02.523803] INFO [SQL.PC] runTimerTask (ob_plan_cache.cpp:2682) [13853][T1_PlanCacheEvi][T1][Y0-0000000000000000-0-0] [lt=3] schedule next cache evict task(evict_interval=5000000)
[2024-05-04 18:05:02.525291] INFO [SQL.PC] dump_all_objs (ob_plan_cache.cpp:2401) [13853][T1_PlanCacheEvi][T1][Y0-0000000000000000-0-0] [lt=3] Dumping All Cache Objs(alloc_obj_list.count()=0, alloc_obj_list=[])
[2024-05-04 18:05:02.525311] INFO [SQL.PC] runTimerTask (ob_plan_cache.cpp:2690) [13853][T1_PlanCacheEvi][T1][Y0-0000000000000000-0-0] [lt=19] schedule next cache evict task(evict_interval=5000000)
[2024-05-04 18:05:02.543670] WDIAG [STORAGE.TRANS] process_cluster_heartbeat_rpc_cb (ob_tenant_weak_read_service.cpp:451) [13960][T1_TenantWeakRe][T1][Y0-0000000000000000-0-0] [lt=3][errcode=-4076] tenant weak read service cluster heartbeat RPC fail(ret=-4076, rcode={code:-4076, msg:“post cluster heartbeat rpc failed, tenant_id=1”, warnings:[]}, tenant_id_=1, dst=“192.168.8.111:2882”, cluster_service_tablet_id={id:226})
[2024-05-04 18:05:02.543723] WDIAG [STORAGE.TRANS] do_cluster_heartbeat_ (ob_tenant_weak_read_service.cpp:867) [13960][T1_TenantWeakRe][T1][Y0-0000000000000000-0-0] [lt=36][errcode=-4076] post cluster heartbeat rpc fail(ret=-4076, ret=“OB_NEED_WAIT”, tenant_id_=1, local_server_version={val:18446744073709551615, v:3}, valid_part_count=0, total_part_count=0, generate_timestamp=1714817102543660)
[2024-05-04 18:05:02.543732] WDIAG [STORAGE.TRANS] do_cluster_heartbeat_ (ob_tenant_weak_read_service.cpp:877) [13960][T1_TenantWeakRe][T1][Y0-0000000000000000-0-0] [lt=9][errcode=-4076] tenant weak read service do cluster heartbeat fail(ret=-4076, ret=“OB_NEED_WAIT”, tenant_id_=1, last_post_cluster_heartbeat_tstamp_=1714817102443537, cluster_heartbeat_interval_=1000000, cluster_service_tablet_id={id:226}, cluster_service_master=“0.0.0.0:0”)
[2024-05-04 18:05:02.543746] WDIAG [STORAGE.TRANS] generate_min_weak_read_version (ob_weak_read_util.cpp:83) [13960][T1_TenantWeakRe][T1][Y0-0000000000000000-0-0] [lt=4][errcode=-4023] get gts cache error(ret=-4023, tenant_id=1)
[2024-05-04 18:05:02.543750] WDIAG [STORAGE.TRANS] generate_server_version (ob_tenant_weak_read_service.cpp:316) [13960][T1_TenantWeakRe][T1][Y0-0000000000000000-0-0] [lt=4][errcode=-4023] generate min weak read version error(ret=-4023, tenant_id=1)
[2024-05-04 18:05:02.543754] WDIAG [STORAGE.TRANS] generate_tenant_weak_read_timestamp_ (ob_tenant_weak_read_service.cpp:596) [13960][T1_TenantWeakRe][T1][Y0-0000000000000000-0-0] [lt=3][errcode=-4023] generate server version for tenant fail(ret=-4023, ret=“OB_EAGAIN”, tenant_id=1, index=0x2abd499bff10, server_version_epoch_tstamp_=1714817102543736)
[2024-05-04 18:05:02.543940] WDIAG [SHARE.LOCATION] batch_process_tasks (ob_ls_location_service.cpp:549) [13666][SysLocAsyncUp0][T0][YB42C0A8086F-0006179DBC8BBA0D-0-0] [lt=18][errcode=0] tenant schema is not ready, need wait(ret=0, ret=“OB_SUCCESS”, superior_tenant_id=1, task={cluster_id:1714844540, tenant_id:1, ls_id:{id:1}, renew_for_tenant:false, add_timestamp:1714817102543707})
[2024-05-04 18:05:02.557659] INFO [COMMON] compute_tenant_wash_size (ob_kvcache_store.cpp:1127) [13571][KVCacheWash][T0][Y0-0000000000000000-0-0] [lt=11] Wash compute wash size(is_wash_valid=false, sys_total_wash_size=-17786437632, global_cache_size=0, tenant_max_wash_size=0, tenant_min_wash_size=0, tenant_ids_=[cnt:4, 500, 508, 509, 1])
[2024-05-04 18:05:02.564596] WDIAG [SERVER] fill_ls_replica (ob_service.cpp:2742) [14022][T1_L0_G9][T1][YB42C0A8086F-0006179DBD3BA223-0-0] [lt=27][errcode=-4719] get ls handle failed(ret=-4719, ret=“OB_LS_NOT_EXIST”)
[2024-05-04 18:05:02.565408] WDIAG [SHARE.PT] find_leader (ob_ls_info.cpp:847) [13535][observer][T0][YB42C0A8086F-0006179DBD3BA223-0-0] [lt=2][errcode=-4018] fail to get leader replica(ret=-4018, ret=“OB_ENTRY_NOT_EXIST”, *this={tenant_id:1, ls_id:{id:1}, replicas:[]}, replica count=0)
[2024-05-04 18:05:02.565439] WDIAG [SHARE.PT] find_leader (ob_ls_info.cpp:847) [13535][observer][T0][YB42C0A8086F-0006179DBD3BA223-0-0] [lt=30][errcode=-4018] fail to get leader replica(ret=-4018, ret=“OB_ENTRY_NOT_EXIST”, *this={tenant_id:1, ls_id:{id:1}, replicas:[]}, replica count=0)
[2024-05-04 18:05:02.565445] INFO [SHARE.PT] get_ls_info_ (ob_rpc_ls_table.cpp:140) [13535][observer][T0][YB42C0A8086F-0006179DBD3BA223-0-0] [lt=6] leader doesn’t exist, try use all_server_list(tmp_ret=-4018, tmp_ret=“OB_ENTRY_NOT_EXIST”, ls_info={tenant_id:1, ls_id:{id:1}, replicas:[]})
[2024-05-04 18:05:02.565453] INFO [SHARE.PT] get_ls_info_ (ob_rpc_ls_table.cpp:151) [13535][observer][T0][YB42C0A8086F-0006179DBD3BA223-0-0] [lt=6] server_list is empty, do nothing(ret=0, ret=“OB_SUCCESS”, server_list=[])
[2024-05-04 18:05:02.565460] WDIAG [SHARE] renew_master_rootserver (ob_rs_mgr.cpp:355) [13535][observer][T0][YB42C0A8086F-0006179DBD3BA223-0-0] [lt=4][errcode=-4638] no leader finded(ret=-4638, ret=“OB_RS_NOT_MASTER”, leader_exist=false, ls_info={tenant_id:1, ls_id:{id:1}, replicas:[]})
[2024-05-04 18:05:02.565466] INFO [SHARE] renew_master_rootserver (ob_rs_mgr.cpp:366) [13535][observer][T0][YB42C0A8086F-0006179DBD3BA223-0-0] [lt=6] [RS_MGR] new master rootserver found(rootservice=“0.0.0.0:0”, cluster_id=1714844540)
[2024-05-04 18:05:02.565471] WDIAG [SHARE] renew_master_rootserver (ob_rs_mgr.cpp:311) [13535][observer][T0][YB42C0A8086F-0006179DBD3BA223-0-0] [lt=4][errcode=-4638] failed to renew master rootserver(ret=-4638, ret=“OB_RS_NOT_MASTER”)
[2024-05-04 18:05:02.565475] WDIAG [SERVER] register_self_busy_wait (ob_lease_state_mgr.cpp:169) [13535][observer][T0][YB42C0A8086F-0006179DBD3BA223-0-0] [lt=3][errcode=-4638] renew_master_rootserver failed(tmp_ret=-4638)
[2024-05-04 18:05:02.565721] WDIAG [SERVER] fill_ls_replica (ob_service.cpp:2742) [14022][T1_L0_G9][T1][YB42C0A8086F-0006179DBD3BA223-0-0] [lt=25][errcode=-4719] get ls handle failed(ret=-4719, ret=“OB_LS_NOT_EXIST”)
[2024-05-04 18:05:02.566201] WDIAG [SHARE.PT] find_leader (ob_ls_info.cpp:847) [13535][observer][T0][YB42C0A8086F-0006179DBD3BA223-0-0] [lt=3][errcode=-4018] fail to get leader replica(ret=-4018, ret=“OB_ENTRY_NOT_EXIST”, *this={tenant_id:1, ls_id:{id:1}, replicas:[]}, replica count=0)
[2024-05-04 18:05:02.566221] WDIAG [SHARE.PT] find_leader (ob_ls_info.cpp:847) [13535][observer][T0][YB42C0A8086F-0006179DBD3BA223-0-0] [lt=19][errcode=-4018] fail to get leader replica(ret=-4018, ret=“OB_ENTRY_NOT_EXIST”, *this={tenant_id:1, ls_id:{id:1}, replicas:[]}, replica count=0)
[2024-05-04 18:05:02.566225] INFO [SHARE.PT] get_ls_info_ (ob_rpc_ls_table.cpp:140) [13535][observer][T0][YB42C0A8086F-0006179DBD3BA223-0-0] [lt=3] leader doesn’t exist, try use all_server_list(tmp_ret=-4018, tmp_ret=“OB_ENTRY_NOT_EXIST”, ls_info={tenant_id:1, ls_id:{id:1}, replicas:[]})
[2024-05-04 18:05:02.566229] INFO [SHARE.PT] get_ls_info_ (ob_rpc_ls_table.cpp:151) [13535][observer][T0][YB42C0A8086F-0006179DBD3BA223-0-0] [lt=3] server_list is empty, do nothing(ret=0, ret=“OB_SUCCESS”, server_list=[])
[2024-05-04 18:05:02.566236] INFO [SHARE.LOCATION] batch_update_caches_ (ob_ls_location_service.cpp:944) [13535][observer][T0][YB42C0A8086F-0006179DBD3BA223-0-0] [lt=2] [LS_LOCATION]ls location cache has changed(ret=0, ret=“OB_SUCCESS”, old_location={cache_key:{tenant_id:0, ls_id:{id:-1}, cluster_id:-1}, renew_time:0, replica_locations:[]}, new_location={cache_key:{tenant_id:1, ls_id:{id:1}, cluster_id:1714844540}, renew_time:1714817102566235, replica_locations:[]})
[2024-05-04 18:05:02.566244] WDIAG [SHARE.LOCATION] renew_location_ (ob_ls_location_service.cpp:1008) [13535][observer][T0][YB42C0A8086F-0006179DBD3BA223-0-0] [lt=7][errcode=-4721] get empty location from meta table(ret=-4721, ret=“OB_LS_LOCATION_NOT_EXIST”, location={cache_key:{tenant_id:0, ls_id:{id:-1}, cluster_id:-1}, renew_time:0, replica_locations:[]})
[2024-05-04 18:05:02.566249] WDIAG [SHARE.LOCATION] get (ob_ls_location_service.cpp:289) [13535][observer][T0][YB42C0A8086F-0006179DBD3BA223-0-0] [lt=5][errcode=-4721] renew location failed(ret=-4721, ret=“OB_LS_LOCATION_NOT_EXIST”, cluster_id=1714844540, tenant_id=1, ls_id={id:1})
[2024-05-04 18:05:02.566254] WDIAG [SHARE.LOCATION] get (ob_location_service.cpp:58) [13535][observer][T0][YB42C0A8086F-0006179DBD3BA223-0-0] [lt=4][errcode=-4721] fail to get log stream location(ret=-4721, ret=“OB_LS_LOCATION_NOT_EXIST”, cluster_id=1714844540, tenant_id=1, ls_id={id:1}, expire_renew_time=9223372036854775807, is_cache_hit=false, location={cache_key:{tenant_id:0, ls_id:{id:-1}, cluster_id:-1}, renew_time:0, replica_locations:[]})
[2024-05-04 18:05:02.566260] WDIAG [SERVER] refresh_sys_tenant_ls (ob_service.cpp:2339) [13535][observer][T0][YB42C0A8086F-0006179DBD3BA223-0-0] [lt=5][errcode=-4721] fail to refresh sys tenant log stream(ret=-4721, ret=“OB_LS_LOCATION_NOT_EXIST”, cluster_id=1714844540, tenant_id=1, SYS_LS={id:1})
[2024-05-04 18:05:02.566263] WDIAG [SERVER] register_self_busy_wait (ob_lease_state_mgr.cpp:171) [13535][observer][T0][YB42C0A8086F-0006179DBD3BA223-0-0] [lt=2][errcode=-4638] fail to refresh core partition(tmp_ret=-4721)
[2024-05-04 18:05:02.566613] WDIAG [RPC] send (ob_poc_rpc_proxy.h:173) [13535][observer][T0][YB42C0A8086F-0006179DBD3BA223-0-0] [lt=2][errcode=-4638] execute rpc fail(addr=“192.168.8.111:2882”, pcode=258, ret=-4638, timeout=2000000)
[2024-05-04 18:05:02.566634] WDIAG log_user_error_and_warn (ob_poc_rpc_proxy.cpp:244) [13535][observer][T0][YB42C0A8086F-0006179DBD3BA223-0-0] [lt=20][errcode=-4638]
[2024-05-04 18:05:02.566747] WDIAG [SERVER] fill_ls_replica (ob_service.cpp:2742) [14022][T1_L0_G9][T1][YB42C0A8086F-0006179DBD3BA223-0-0] [lt=17][errcode=-4719] get ls handle failed(ret=-4719, ret=“OB_LS_NOT_EXIST”)
[2024-05-04 18:05:02.567113] WDIAG [SHARE.PT] find_leader (ob_ls_info.cpp:847) [13535][observer][T0][YB42C0A8086F-0006179DBD3BA223-0-0] [lt=4][errcode=-4018] fail to get leader replica(ret=-4018, ret=“OB_ENTRY_NOT_EXIST”, *this={tenant_id:1, ls_id:{id:1}, replicas:[]}, replica count=0)
[2024-05-04 18:05:02.567132] WDIAG [SHARE.PT] find_leader (ob_ls_info.cpp:847) [13535][observer][T0][YB42C0A8086F-0006179DBD3BA223-0-0] [lt=19][errcode=-4018] fail to get leader replica(ret=-4018, ret=“OB_ENTRY_NOT_EXIST”, *this={tenant_id:1, ls_id:{id:1}, replicas:[]}, replica count=0)
[2024-05-04 18:05:02.567136] INFO [SHARE.PT] get_ls_info_ (ob_rpc_ls_table.cpp:140) [13535][observer][T0][YB42C0A8086F-0006179DBD3BA223-0-0] [lt=4] leader doesn’t exist, try use all_server_list(tmp_ret=-4018, tmp_ret=“OB_ENTRY_NOT_EXIST”, ls_info={tenant_id:1, ls_id:{id:1}, replicas:[]})
[2024-05-04 18:05:02.567141] INFO [SHARE.PT] get_ls_info_ (ob_rpc_ls_table.cpp:151) [13535][observer][T0][YB42C0A8086F-0006179DBD3BA223-0-0] [lt=4] server_list is empty, do nothing(ret=0, ret=“OB_SUCCESS”, server_list=[])
[2024-05-04 18:05:02.567144] WDIAG [SHARE] renew_master_rootserver (ob_rs_mgr.cpp:355) [13535][observer][T0][YB42C0A8086F-0006179DBD3BA223-0-0] [lt=2][errcode=-4638] no leader finded(ret=-4638, ret=“OB_RS_NOT_MASTER”, leader_exist=false, ls_info={tenant_id:1, ls_id:{id:1}, replicas:[]})
[2024-05-04 18:05:02.567148] INFO [SHARE] renew_master_rootserver (ob_rs_mgr.cpp:366) [13535][observer][T0][YB42C0A8086F-0006179DBD3BA223-0-0] [lt=3] [RS_MGR] new master rootserver found(rootservice=“0.0.0.0:0”, cluster_id=1714844540)
[2024-05-04 18:05:02.567151] WDIAG [SHARE] renew_master_rootserver (ob_rs_mgr.cpp:311) [13535][observer][T0][YB42C0A8086F-0006179DBD3BA223-0-0] [lt=2][errcode=-4638] failed to renew master rootserver(ret=-4638, ret=“OB_RS_NOT_MASTER”)
[2024-05-04 18:05:02.567153] WDIAG [SHARE] rpc_call (ob_common_rpc_proxy.h:413) [13535][observer][T0][YB42C0A8086F-0006179DBD3BA223-0-0] [lt=1][errcode=-4638] renew_master_rootserver failed(ret=-4638, retry=0)
[2024-05-04 18:05:02.567248] WDIAG [SERVER] fill_ls_replica (ob_service.cpp:2742) [14022][T1_L0_G9][T1][YB42C0A8086F-0006179DBD3BA223-0-0] [lt=16][errcode=-4719] get ls handle failed(ret=-4719, ret=“OB_LS_NOT_EXIST”)
[2024-05-04 18:05:02.567610] WDIAG [SHARE.PT] find_leader (ob_ls_info.cpp:847) [13535][observer][T0][YB42C0A8086F-0006179DBD3BA223-0-0] [lt=2][errcode=-4018] fail to get leader replica(ret=-4018, ret=“OB_ENTRY_NOT_EXIST”, *this={tenant_id:1, ls_id:{id:1}, replicas:[]}, replica count=0)
[2024-05-04 18:05:02.567629] WDIAG [SHARE.PT] find_leader (ob_ls_info.cpp:847) [13535][observer][T0][YB42C0A8086F-0006179DBD3BA223-0-0] [lt=18][errcode=-4018] fail to get leader replica(ret=-4018, ret=“OB_ENTRY_NOT_EXIST”, *this={tenant_id:1, ls_id:{id:1}, replicas:[]}, replica count=0)
[2024-05-04 18:05:02.567633] INFO [SHARE.PT] get_ls_info_ (ob_rpc_ls_table.cpp:140) [13535][observer][T0][YB42C0A8086F-0006179DBD3BA223-0-0] [lt=4] leader doesn’t exist, try use all_server_list(tmp_ret=-4018, tmp_ret=“OB_ENTRY_NOT_EXIST”, ls_info={tenant_id:1, ls_id:{id:1}, replicas:[]})
[2024-05-04 18:05:02.567637] INFO [SHARE.PT] get_ls_info_ (ob_rpc_ls_table.cpp:151) [13535][observer][T0][YB42C0A8086F-0006179DBD3BA223-0-0] [lt=3] server_list is empty, do nothing(ret=0, ret=“OB_SUCCESS”, server_list=[])
[2024-05-04 18:05:02.567640] WDIAG [SHARE] renew_master_rootserver (ob_rs_mgr.cpp:355) [13535][observer][T0][YB42C0A8086F-0006179DBD3BA223-0-0] [lt=2][errcode=-4638] no leader finded(ret=-4638, ret=“OB_RS_NOT_MASTER”, leader_exist=false, ls_info={tenant_id:1, ls_id:{id:1}, replicas:[]})
[2024-05-04 18:05:02.567643] INFO [SHARE] renew_master_rootserver (ob_rs_mgr.cpp:366) [13535][observer][T0][YB42C0A8086F-0006179DBD3BA223-0-0] [lt=3] [RS_MGR] new master rootserver found(rootservice=“0.0.0.0:0”, cluster_id=1714844540)
[2024-05-04 18:05:02.567646] WDIAG [SHARE] renew_master_rootserver (ob_rs_mgr.cpp:311) [13535][observer][T0][YB42C0A8086F-0006179DBD3BA223-0-0] [lt=3][errcode=-4638] failed to renew master rootserver(ret=-4638, ret=“OB_RS_NOT_MASTER”)
[2024-05-04 18:05:02.567648] WDIAG [SHARE] rpc_call (ob_common_rpc_proxy.h:413) [13535][observer][T0][YB42C0A8086F-0006179DBD3BA223-0-0] [lt=2][errcode=-4638] renew_master_rootserver failed(ret=-4638, retry=1)
[2024-05-04 18:05:02.567740] WDIAG [SERVER] fill_ls_replica (ob_service.cpp:2742) [14022][T1_L0_G9][T1][YB42C0A8086F-0006179DBD3BA223-0-0] [lt=14][errcode=-4719] get ls handle failed(ret=-4719, ret=“OB_LS_NOT_EXIST”)
[2024-05-04 18:05:02.568159] WDIAG [SHARE.PT] find_leader (ob_ls_info.cpp:847) [13535][observer][T0][YB42C0A8086F-0006179DBD3BA223-0-0] [lt=2][errcode=-4018] fail to get leader replica(ret=-4018, ret=“OB_ENTRY_NOT_EXIST”, *this={tenant_id:1, ls_id:{id:1}, replicas:[]}, replica count=0)
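
With a log that grows this fast, it can help to count entries per level, module and errcode rather than read them one by one. The following is only a rough sketch in Python: the regular expressions are inferred from the excerpt above and are not an official description of the observer.log format, and `summarize` is a name made up for this illustration.

```python
import re
import sys
from collections import Counter

# Header layout assumed from the excerpt above:
#   [timestamp] LEVEL [MODULE] function (file:line) ... [errcode=N] message
# The [MODULE] part is optional because some entries omit it.
ENTRY_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z]+)\s+(?:\[(?P<module>[A-Z0-9_.]+)\]\s+)?"
)
ERRCODE_RE = re.compile(r"\[errcode=(-?\d+)\]")


def summarize(lines):
    """Count log entries per (level, module, errcode)."""
    counts = Counter()
    for line in lines:
        m = ENTRY_RE.match(line)
        if not m:
            continue  # not an entry header, e.g. the tail of a wrapped line
        code = ERRCODE_RE.search(line)
        errcode = int(code.group(1)) if code else None
        counts[(m["level"], m["module"], errcode)] += 1
    return counts


if __name__ == "__main__":
    # Usage: python summarize_log.py /path/to/observer.log
    with open(sys.argv[1], errors="replace") as f:
        for (level, module, errcode), n in summarize(f).most_common(10):
            print(n, level, module, errcode)
```

On the excerpt above, the entries that dominate are the -5019 (OB_TABLE_NOT_EXIST) chain from the query on all_ls_meta_table and the -4638 (OB_RS_NOT_MASTER) retries, i.e. the sys tenant has no leader and no root service yet.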

sysctl.conf is configured the same as on the official website:

fs.aio-max-nr=1048576
net.core.somaxconn = 2048
net.core.netdev_max_backlog = 10000
net.core.rmem_default = 16777216
net.core.wmem_default = 16777216
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216

net.ipv4.ip_local_port_range = 3500 65535
net.ipv4.ip_forward = 0
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_max_syn_backlog = 16384
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_max_syn_backlog = 16384
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_slow_start_after_idle=0
vm.swappiness = 0
vm.min_free_kbytes = 2097152
fs.file-max = 6573688
vm.max_map_count = 655360
kernel.core_pattern = /data/core-%e-%p-%t
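
A file that matches the documentation does not by itself prove the kernel is running with those values. As a hedged sketch (Python, Linux only; `parse_sysctl` and `live_value` are illustrative helper names, and `/etc/sysctl.conf` is assumed to be the file in question), the settings can be compared against `/proc/sys`, which also flags keys listed more than once:

```python
from pathlib import Path


def parse_sysctl(text):
    """Return (settings, duplicates) parsed from sysctl.conf-style text."""
    settings, duplicates = {}, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank line or comment
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key = value line
        key = key.strip()
        value = " ".join(value.split())  # normalize inner whitespace
        if key in settings:
            duplicates.append(key)
        settings[key] = value  # the last occurrence wins
    return settings, duplicates


def live_value(key, root="/proc/sys"):
    """Read the running value of a sysctl key, or None if it is unavailable."""
    path = Path(root, *key.split("."))
    try:
        return " ".join(path.read_text().split())
    except OSError:
        return None


if __name__ == "__main__":
    settings, duplicates = parse_sysctl(Path("/etc/sysctl.conf").read_text())
    for key in duplicates:
        print("listed more than once:", key)
    for key, wanted in settings.items():
        actual = live_value(key)
        if actual != wanted:
            print(f"{key}: file says {wanted!r}, kernel has {actual!r}")
```

Run against the listing above, the parser would report `net.ipv4.tcp_max_syn_backlog` as listed twice; both occurrences carry the same value, so that particular duplicate is harmless.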

Could you provide the complete obd log and the configuration file so we can take a look?

When installing through obd web, do I still need to edit the configuration files by hand? I have not modified any of the files below.
[image]

I have shared observer.log through Quark Netdisk (夸克网盘).
Link: 夸克网盘分享

obd cluster edit-config name (where name is the cluster name)

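I found the cause: the firewall on the OCP node had not been turned off. That made the deployment fail, after which observer.log kept being flooded without stopping.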
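
For anyone hitting the same symptom, a quick reachability test between the nodes can confirm or rule out a firewall before digging through the logs. This is a minimal sketch in Python, not an OBD feature: `port_open` is a made-up helper, 192.168.8.111 is the address that appears in the log above, and 2881/2882 are assumed to be the SQL and RPC ports in use (2882 is the one seen in the log). Add the other node addresses as needed.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, or dropped by a firewall.
        return False


if __name__ == "__main__":
    # Assumed targets: replace with the addresses and ports of your own nodes.
    hosts = ["192.168.8.111"]
    ports = [2881, 2882]
    for host in hosts:
        for port in ports:
            state = "open" if port_open(host, port) else "BLOCKED or closed"
            print(f"{host}:{port} {state}")
```

Run it from every node toward every other node while the observer processes are up. A port that is reachable locally but not from a peer points at a firewall in between.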