OCP high-availability deployment failure

【Environment】Test environment
【OB or other component】OCP
【Version】ocp-all-in-one-4.2.2
【Problem description】Adding a host to a single-node OCP fails
I currently have a single-node OCP cluster with only one zone and one host. To avoid a single point of failure on OCP, I followed the platform high-availability deployment guide to add a new host, then tried to add that host to the cluster, but got the error "Cluster XXX does not allow this operation". How should I handle this?

1. Platform high-availability deployment guide (linked document on the OceanBase site)

2. Currently only a single OCP cluster with a single host


3. The new host was added successfully

4. Adding an OBServer to the cluster

5. The error "Cluster XXX does not allow this operation" is shown

The reason the operation is not allowed is that this cluster is OCP's metadb cluster. If you want to scale out this cluster, it is recommended to use the tool it was deployed with and scale out via obd; you can refer to this document (linked on the OceanBase site). Alternatively, consider deploying a 3-replica OCP and then having it take over the clusters managed by the current OCP.
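
A minimal sketch of what an obd-based scale-out could look like, assuming the metadb cluster was deployed with obd 2.1 or later (which provides the scale_out subcommand); the deploy name ocp_meta and the file name scale_out.yaml are placeholders, and the exact flags should be confirmed with obd cluster -h:

# find the deploy name of the metadb cluster
obd cluster list
# scale out with a YAML file that describes only the servers being added
obd cluster scale_out ocp_meta -c scale_out.yaml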

You can give this a try:


Create a three-replica cluster here.

I have re-created the OCP cluster with 3 replicas, and every node can log in to the OCP management console. However, during testing I found that if the primary node fails, none of the OCP nodes works properly, so high availability is not achieved. How should I handle this?


When a secondary node is stopped, the management consoles on the other two OCP nodes keep working normally.


But when the primary node is stopped, none of the OCP management consoles displays properly.

You can try deploying OCP standalone on a separate machine, then use that OCP to take over this 1-1-1 (3-replica) cluster, or use it to create the 1-1-1 cluster.
You can also refer to this post: OCP

That approach has already been verified and works, but what I want to solve now is high availability of OCP itself. If there is only a single OCP machine and it fails, the business clusters it manages can no longer be administered. That is why I want to deploy a multi-node OCP cluster so that OCP itself is highly available; however, I am currently stuck on the fact that once the cluster's primary node fails, the whole cluster stops working.

OCP can also be deployed as a multi-replica cluster.

If the primary node has failed now, please provide the observer.log.

I first cleared the logs on the secondary node, then stopped the primary node (192.168.5.205) and opened the secondary node's management page, which showed HTTP Status 500 – Internal Server Error. No OCP log was produced (/home/root/logs/); only /root/oceanbase/log/election.log kept writing entries:

[2024-05-29 16:30:45.167388] INFO [ELECT] operator() (election_proposer.cpp:241) [3907][T1004_Occam][T1004][Y0-0000000000000000-0-0] [lt=69] dump proposer info(*this={ls_id:{id:1}, addr:“192.168.5.253:2882”, role:Follower, ballot_number:2, lease_interval:-0.00s, memberlist_with_states:{member_list:{addr_list:[“192.168.5.205:2882”, “192.168.5.252:2882”, “192.168.5.253:2882”], membership_version:{proposal_id:10, config_seq:11}, replica_num:3}, prepare_ok:False, accept_ok_promised_ts:invalid, follower_promise_membership_version:{proposal_id:9223372036854775807, config_seq:-1}}, priority_seed:0x1000, restart_counter:1, last_do_prepare_ts:2024-05-29 16:21:02.2812, self_priority:{priority:{is_valid:true, is_observer_stopped:false, is_server_stopped:false, is_zone_stopped:false, fatal_failures:[], is_primary_region:true, serious_failures:[], is_in_blacklist:false, in_blacklist_reason:, scn:{val:1716971444729641472, v:0}, is_manual_leader:false, zone_priority:1}}, p_election:0x7ff89fdf9230})
[2024-05-29 16:30:45.197575] INFO [ELECT] operator() (election_proposer.cpp:241) [3907][T1004_Occam][T1004][Y0-0000000000000000-0-0] [lt=74] dump proposer info(*this={ls_id:{id:1003}, addr:“192.168.5.253:2882”, role:Leader, ballot_number:3, prepare_success_ballot:3, lease_interval:4.00s, memberlist_with_states:{member_list:{addr_list:[“192.168.5.205:2882”, “192.168.5.252:2882”, “192.168.5.253:2882”], membership_version:{proposal_id:14, config_seq:18}, replica_num:3}, prepare_ok:[false, true, true], accept_ok_promised_ts:[16:21:03.194, 16:30:48.692, 16:30:48.692]follower_promise_membership_version:{proposal_id:14, config_seq:18}}, lease_and_epoch:{leader_lease:{span_from_now:3.496s, expired_time_point:16:30:48.693}, epoch:3}, priority_seed:0x1000, restart_counter:1, last_do_prepare_ts:2024-05-29 13:12:46.501809, self_priority:{priority:{is_valid:true, is_observer_stopped:false, is_server_stopped:false, is_zone_stopped:false, fatal_failures:[], is_primary_region:true, serious_failures:[], is_in_blacklist:false, in_blacklist_reason:, scn:{val:1716971444297306999, v:0}, is_manual_leader:false, zone_priority:0}}, p_election:0x7ff89aa0d230})
[2024-05-29 16:30:45.308278] INFO [ELECT] operator() (election_proposer.cpp:252) [3907][T1004_Occam][T1004][Y0-0000000000000000-0-0] [lt=18] dump message count(ls_id="{id:1002}", self_addr=“192.168.5.253:2882”, state=| match:[Prepare Request, Prepare Response, Accept Request, Accept Response, Change Leader]| “192.168.5.205:2882”:send:[1, 1, 0, 241, 0], rec:[1, 0, 241, 0, 0], last_send:2024-05-29 12:02:29.127697, last_rec:2024-05-29 12:02:29.127697| “192.168.5.252:2882”:send:[1, 0, 0, (32195), 0], rec:[1, 0, (32195), 0, 0], last_send:2024-05-29 16:30:45.231949, last_rec:2024-05-29 16:30:45.231949| “192.168.5.253:2882”:send:[1, 0, 0, 0, 0], rec:[1, 0, 0, 0, 0], last_send:2024-05-29 12:00:27.800338, last_rec:2024-05-29 12:00:27.800338)
[2024-05-29 16:30:45.499232] INFO [ELECT] operator() (election_acceptor.cpp:137) [3501][T1002_Occam][T1002][Y0-0000000000000000-0-0] [lt=35] dump acceptor info(*this={ls_id:{id:1001}, addr:“192.168.5.253:2882”, ballot_number:2, ballot_of_time_window:2, lease:{owner:“192.168.5.252:2882”, lease_end_ts:{span_from_now:3.817s, expired_time_point:16:30:49.316}, ballot_number:2}, is_time_window_opened:False, vote_reason:IP-PORT(priority equal), last_time_window_open_ts:2024-05-29 16:21:02.486613, highest_priority_prepare_req:{this:0x7ff8da51fab0, BASE:{msg_type:“Prepare Request”, id:1001, sender:“192.168.5.252:2882”, receiver:“192.168.5.253:2882”, restart_counter:1, ballot_number:2, debug_ts:{src_construct_ts:“21:02.490497”, src_serialize_ts:“21:02.490535”, dest_deserialize_ts:“21:02.487041”, dest_process_ts:“21:02.487049”, process_delay:-3448}, biggest_min_cluster_version_ever_seen:4.3.0.1}, role:“Follower”, is_buffer_valid:true, inner_priority_seed:4096, membership_version:{proposal_id:9, config_seq:10}}, p_election:0x7ff8da51f230})
[2024-05-29 16:30:45.499291] INFO [ELECT] operator() (election_acceptor.cpp:137) [3501][T1002_Occam][T1002][Y0-0000000000000000-0-0] [lt=55] dump acceptor info(*this={ls_id:{id:1003}, addr:“192.168.5.253:2882”, ballot_number:3, ballot_of_time_window:3, lease:{owner:“192.168.5.253:2882”, lease_end_ts:{span_from_now:3.889s, expired_time_point:16:30:49.388}, ballot_number:3}, is_time_window_opened:False, vote_reason:IP-PORT(priority equal), last_time_window_open_ts:2024-05-29 12:00:19.486030, highest_priority_prepare_req:{this:0x7ff8da5b3ab0, BASE:{msg_type:“Prepare Request”, id:1003, sender:“192.168.5.205:2882”, receiver:“192.168.5.253:2882”, restart_counter:1, ballot_number:0, debug_ts:{src_construct_ts:“00:19.483126”, src_serialize_ts:“00:19.483174”, dest_deserialize_ts:“00:19.486492”, dest_process_ts:“00:19.486495”, process_delay:3369}, biggest_min_cluster_version_ever_seen:4.3.0.1}, role:“Follower”, is_buffer_valid:true, inner_priority_seed:4096, membership_version:{proposal_id:10, config_seq:10}}, p_election:0x7ff8da5b3230})

1.1) OCP is abnormal

1.1.1) Check the process with ss -lnutp | grep obproxy. If it is not running, check the obproxy log (path: /tmp/ocp/log/obproxy.log); press Shift+G to jump to the end and look at the latest entries. In my case nothing was listening on the IP and port 2883, so the proxy could not be reached; verifying with mysql -h<ip> -uroot@sys -P2883 -pxxxx -A, the connection failed.

1.1.2) Run ss -lnutp | grep obproxy again to confirm that obproxy is not running.

1.1.3) Find the obproxy installation path from the configuration file and start it from there.

1.1.4) Start it with ./bin/obproxy.

2) Database is abnormal

2.1) Use ps -ef | grep observer to confirm whether the process is running.

2.2) Connect to the database directly by IP: mysql -h<ip> -uroot@sys -P2881 -pxxx -A

2.3) Switch into the database with USE and check server status: mysql> select * from __all_server;
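
For reference, here are the checks above strung together as a rough command sketch: <ip>, the passwords, and <obproxy_install_dir> are placeholders, and using the oceanbase schema for __all_server assumes you connect as the sys tenant:

# 1) is obproxy running and listening on 2883?
ss -lnutp | grep obproxy
# if not, look at the tail of its log, then start it from its install directory
tail -n 100 /tmp/ocp/log/obproxy.log
cd <obproxy_install_dir> && ./bin/obproxy
# try connecting through the proxy
mysql -h<ip> -uroot@sys -P2883 -p -A
# 2) is observer running, and does a direct connection work?
ps -ef | grep observer
mysql -h<ip> -uroot@sys -P2881 -p -A
# inside the sys tenant, check server status
use oceanbase;
select * from __all_server;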

I verified again by stopping the OCP primary node; the details are below. Please help confirm whether this can be resolved so that when the primary node fails, the other secondary nodes can still serve the OCP management console. Thanks.

1. The OCP cluster has 3 nodes in total; 192.168.5.252 is the primary node.

2. Checked obproxy on secondary node 192.168.5.253; it is running normally.

3. Simulated a primary-node failure by stopping the 192.168.5.252 server; the OCP management console on the secondary node returned a 500 error.

4. Re-checked obproxy on secondary node 192.168.5.253; it is still running normally. Connecting from the secondary node to its local database also works.

5. Checked the observer on the secondary node; it is running normally.

6. Checked observer.log on the secondary node; nothing abnormal.

[2024-06-04 11:00:25.053664] INFO [PROXY.SS] do_io_close (ob_mysql_client_session.cpp:792) [76425][Y0-00007FDC0F3358C0] [lt=0] [dc=0] client session do_io_close((*this={this:0x7fdc0ec942d0, is_proxy_mysql_client:true, is_waiting_trans_first_request:false, need_delete_cluster:false, is_first_dml_sql_got:true, vc_ready_killed:false, active:true, magic:19132429, conn_decrease:true, current_tid:76425, cs_id:357040633, proxy_sessid:0, session_info:{is_inited:true, priv_info:{has_all_privilege:false, cs_id:357040633, user_priv_set:-1, cluster_name:“Ren_Test”, tenant_name:“sys”, user_name:“detect_user”}, version:{common_hot_sys_var_version:0, common_sys_var_version:0, mysql_hot_sys_var_version:0, mysql_sys_var_version:0, hot_sys_var_version:0, sys_var_version:0, user_var_version:0, db_name_version:0, last_insert_id_version:0, sess_info_version:0}, hash_version:{common_hot_sys_var_version:0, common_sys_var_version:0, mysql_hot_sys_var_version:0, mysql_sys_var_version:0, hot_sys_var_version:0, sys_var_version:0, user_var_version:0, db_name_version:0, last_insert_id_version:0, sess_info_version:0}, val_hash:{common_hot_sys_var_hash:0, common_cold_sys_var_hash:0, mysql_hot_sys_var_hash:0, mysql_cold_sys_var_hash:0, hot_sys_var_hash:0, cold_sys_var_hash:0, user_var_hash:0}, global_vars_version:-1, is_global_vars_changed:false, is_trans_specified:false, is_user_idc_name_set:false, is_read_consistency_set:false, idc_name:"", cluster_id:0, real_meta_cluster_name:"", safe_read_snapshot:0, syncing_safe_read_snapshot:0, route_policy:1, proxy_route_policy:3, user_identity:0, global_vars_version:-1, is_read_only_user:false, is_request_follower_user:false, obproxy_force_parallel_query_dop:1, ob20_request:{remain_payload_len:0, ob20_request_received_done:false, ob20_header:{ob 20 protocol header:{compressed_len:0, seq:0, non_compressed_len:0}, magic_num:0, header_checksum:0, connection_id:0, request_id:0, pkt_seq:0, payload_len:0, version:0, flag_.flags:0, reserved:0}}, client_cap:0, server_cap:0, last_server_addr:{Not IP address [0]:0}, last_server_sess_id:0, init_sql:""}, dummy_ldc:{use_ldc:false, idc_name:"", item_count:0, site_start_index_array:[[0]0, [1]0, [2]0, [3]0], item_array:null, pl:null, ts:null, readonly_exist_status:“READONLY_ZONE_UNKNOWN”}, dummy_entry:null, server_state_version:0, cur_ss:null, bound_ss:null, lii_ss:null, cluster_resource:{this:0x7fdc06c34080, ref_count:10, is_inited:true, cluster_info_key:{cluster_name:{config_string:“Ren_Test”}, cluster_id:0}, cr_state:“CR_AVAIL”, version:2, last_access_time_ns:1717470017118370417, deleting_completed_thread_num:0, fetch_rslist_task_count:0, fetch_idc_list_task_count:0, last_idc_list_refresh_time_ns:1717469273477099196, last_rslist_refresh_time_ns:1717469935366000322, server_state_version:2}, client_vc:0x7fdc07a91f10, using_ldg:false, trace_stats:NULL}, client_vc_=0x7fdc07a91f10, this=0x7fdc0ec942d0)
[2024-06-04 11:00:25.053690] WDIAG [PROXY] handle_client_resp (ob_server_state_processor.cpp:2240) [76425][Y0-00007FDC0F3358C0] [lt=0] [dc=0] detect server dead(info={addr:“192.168.5.252:2881”, zone_name:“zone1”, region_name:“sys_region”, idc_name:“default_idc”, zone_type:“ReadWrite”, is_merging:false, is_force_congested:false, request_sql_cnt:0, last_response_time:0, detect_fail_cnt:3})
[2024-06-04 11:00:25.053697] INFO [PROXY] kill_this (ob_client_vc.cpp:1206) [76425][Y0-00007FDC0F3358C0] [lt=0] [dc=0] mysql client will kill self(this=0x7fdc07667ba0)
[2024-06-04 11:00:25.053709] INFO [PROXY.CS] destroy (ob_mysql_client_session.cpp:96) [76425][Y0-00007FDC0F3358C0] [lt=0] [dc=0] client session destroy(cs_id=357040633, proxy_sessid=0, client_vc=NULL)
[2024-06-04 11:00:25.053714] INFO [PROXY.SM] kill_this (ob_mysql_sm.cpp:10073) [76425][Y0-00007FDC0F3358C0] [lt=0] [dc=0] deallocating sm(sm_id=505)
[2024-06-04 11:00:25.623409] INFO [PROXY.CS] new_connection (ob_mysql_client_session.cpp:374) [76425][Y0-00007FDC0F3358C0] [lt=0] [dc=0] client session born(cs_id=357040634, proxy_sessid=0, is_local_connection=true, client_vc=0x7fdc07a91f10, client_fd=0, client_addr=, is_proxy_client=true)
[2024-06-04 11:00:25.623430] INFO [PROXY.CS] new_transaction (ob_mysql_client_session.cpp:244) [76425][Y0-00007FDC0F3358C0] [lt=0] [dc=0] Starting new transaction using sm(cs_id=357040634, get_transact_count()=0, sm_id=506)

7. Checked ocp-server.log on the secondary node; it shows errors. Partial content is below:
ocp-server.log (9.6 MB)

2024-06-04 10:51:42.952 INFO 77632 — [pool-7-thread-1,] c.o.o.s.i.r.s.RateLimitFacadeServiceImpl : After refresh, the leader address is :null
2024-06-04 10:51:42.958 WARN 77632 — [ocp-updater-0,] com.alibaba.druid.pool.DruidDataSource : get connection timeout retry : 1
2024-06-04 10:51:43.023 WARN 77632 — [pool-subtask-coordinator3,] o.h.engine.jdbc.spi.SqlExceptionHelper : SQL Error: 0, SQLState: null
2024-06-04 10:51:43.023 ERROR 77632 — [pool-subtask-coordinator3,] o.h.engine.jdbc.spi.SqlExceptionHelper : wait millis 2000, active 0, maxActive 200, creating 0, createErrorCount 105
2024-06-04 10:51:43.023 INFO 77632 — [pool-subtask-coordinator3,] c.o.o.c.t.e.c.SubtaskCoordinator : Worker schedule failed, name=failed_subtask_worker
2024-06-04 10:51:43.030 INFO 77632 — [pool-ocp-async-5,87988086da1f45f8,1c72b27da955] c.o.o.s.common.DistributedLockAspect : Get distributed lock, lockKey=AlarmEventProcessForSelfcure, method=observerAlarmEvents
2024-06-04 10:51:43.030 WARN 77632 — [pool-ocp-async-7,dfb50f48861643d7,6265eb4a973b] com.alibaba.druid.pool.DruidDataSource : get connection timeout retry : 1
2024-06-04 10:51:43.030 WARN 77632 — [pool-ocp-async-1,6c6f4da35dd147be,ba0345d6b56e] com.alibaba.druid.pool.DruidDataSource : get connection timeout retry : 1
2024-06-04 10:51:43.030 WARN 77632 — [pool-ocp-async-10,e7a685a2d0f545e0,9ef4409fc169] o.h.engine.jdbc.spi.SqlExceptionHelper : SQL Error: 0, SQLState: null
2024-06-04 10:51:43.030 ERROR 77632 — [pool-ocp-async-10,e7a685a2d0f545e0,9ef4409fc169] o.h.engine.jdbc.spi.SqlExceptionHelper : wait millis 2000, active 0, maxActive 200, creating 0, createErrorCount 105
2024-06-04 10:51:43.030 WARN 77632 — [pool-ocp-async-9,590661a08e4d449b,224ca172577d] o.h.engine.jdbc.spi.SqlExceptionHelper : SQL Error: 0, SQLState: null
2024-06-04 10:51:43.030 ERROR 77632 — [pool-ocp-async-9,590661a08e4d449b,224ca172577d] o.h.engine.jdbc.spi.SqlExceptionHelper : wait millis 2000, active 0, maxActive 200, creating 0, createErrorCount 105
2024-06-04 10:51:43.032 WARN 77632 — [pool-ocp-async-3,79a544ffb9384187,6fad1388381a] com.alibaba.druid.pool.DruidDataSource : get connection timeout retry : 1
2024-06-04 10:51:43.040 WARN 77632 — [pool-ocp-async-5,87988086da1f45f8,1c72b27da955] c.o.o.s.common.DistributedLockAspect : Acquire lock fail, another request is processing, key=AlarmEventProcessForSelfcure
2024-06-04 10:51:43.174 WARN 77632 — [pool-subtask-coordinator2,] o.h.engine.jdbc.spi.SqlExceptionHelper : SQL Error: 0, SQLState: null
2024-06-04 10:51:43.174 ERROR 77632 — [pool-subtask-coordinator2,] o.h.engine.jdbc.spi.SqlExceptionHelper : wait millis 2000, active 0, maxActive 200, creating 0, createErrorCount 105
2024-06-04 10:51:43.174 INFO 77632 — [pool-subtask-coordinator2,] c.o.o.c.t.e.c.SubtaskCoordinator : Worker schedule failed, name=ready_subtask_worker
2024-06-04 10:51:43.324 WARN 77632 — [pool-task-coordinator1,7aa2482b21074de8,2ccb41937d9a] o.h.engine.jdbc.spi.SqlExceptionHelper : SQL Error: 0, SQLState: null
2024-06-04 10:51:43.324 ERROR 77632 — [pool-task-coordinator1,7aa2482b21074de8,2ccb41937d9a] o.h.engine.jdbc.spi.SqlExceptionHelper : wait millis 2000, active 0, maxActive 200, creating 1, createElapseMillis 11, createErrorCount 105
2024-06-04 10:51:43.325 WARN 77632 — [pool-task-coordinator1,7aa2482b21074de8,2ccb41937d9a] c.o.o.c.t.e.coordinator.TaskCoordinator : Schedule failed.

com.oceanbase.ocp.core.distributed.lock.LockException: Failed to lock mutex at running_task_worker
at com.oceanbase.ocp.core.distributed.lock.jdbc.JdbcLockDelegateRegistry$JdbcDelegateLock.rethrowAsLockException(JdbcLockDelegateRegistry.java:213)
at com.oceanbase.ocp.core.distributed.lock.jdbc.JdbcLockDelegateRegistry$JdbcDelegateLock.tryLock(JdbcLockDelegateRegistry.java:272)
at com.oceanbase.ocp.core.distributed.lock.jdbc.JdbcLockDelegateRegistry$JdbcDelegateLock.tryLock(JdbcLockDelegateRegistry.java:244)
at com.oceanbase.ocp.core.task.engine.coordinator.TaskCoordinator.lockAndExecute(TaskCoordinator.java:96)
at com.oceanbase.ocp.core.task.engine.coordinator.TaskCoordinator.lambda$startup$1(TaskCoordinator.java:68)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: org.springframework.transaction.CannotCreateTransactionException: Could not open JPA EntityManager for transaction; nested exception is org.hibernate.exception.GenericJDBCException: Unable to acquire JDBC Connection
at org.springframework.orm.jpa.JpaTransactionManager.doBegin(JpaTransactionManager.java:467)
at org.springframework.transaction.support.AbstractPlatformTransactionManager.startTransaction(AbstractPlatformTransactionManager.java:400)
at org.springframework.transaction.support.AbstractPlatformTransactionManager.getTransaction(AbstractPlatformTransactionManager.java:373)
at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:137)
at org.springframework.transaction.support.TransactionOperations.executeWithoutResult(TransactionOperations.java:67)
at com.oceanbase.ocp.core.distributed.lock.jdbc.JdbcLockRepository.deleteExpired(JdbcLockRepository.java:117)
at com.oceanbase.ocp.core.distributed.lock.jdbc.JdbcLockRepository.acquire(JdbcLockRepository.java:89)
at com.oceanbase.ocp.core.distributed.lock.jdbc.JdbcLockDelegateRegistry$JdbcDelegateLock.doLock(JdbcLockDelegateRegistry.java:278)
at com.oceanbase.ocp.core.distributed.lock.jdbc.JdbcLockDelegateRegistry$JdbcDelegateLock.tryLock(JdbcLockDelegateRegistry.java:261)
… 10 common frames omitted
Caused by: org.hibernate.exception.GenericJDBCException: Unable to acquire JDBC Connection
at org.hibernate.exception.internal.StandardSQLExceptionConverter.convert(StandardSQLExceptionConverter.java:42)
at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:113)
at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:99)
at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.acquireConnectionIfNeeded(LogicalConnectionManagedImpl.java:111)
at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.getPhysicalConnection(LogicalConnectionManagedImpl.java:138)
at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.getConnectionForTransactionManagement(LogicalConnectionManagedImpl.java:276)
at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.begin(LogicalConnectionManagedImpl.java:284)
at org.hibernate.resource.transaction.backend.jdbc.internal.JdbcResourceLocalTransactionCoordinatorImpl$TransactionDriverControlImpl.begin(JdbcResourceLocalTransactionCoordinatorImpl.java:246)
at org.hibernate.engine.transaction.internal.TransactionImpl.begin(TransactionImpl.java:83)
at org.springframework.orm.jpa.vendor.HibernateJpaDialect.beginTransaction(HibernateJpaDialect.java:164)
at org.springframework.orm.jpa.JpaTransactionManager.doBegin(JpaTransactionManager.java:421)
… 18 common frames omitted
Caused by: com.alibaba.druid.pool.GetConnectionTimeoutException: wait millis 2000, active 0, maxActive 200, creating 1, createElapseMillis 11, createErrorCount 105
at com.alibaba.druid.pool.DruidDataSource.getConnectionInternal(DruidDataSource.java:1773)
at com.alibaba.druid.pool.DruidDataSource.getConnectionDirect(DruidDataSource.java:1427)
at com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:1407)
at com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:1397)
at com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:100)
at org.hibernate.engine.jdbc.connections.internal.DatasourceConnectionProviderImpl.getConnection(DatasourceConnectionProviderImpl.java:122)
at org.hibernate.internal.NonContextualJdbcConnectionAccess.obtainConnection(NonContextualJdbcConnectionAccess.java:38)
at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.acquireConnectionIfNeeded(LogicalConnectionManagedImpl.java:108)
… 25 common frames omitted
Caused by: java.sql.SQLNonTransientConnectionException: Could not connect to HostAddress{host=‘192.168.5.252’, port=2883}. No route to host (Host unreachable)
at com.oceanbase.jdbc.internal.util.exceptions.ExceptionFactory.createException(ExceptionFactory.java:122)
at com.oceanbase.jdbc.internal.util.exceptions.ExceptionFactory.create(ExceptionFactory.java:225)
at com.oceanbase.jdbc.internal.protocol.AbstractConnectProtocol.connectWithoutProxy(AbstractConnectProtocol.java:1659)
at com.oceanbase.jdbc.internal.util.Utils.retrieveProxy(Utils.java:1427)
at com.oceanbase.jdbc.OceanBaseConnection.newConnection(OceanBaseConnection.java:306)
at com.oceanbase.jdbc.Driver.connect(Driver.java:89)
at com.alibaba.druid.pool.DruidAbstractDataSource.createPhysicalConnection(DruidAbstractDataSource.java:1657)
at com.alibaba.druid.pool.DruidAbstractDataSource.createPhysicalConnection(DruidAbstractDataSource.java:1723)
at com.alibaba.druid.pool.DruidDataSource$CreateConnectionThread.run(DruidDataSource.java:2838)
Caused by: java.net.NoRouteToHostException: No route to host (Host unreachable)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:607)
at com.oceanbase.jdbc.internal.protocol.AbstractConnectProtocol.createSocket(AbstractConnectProtocol.java:286)
at com.oceanbase.jdbc.internal.protocol.AbstractConnectProtocol.createConnection(AbstractConnectProtocol.java:552)
at com.oceanbase.jdbc.internal.protocol.AbstractConnectProtocol.connectWithoutProxy(AbstractConnectProtocol.java:1639)
… 6 common frames omitted

Which obproxy does this OCP server use to connect to the metadb? From the logs it simply cannot connect. Please confirm which proxy it goes through and whether that proxy can actually be reached.
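
For example, on the surviving OCP node you could check which metadb address the running ocp-server is actually using and whether it is reachable. This is only a sketch: the 192.168.5.252:2883 address comes from the stack trace above, and the root@sys account is just reused from the earlier connectivity tests.

# which 2883 endpoints does the ocp-server process hold connections to?
ss -ntp | grep 2883
# test the address seen in the stack trace
mysql -h192.168.5.252 -P2883 -uroot@sys -p -A
# compare with the local proxy on this node
mysql -h127.0.0.1 -P2883 -uroot@sys -p -A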

When I installed it, there were 3 OCP nodes (252, 253, 254), and obproxy was also installed on those 3 nodes, so I assumed each node connects to the metadb through its own local proxy. But when I test by stopping the primary node's server (252) and open the OCP management page via the secondary node's (253) IP address, I get the 500 error. It feels like there is no high availability (HA): the secondary node does not detect the primary-node failure and take over as the primary. All the logs I attached are from the secondary node (253); the primary node was already down at that point, so there are no logs from it.

The figure below shows the OCP configuration used at installation time.



Did you ever find the root cause in the end? We are running into a similar situation.