OceanBase fails to start through obd

[Environment] Production
[OB or other component]

[Version]
[Problem description]
[root@5grim-kubernetes-4-1 log]# obd cluster reload d_dc2
Get local repositories and plugins ok
Load cluster param plugin ok
Open ssh connection ok
Cluster status check ok
Search plugins ok
Load cluster param plugin ok
Check before start observer ok
Check before start obproxy ok
Check before start obagent ok
Check before start ocp-express ok
Start observer ok
observer program health check ok
obshell program health check ok
Connect to observer 10.106.48.20:2881 ok
Start obproxy ok
obproxy program health check ok
Connect to obproxy ok
Start obagent ok
obagent program health check ok
Connect to Obagent ok
Start ocp-express \

OceanBase startup stays stuck at the "Start ocp-express" step.

The environment has three nodes in total, and the observer and obproxy processes are running on them.
ansible -i ob_host 20 -m shell -a 'ps -ef | grep observer'
[WARNING]: Invalid characters were found in group names but not replaced, use -vvvv to see details
[WARNING]: Platform linux on host 20-3 is using the discovered Python interpreter at /usr/bin/python3.7, but future installation of another Python interpreter could change the meaning of that path. See https://docs.ansible.com/ansible-
core/2.15/reference_appendices/interpreter_discovery.html for more information.
20-3 | CHANGED | rc=0 >>
root 14515 14511 0 14:40 pts/0 00:00:00 /bin/sh -c ps -ef | grep observer
root 14517 14515 0 14:40 pts/0 00:00:00 grep observer
root 42251 1 47 8月06 ? 10:55:53 /opt/data/oceanbase/d_dc2/oceanbase/bin/observer -p 2881
[WARNING]: Platform linux on host 20-2 is using the discovered Python interpreter at /usr/bin/python3.7, but future installation of another Python interpreter could change the meaning of that path. See https://docs.ansible.com/ansible-
core/2.15/reference_appendices/interpreter_discovery.html for more information.
20-2 | CHANGED | rc=0 >>
root 33218 33214 0 14:40 pts/0 00:00:00 /bin/sh -c ps -ef | grep observer
root 33220 33218 0 14:40 pts/0 00:00:00 grep observer
root 52683 1 99 8月06 ? 3-05:53:25 /opt/data/oceanbase/d_dc2/oceanbase/bin/observer -p 2881
[WARNING]: Platform linux on host 20-1 is using the discovered Python interpreter at /usr/bin/python3.7, but future installation of another Python interpreter could change the meaning of that path. See https://docs.ansible.com/ansible-
core/2.15/reference_appendices/interpreter_discovery.html for more information.
20-1 | CHANGED | rc=0 >>
root 29311 1 99 8月06 ? 1-04:09:21 /opt/data/oceanbase/d_dc2/oceanbase/bin/observer -p 2881
root 53641 53640 0 14:40 pts/2 00:00:00 /bin/sh -c ps -ef | grep observer
root 53643 53641 0 14:40 pts/2 00:00:00 grep observer

20-1 | CHANGED | rc=0 >>
root 34016 1 0 8月06 ? 00:00:25 bash /opt/data/oceanbase/d_dc2/obproxy/obproxyd.sh /opt/data/oceanbase/d_dc2/obproxy 10.106.48.20 2883 daemon
root 34051 1 5 8月06 ? 01:18:27 /opt/data/oceanbase/d_dc2/obproxy/bin/obproxy --listen_port 2883
root 55283 55282 0 14:40 pts/2 00:00:00 /bin/sh -c ps -ef | grep obproxy
root 55285 55283 0 14:40 pts/2 00:00:00 grep obproxy

In oceanbase/log/election.log, only one node is visible:

[2025-08-07 14:38:00.164801] WDIAG [ELECT] refresh_priority_ (election_impl.cpp:332) [32491][T1002_L0_G2][T1002][YB420A6A3014-00063BAD9895A8D3-0-0] [lt=14][errcode=-4018] refresh priority failed(ret=-4018, ret=“OB_ENTRY_NOT_EXIST”, *this={is_inited:true, is_running:true, proposer:{ls_id:{id:1003}, addr:“10.106.48.20:2882”, role:Leader, ballot_number:1, prepare_success_ballot:1, lease_interval:4.00s, memberlist_with_states:{member_list:{addr_list:[“10.106.48.20:2882”], membership_version:{proposal_id:7, config_seq:8}, replica_num:1}, prepare_ok:True, accept_ok_promised_ts:14:38:03.661, follower_promise_membership_version:{proposal_id:7, config_seq:8}}, lease_and_epoch:{leader_lease:{span_from_now:3.497s, expired_time_point:14:38:03.661}, epoch:1}, priority_seed:0x1000, restart_counter:1, last_do_prepare_ts:2025-08-06 15:48:31.666948, self_priority:{priority:{is_valid:false, is_observer_stopped:false, is_server_stopped:false, is_zone_stopped:false, fatal_failures:[], is_primary_region:false, serious_failures:[{type:SCHEMA NOT REFRESHED, module:SCHEMA, info:schema not refreshed, level:SERIOUS}], is_in_blacklist:false, in_blacklist_reason:, scn:{val:0, v:0}, is_manual_leader:false, zone_priority:9223372036854775807}}, p_election:0x7f08cfdbf1b0}, acceptor:{ls_id:{id:1003}, addr:“10.106.48.20:2882”, ballot_number:1, ballot_of_time_window:1, lease:{owner:“10.106.48.20:2882”, lease_end_ts:{span_from_now:4.000s, expired_time_point:14:38:04.164}, ballot_number:1}, is_time_window_opened:False, vote_reason:the only request, last_time_window_open_ts:2025-08-06 15:48:31.667549, highest_priority_prepare_req:{this:0x7f08cfdbfa30, BASE:{msg_type:“Prepare Request”, id:1003, sender:“10.106.48.20:2882”, receiver:“10.106.48.20:2882”, restart_counter:1, ballot_number:1, debug_ts:{src_construct_ts:“48:31.667019”, src_serialize_ts:“48:31.667104”, dest_deserialize_ts:“48:31.667495”, dest_process_ts:“48:31.667513”, process_delay:494}, biggest_min_cluster_version_ever_seen:4.2.2.0}, 
role:“Follower”, is_buffer_valid:true, inner_priority_seed:4096, membership_version:{proposal_id:6, config_seq:7}}, p_election:0x7f08cfdbf1b0}, ls_biggest_min_cluster_version_ever_seen:4.2.2.0, priority:{priority:{is_valid:false, is_observer_stopped:false, is_server_stopped:false, is_zone_stopped:false, fatal_failures:[], is_primary_region:false, serious_failures:[{type:SCHEMA NOT REFRESHED, module:SCHEMA, info:schema not refreshed, level:SERIOUS}], is_in_blacklist:false, in_blacklist_reason:, scn:{val:0, v:0}, is_manual_leader:false, zone_priority:9223372036854775807}}})
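
One way to check how many nodes the election log actually records is to count the distinct member lists and replica counts it contains. The following is a minimal sketch, assuming GNU grep and that the log keeps the `addr_list:[...]` and `replica_num:N` fields in the form shown in the excerpt above. The function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of obd or OceanBase) that summarize
# membership information found in an election log.

# Count each distinct addr_list:[...] fragment in the log.
summarize_members() {
  local log_file="$1"
  grep -oE 'addr_list:\[[^]]*\]' "$log_file" | sort | uniq -c | sort -rn
}

# Count each distinct replica_num:N value in the log.
summarize_replica_num() {
  local log_file="$1"
  grep -oE 'replica_num:[0-9]+' "$log_file" | sort | uniq -c | sort -rn
}
```

For example, `summarize_members oceanbase/log/election.log`. A single-address list is not by itself proof of a failed election: a log stream created with only one replica would presumably look the same, so the result is best compared against the replica count expected for that tenant.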

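The ocp-express component is no longer maintained. If you want to try that functionality, we recommend deploying the OCP product instead.
You can use obd cluster start xxxx -c oceanbase-ce to start only the OceanBase database of this cluster.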

ocp, obproxy, and observer have already been deployed. Startup failure log:

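The log contains errcode=-4076, errcode=-5019, and errcode=-4012.
Startup in this environment keeps hanging at ocp, and oceanbase/log/election.log indicates that the arbitration election is not succeeding; only one node appears in the log.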

election.log.txt (5 MB)
observer.log.txt (5 MB)
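
To see which error codes dominate a log before reading it line by line, the codes can be tallied first. This is a rough sketch, assuming GNU grep and that every code appears in the `errcode=-NNNN` form quoted above; `count_errcodes` is a hypothetical helper name, not an OceanBase tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count how often each errcode appears in a log file,
# most frequent first.
count_errcodes() {
  local log_file="$1"
  grep -oE 'errcode=-[0-9]+' "$log_file" | sort | uniq -c | sort -rn
}
```

For example, `count_errcodes oceanbase/log/observer.log` prints one line per code with its count, which shows whether -4076, -5019, and -4012 are frequent or only occasional.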

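What you have is the ocp-express component (a lightweight version of OCP that is no longer maintained), not OCP.
Either build a new OCP cluster, or use obd cluster start xxxx -c oceanbase-ce to start only this cluster's OceanBase database.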

Can ocp be uninstalled on its own through obd?

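The cluster startup state is still abnormal. How should I handle it?
I need observer, obagent, and obproxy to be started; at the moment a client cannot connect to the database through obproxy.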

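How did you start ob? Did starting ob on its own fail as well? If it did, please provide the observer.log so we can take a look.
Likewise, you can use obd cluster start xxxx -c obproxy-ce to start ODP on its own. obagent does not need to be started; it is tied to ocp-express.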

obd cluster start ****

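However, because of the ocp problem the whole cluster is currently unusable. The observer, obagent, and obproxy processes have started, but nothing can connect, and several of our clusters are now showing the same kind of problem.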

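The observer log is attached above; I took the tail portion of it. All I can see in it are some cluster consistency issues and metadata issues. The cluster election is failing. I checked oceanbase/etc/observer.config.bin on several machines, and the configuration files are identical.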

Run ps -ef and check these three processes: observer, obagent, and obproxy.
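
The process listing is easier to read when the `grep` helper lines are filtered out. A minimal sketch, assuming the process names seen elsewhere in this thread; `filter_ob_processes` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only observer/obagent/obproxy lines from
# `ps -ef` output and drop the lines produced by the grep command itself.
filter_ob_processes() {
  grep -E 'observer|obagent|obproxy' | grep -v 'grep'
}

# Local usage:
#   ps -ef | filter_ob_processes
# Across nodes, following the ansible invocation used in this thread:
#   ansible -i ob_host 20 -m shell -a "ps -ef | grep -E 'observer|obagent|obproxy' | grep -v grep"
```

The ob_agentd, ob_monagent, and ob_mgragent processes are matched through the `obagent` directory in their paths, so no separate pattern is needed for them.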

Following to learn.

Waiting for a follow-up.

[WARNING]: Invalid characters were found in group names but not replaced, use -vvvv to see details
[WARNING]: Platform linux on host 20-3 is using the discovered Python interpreter at /usr/bin/python3.7, but future installation of
another Python interpreter could change the meaning of that path. See https://docs.ansible.com/ansible-
core/2.15/reference_appendices/interpreter_discovery.html for more information.
20-3 | CHANGED | rc=0 >>
root 6402 6401 0 16:02 pts/0 00:00:00 /bin/sh -c ps -ef | grep -E “observer|obagent|obproxy”
root 6404 6402 0 16:02 pts/0 00:00:00 grep -E observer|obagent|obproxy
root 42251 1 46 8月06 ? 22:28:17 /opt/data/oceanbase/d_dc2/oceanbase/bin/observer -p 2881
root 45507 1 0 8月06 ? 00:00:10 /opt/data/oceanbase/d_dc2/obagent/bin/ob_agentd -c /opt/data/oceanbase/d_dc2/obagent/conf/agentd.yaml
root 45512 45507 0 8月06 ? 00:11:25 /opt/data/oceanbase/d_dc2/obagent/bin/ob_monagent
root 45513 45507 0 8月06 ? 00:00:02 /opt/data/oceanbase/d_dc2/obagent/bin/ob_mgragent
[WARNING]: Platform linux on host 20-1 is using the discovered Python interpreter at /usr/bin/python3.7, but future installation of
another Python interpreter could change the meaning of that path. See https://docs.ansible.com/ansible-
core/2.15/reference_appendices/interpreter_discovery.html for more information.
20-1 | CHANGED | rc=0 >>
root 29311 1 99 8月06 ? 2-11:37:32 /opt/data/oceanbase/d_dc2/oceanbase/bin/observer -p 2881
root 34016 1 0 8月06 ? 00:00:52 bash /opt/data/oceanbase/d_dc2/obproxy/obproxyd.sh /opt/data/oceanbase/d_dc2/obproxy 10.106.48.20 2883 daemon
root 34051 1 5 8月06 ? 02:45:19 /opt/data/oceanbase/d_dc2/obproxy/bin/obproxy --listen_port 2883
root 35345 1 0 8月06 ? 00:00:11 /opt/data/oceanbase/d_dc2/obagent/bin/ob_agentd -c /opt/data/oceanbase/d_dc2/obagent/conf/agentd.yaml
root 35351 35345 0 8月06 ? 00:14:28 /opt/data/oceanbase/d_dc2/obagent/bin/ob_monagent
root 35352 35345 0 8月06 ? 00:00:02 /opt/data/oceanbase/d_dc2/obagent/bin/ob_mgragent
root 52475 52474 0 16:02 pts/5 00:00:00 /bin/sh -c ps -ef | grep -E “observer|obagent|obproxy”
root 52477 52475 0 16:02 pts/5 00:00:00 grep -E observer|obagent|obproxy
[WARNING]: Platform linux on host 20-2 is using the discovered Python interpreter at /usr/bin/python3.7, but future installation of
another Python interpreter could change the meaning of that path. See https://docs.ansible.com/ansible-
core/2.15/reference_appendices/interpreter_discovery.html for more information.
20-2 | CHANGED | rc=0 >>
root 52683 1 99 8月06 ? 6-20:18:40 /opt/data/oceanbase/d_dc2/oceanbase/bin/observer -p 2881
root 56164 1 0 8月06 ? 00:00:11 /opt/data/oceanbase/d_dc2/obagent/bin/ob_agentd -c /opt/data/oceanbase/d_dc2/obagent/conf/agentd.yaml
root 56169 56164 0 8月06 ? 00:00:02 /opt/data/oceanbase/d_dc2/obagent/bin/ob_mgragent
root 56170 56164 0 8月06 ? 00:12:58 /opt/data/oceanbase/d_dc2/obagent/bin/ob_monagent
root 57832 57828 0 16:02 pts/0 00:00:00 /bin/sh -c ps -ef | grep -E “observer|obagent|obproxy”
root 57834 57832 0 16:02 pts/0 00:00:00 grep -E observer|obagent|obproxy

The processes are all there, but the client still cannot connect.
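
When the processes exist but clients cannot connect, it may help to separate "port not reachable" from "port reachable but login fails". The sketch below probes the observer SQL port (2881) and the obproxy port (2883) seen in the output above. It assumes bash with `/dev/tcp` support and the coreutils `timeout` command; both function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a TCP connection to host:port can be opened
# within 3 seconds.
port_open() {
  local host="$1" port="$2"
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Report reachability of the observer (2881) and obproxy (2883) ports.
check_endpoints() {
  local host="$1" port
  for port in 2881 2883; do
    if port_open "$host" "$port"; then
      echo "${host}:${port} reachable"
    else
      echo "${host}:${port} NOT reachable"
    fi
  done
}
```

For example, `check_endpoints 10.106.48.20`. A reachable port only shows that something is listening. If 2881 is reachable, logging in directly to the observer on that port instead of going through obproxy would help tell a proxy problem apart from a database problem.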

I have not found anything useful in the logs so far, only metadata issues, and I do not know how to handle them.

In a normal cluster startup ocp should be able to start, but right now ocp keeps failing because of the metadata issue.

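ocp-express is no longer maintained, so there is no point in dwelling on why it fails to start.
If you want to start only the ob database, please provide an observer log so we can take a look.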

It is the log from the post above; that is all the information there is. The log contains errcode=-4076, errcode=-5019, and errcode=-4012.

Could you provide a log attachment that covers the failed ob restart?

observer.log.tail.5m.txt (5 MB)

Get local repositories and plugins ok
Load cluster param plugin ok
Open ssh connection ok
Cluster status check ok
Search plugins ok
Load cluster param plugin ok
Check before start observer ok
Check before start obproxy ok
Check before start obagent ok
Check before start ocp-express ok
Start observer ok
observer program health check ok
obshell program health check ok
Connect to observer 10.106.48.20:2881 ok
Start obproxy ok
obproxy program health check ok
Connect to obproxy ok
Start obagent ok
obagent program health check ok
Connect to Obagent ok
Start ocp-express /

Above is the startup hanging at ocp. The attached observer log is the last 5 MB.

Judging from the obd startup flow, ocp-express is the cause. The log shows no sign of a startup problem in the ob database itself.

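Although all the processes started successfully, I cannot connect to the database. Is there a way to work around the ocp problem without uninstalling or reinstalling ob?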