【 Environment 】Production
Internal production project: OceanBase_CE 4.2.1.0 (r100000102023092807-7b0f43693565654bb1d7343f728bc2013dfff959) (Built Sep 28 2023 07:25:28)
【 OB or other components 】
obd --version
OceanBase Deploy: 2.4.1
REVISION: 955d1eab27a5bd304669b6280c88dc4102c07bb4
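【 Problem description 】
- Upgrading from 4.2.1 to 4.2.2.0 with OBD from the command line ("black-screen" mode) hangs with no response. The upgrade was started on Feb 20 and has been stuck ever since, although the cluster can still read and write normally.
- The OCP management console cannot be opened (possibly also related to the stuck upgrade):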
http://ob-ocp-xxxx.dmall.com/
2024-02-20 15:59:03 obd cluster upgrade oceanbase41 -c oceanbase-ce -V 4.2.2.0 --usable=d687aabed34f610040c70cd8aa4f256f9a909564bcdb12e1bcbf83224c865fab
Previously, upgrading from 4.1.0 to 4.2.1 with OBD worked fine:
2023-10-25 20:37:27 obd cluster upgrade oceanbase41 -c oceanbase-ce -V 4.2.1.0 --usable=8f0cac8e81aaef587efb774b5de3cd98876dc196ccb8f2eca7bcd252f48ffb4a
obd cluster display oceanbase41
Deploy "oceanbase41" is upgrading
See https://www.oceanbase.com/product/ob-deployer/error-codes .
Trace ID: 91b2297e-d779-11ee-924c-525400b51421
If you want to view detailed obd logs, please run: obd display-trace 91b2297e-d779-11ee-924c-525400b51421
obd display-trace c1279bda-d779-11ee-a1db-525400b51421
[2024-03-01 11:14:02.668] [DEBUG] - cmd: ['oceanbase41']
[2024-03-01 11:14:02.668] [DEBUG] - opts: {'components': 'oceanbase-ce', 'style': 'cluster'}
[2024-03-01 11:14:02.668] [DEBUG] - mkdir /root/.obd/lock/
[2024-03-01 11:14:02.668] [DEBUG] - unknown lock mode
[2024-03-01 11:14:02.668] [DEBUG] - try to get share lock /root/.obd/lock/global
[2024-03-01 11:14:02.668] [DEBUG] - share lock `/root/.obd/lock/global`, count 1
[2024-03-01 11:14:02.668] [DEBUG] - Get Deploy by name
[2024-03-01 11:14:02.668] [DEBUG] - mkdir /root/.obd/cluster/
[2024-03-01 11:14:02.669] [DEBUG] - mkdir /root/.obd/config_parser/
[2024-03-01 11:14:02.669] [DEBUG] - try to get exclusive lock /root/.obd/lock/deploy_oceanbase41
[2024-03-01 11:14:02.669] [DEBUG] - exclusive lock `/root/.obd/lock/deploy_oceanbase41`, count 1
[2024-03-01 11:14:02.675] [DEBUG] - Deploy config status judge
[2024-03-01 11:14:02.675] [ERROR] Deploy oceanbase41 need reload
[2024-03-01 11:14:02.675] [INFO] See https://www.oceanbase.com/product/ob-deployer/error-codes .
[2024-03-01 11:14:02.675] [INFO] Trace ID: c1279bda-d779-11ee-a1db-525400b51421
[2024-03-01 11:14:02.675] [INFO] If you want to view detailed obd logs, please run: obd display-trace c1279bda-d779-11ee-a1db-525400b51421
[2024-03-01 11:14:02.675] [DEBUG] - exclusive lock /root/.obd/lock/deploy_oceanbase41 release, count 0
[2024-03-01 11:14:02.675] [DEBUG] - unlock /root/.obd/lock/deploy_oceanbase41
[2024-03-01 11:14:02.675] [DEBUG] - share lock /root/.obd/lock/global release, count 0
[2024-03-01 11:14:02.675] [DEBUG] - unlock /root/.obd/lock/global
obd cluster reload oceanbase41
[ERROR] Deploy "oceanbase41" is upgrading. You could not reload an upgrading cluster.
See https://www.oceanbase.com/product/ob-deployer/error-codes .
Trace ID: f9efd25a-d780-11ee-aec4-525400b51421
If you want to view detailed obd logs, please run: obd display-trace f9efd25a-d780-11ee-aec4-525400b51421
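Before the upgrade, the xxx.210 machine went through hardware maintenance and was rebooted, after which observer and obproxy were started manually. (We strongly recommend that OceanBase officially provide a systemd-based auto-start service; another database product does exactly this.)
Status queries for the observer node on 210 look normal.
The observer on 210 shows no abnormal errors in its log:
grep ' ERROR ' /home/admin/oceanbase41/oceanbase/log/observer.log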
ps -aux | egrep -w 'observer'
admin 70881 191 20.1 55220288 53195956 ? Ssl Feb19 28848:36 ./bin/observer
*************************** 3. row ***************************
gmt_create: 2023-07-04 17:50:14.605122
gmt_modified: 2024-02-19 23:47:37.050818
svr_ip: xxx.210
svr_port: 2882
id: 3
zone: zone3
inner_port: 2881
with_rootserver: 0
status: ACTIVE
block_migrate_in_time: 0
build_version: 4.2.1.0_100000102023092807-7b0f43693565654bb1d7343f728bc2013dfff959(Sep 28 2023 07:25:28)
stop_time: 0
start_service_time: 1708357655340830
first_sessid: 0
with_partition: 1
last_offline_time: 0
3 rows in set (0.01 sec)
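The grep above only scans the current `observer.log`. OceanBase normally rotates this file by size, so with the upgrade started on Feb 20 and the check run around Mar 1, errors from the upgrade window may sit in rotated files (named like `observer.log.<timestamp>`) or in a `.wf` warning file if one is present. A minimal sketch, assuming the log directory from the command above and that naming pattern (both assumptions to verify on the host; the function name is hypothetical), that counts ` ERROR ` lines per file:

```bash
#!/usr/bin/env bash
# Count lines containing " ERROR " in observer.log and in every file whose name
# starts with observer.log (assumed: rotated files and the .wf file, if any).
# Use straight ASCII quotes; typographic quotes are not shell quoting.
count_observer_errors() {
  local log_dir="${1:-/home/admin/oceanbase41/oceanbase/log}"
  # -F: fixed string, -c: count per file, -H: always print the file name.
  # The glob expands in sorted order, so the current log is listed first.
  grep -cHF -- ' ERROR ' "$log_dir"/observer.log*
}
```

Usage: `count_observer_errors /home/admin/oceanbase41/oceanbase/log`. A zero on every file still only covers the retained logs; anything rotated out before the check is gone.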
OCP logs inside the Docker container:
docker exec -it 0f6e907ac7f6 /bin/bash
A large number of errors in the log:
grep 'Operation not allowed' logs/ocp-server.0.err | wc -l
30681
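`wc -l` shows how many lines mention `Operation not allowed`, but not which failure dominates. A minimal sketch that tallies the distinct `Caused by:` lines in the same file, most frequent first; the function name is hypothetical and it assumes the Java stack-trace layout shown in the excerpt below. It does not diagnose anything by itself, but it shows whether `init distributed_lock table failed` is the only root cause or one of several.

```bash
#!/usr/bin/env bash
# Tally distinct "Caused by:" lines across one or more Java error logs,
# most frequent first. Leading whitespace is stripped before counting.
top_causes() {
  grep -hE '^[[:space:]]*Caused by:' "$@" \
    | sed -E 's/^[[:space:]]+//' \
    | sort | uniq -c | sort -rn
}
```

Usage: `top_causes logs/ocp-server.0.err | head -n 20`.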
less logs/ocp-server.0.err
Caused by: java.lang.IllegalArgumentException: Cannot instantiate interface org.springframework.boot.SpringApplicationRunListener : com.oceanbase.ocp.bootstrap.spring.BootstrapRunListener
at org.springframework.boot.SpringApplication.createSpringFactoriesInstances(SpringApplication.java:461)
at org.springframework.boot.SpringApplication.getSpringFactoriesInstances(SpringApplication.java:443)
at org.springframework.boot.SpringApplication.getRunListeners(SpringApplication.java:431)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:297)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:1317)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:1306)
at com.oceanbase.ocp.OcpServerApplication.main(OcpServerApplication.java:21)
... 8 more
Caused by: org.springframework.beans.BeanInstantiationException: Failed to instantiate [com.oceanbase.ocp.bootstrap.spring.BootstrapRunListener]: Constructor threw exception; nested exception is java.lang.IllegalStateException: init distributed_lock table failed
at org.springframework.beans.BeanUtils.instantiateClass(BeanUtils.java:224)
at org.springframework.boot.SpringApplication.createSpringFactoriesInstances(SpringApplication.java:457)
... 14 more
Caused by: java.lang.IllegalStateException: init distributed_lock table failed
at com.oceanbase.ocp.bootstrap.hooks.BootstrapLock.init(BootstrapLock.java:69)
at com.oceanbase.ocp.bootstrap.hooks.BootstrapLock.tryLock(BootstrapLock.java:83)
at com.oceanbase.ocp.bootstrap.hooks.OCPInitializer.initialize(OCPInitializer.java:58)
at com.oceanbase.ocp.bootstrap.hooks.OCPInitializer.initialize(OCPInitializer.java:115)
at com.oceanbase.ocp.bootstrap.spring.BootstrapRunListener.<init>(BootstrapRunListener.java:56)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.springframework.beans.BeanUtils.instantiateClass(BeanUtils.java:211)
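【 Suggestions 】
- We strongly recommend that OceanBase officially provide a systemd-based auto-start service. Without a standard, it is hard for users to build reliable auto-start on their own.
- Expand the official documentation on emergency recovery from the command line ("black-screen" mode).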
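The systemd suggestion can be sketched as a shell function that prints a unit file for the observer. Everything in it is an assumption to check against the actual deployment: the `admin` user and home path come from this post, the `lib/` `LD_LIBRARY_PATH` entry and the `run/observer.pid` location mirror how OBD deployments are usually laid out, and the resource limits are values commonly recommended for OceanBase hosts. It also assumes the observer daemonizes itself when started without arguments from its home directory (as in the manual `./bin/observer` start above), hence `Type=forking`. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Print a systemd unit that starts an already-bootstrapped observer at boot.
# HYPOTHETICAL: user, paths, pid file and limits are assumptions to verify.
observer_unit() {
  local user="$1"   # OS user that owns the deployment, e.g. admin
  local home="$2"   # observer home dir, e.g. /home/admin/oceanbase41/oceanbase
  cat <<EOF
[Unit]
Description=OceanBase observer (${home})
After=network-online.target
Wants=network-online.target
# Wait until the filesystem holding the home dir is mounted
RequiresMountsFor=${home}

[Service]
# Assumption: observer forks into the background unless started with -N
Type=forking
User=${user}
WorkingDirectory=${home}
Environment=LD_LIBRARY_PATH=${home}/lib
# Assumption: without arguments, observer restarts from the config under etc/
ExecStart=${home}/bin/observer
PIDFile=${home}/run/observer.pid
LimitNOFILE=655350
LimitNPROC=655360
LimitCORE=infinity
TimeoutStartSec=600
# Start at boot only; do not auto-restart a crashed database node
Restart=no

[Install]
WantedBy=multi-user.target
EOF
}
```

Usage, after reviewing the output: `observer_unit admin /home/admin/oceanbase41/oceanbase | sudo tee /etc/systemd/system/observer.service`, then `sudo systemctl daemon-reload` and `sudo systemctl enable observer`. Enabling only takes effect at the next boot; do not run `systemctl start` on a node whose observer is already running. `Restart=no` leaves the decision to restart a crashed node to an operator. obproxy would need a similar unit of its own.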