Error when upgrading the database with OCP

I used OCP to upgrade the database (from 3.1.4 to 3.1.5).
Reference: https://www.oceanbase.com/docs/community-observer-cn-10000000000450210
Packages involved:

oceanbase-ce-3.1.5-100000252023041721.el7.x86_64.rpm
oceanbase-ce-utils-3.1.5-100000252023041721.el7.x86_64.rpm
oceanbase-ce-libs-3.1.5-100000252023041721.el7.x86_64.rpm
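After the merge completed, I started a rolling upgrade of the OB cluster. About 20 minutes later I noticed that a subtask had timed out.

The log shows the following: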

2023-04-26 16:17:59.645 INFO 873 --- [pool-manual-subtask-executor1,14d4a80f3b4c4173,3aea6705b343] c.o.o.e.internal.template.HttpTemplate : POST request to agent, url: http://172.17.151.122:62888/api/v1/file/exists, request body:GetFileExistsRequest(filePath=rpms/extract/oceanbase-ce-3.1.5-100000252023041721.el7.x86_64.rpm/home/admin/oceanbase/etc/upgrade_checker.py), params:null
2023-04-26 16:17:59.647 INFO 873 --- [pool-manual-subtask-executor1,14d4a80f3b4c4173,3aea6705b343] c.a.o.c.m.t.model.SubtaskInstanceEntity : Set state for subtask: 8156980, current state: RUNNING, new state: FAILED
2023-04-26 16:17:59.649 WARN 873 --- [pool-manual-subtask-executor1,14d4a80f3b4c4173,3aea6705b343] c.a.o.c.t.engine.runner.RunnerFactory : Execute task failed, subtask=SubtaskInstanceEntity{id=8156980, name=Execute upgrade checker script, state=FAILED, operation=RETRY, className=com.alipay.ocp.service.task.business.cluster.ExecUpgradeCheckerScriptTask, seriesId=14, startTime=2023-04-26T16:17:59.309+08:00, endTime=2023-04-26T16:17:59.648+08:00}
javax.ws.rs.ProcessingException: org.apache.http.conn.HttpHostConnectException: Connect to 172.17.151.122:62888 [/172.17.151.122] failed: Connection refused (Connection refused)
at org.glassfish.jersey.apache.connector.ApacheConnector.apply(ApacheConnector.java:528) ~[jersey-apache-connector-2.30.1.jar!/:na]
at org.glassfish.jersey.client.ClientRuntime.invoke(ClientRuntime.java:296) ~[jersey-client-2.30.1.jar!/:na]
at org.glassfish.jersey.client.JerseyInvocation.lambda$invoke$2(JerseyInvocation.java:643) ~[jersey-client-2.30.1.jar!/:na]
at org.glassfish.jersey.internal.Errors.process(Errors.java:292) ~[jersey-common-2.30.1.jar!/:na]
at org.glassfish.jersey.internal.Errors.process(Errors.java:274) ~[jersey-common-2.30.1.jar!/:na]
at org.glassfish.jersey.internal.Errors.process(Errors.java:205) ~[jersey-common-2.30.1.jar!/:na]
at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:390) ~[jersey-common-2.30.1.jar!/:na]
at org.glassfish.jersey.client.JerseyInvocation.invoke(JerseyInvocation.java:641) ~[jersey-client-2.30.1.jar!/:na]
at org.glassfish.jersey.client.JerseyInvocation$Builder.method(JerseyInvocation.java:443) ~[jersey-client-2.30.1.jar!/:na]
at org.glassfish.jersey.client.JerseyInvocation$Builder.post(JerseyInvocation.java:339) ~[jersey-client-2.30.1.jar!/:na]
at com.oceanbase.ocp.executor.internal.template.HttpTemplate.lambda$doPost$1(HttpTemplate.java:247) ~[command-executor-3.3.0-20220427.jar!/:3.3.0-20220427]
at com.oceanbase.ocp.executor.internal.template.HttpTemplate.safeExecute(HttpTemplate.java:404) ~[command-executor-3.3.0-20220427.jar!/:3.3.0-20220427]

=============================================

The ocp-server.0.err log shows:

/home/admin/ocp-server/bin/ocp-server: line 106: 2347 Killed /usr/lib/jvm/java-1.8.0/bin/java -server -XX:+UseG1GC -Xms2150m -Xmx2150m -Xss512k -XX:+PrintCommandLineFlags -XX:MetaspaceSize=1024m -XX:MaxMetaspaceSize=1024m -XX:+PrintAdaptiveSizePolicy -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -Xloggc:/home/admin/ocp-server/bin/…/log/gc.log -XX:+UseGCLogFileRotation -XX:GCLogFileSize=50M -XX:NumberOfGCLogFiles=2 -XX:ErrorFile=/home/admin/ocp-server/bin/…/log/hs_err_pid%p.log -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/admin/ocp-server/bin/…/log/ -Dfile.encoding=UTF-8 -jar /home/admin/ocp-server/bin/…/lib/ocp-server-3.3.0-20220427.jar
Exception in thread "table-cleaning" org.springframework.dao.TransientDataAccessResourceException: StatementCallback; SQL [delete /*+ QUERY_TIMEOUT(10000000) */ from task_instance where create_time <= date_sub(now(), interval 30 day) and type = "SYS_SCHEDULED" limit 1000;]; (conn=1721916) Timeout; nested exception is java.sql.SQLTransientConnectionException: (conn=1721916) Timeout
at org.springframework.jdbc.support.SQLExceptionSubclassTranslator.doTranslate(SQLExceptionSubclassTranslator.java:70)
at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:72)
at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:81)

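The upgrade checker script failed to execute. Is this the step where the task failed? How long did this step run? The log further down looks more like a timeout. If convenient, please download the task log and post it here.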
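
As a side note, the WARN line in the first log already carries startTime and endTime for subtask 8156980, so the duration of that step can be read off directly. A minimal sketch, assuming only the two timestamp fields matter; the helper name `subtask_duration_seconds` is made up for illustration and LOG_LINE is a shortened copy of the line posted above:

```python
import re
from datetime import datetime

# Shortened copy of the WARN line from the log above (className and seriesId omitted).
LOG_LINE = (
    "Execute task failed, subtask=SubtaskInstanceEntity{id=8156980, "
    "name=Execute upgrade checker script, state=FAILED, operation=RETRY, "
    "startTime=2023-04-26T16:17:59.309+08:00, endTime=2023-04-26T16:17:59.648+08:00}"
)


def subtask_duration_seconds(line: str) -> float:
    """Return endTime - startTime in seconds for one SubtaskInstanceEntity log line."""
    # Each value runs up to the next comma or closing brace.
    start = re.search(r"startTime=([^,}]+)", line).group(1)
    end = re.search(r"endTime=([^,}]+)", line).group(1)
    delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    return delta.total_seconds()


duration = subtask_duration_seconds(LOG_LINE)
print(f"{duration:.3f}")
```

For this line the difference is roughly a third of a second, which would suggest that this particular attempt failed immediately on the refused connection rather than by waiting for a timeout. The earlier 20-minute wait may belong to a different attempt or subtask.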

log_task_8140794.zip (17.5 KB)
The attachment is the log from when I ran the task again.

javax.ws.rs.ProcessingException: org.apache.http.conn.HttpHostConnectException: Connect to 172.17.151.122:62888 [/172.17.151.122] failed: Connection refused (Connection refused)
What status does this host show under Host Management? It looks like OCP cannot connect to the agent on this machine.
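
To confirm this from the OCP server side, a plain TCP connect to the agent port is enough. A minimal sketch using only the Python standard library; the function name `agent_port_open` is hypothetical, and it only tests whether something is listening on the port, not whether the agent itself is healthy:

```python
import socket


def agent_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError,
        # or a timeout) when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Running `agent_port_open("172.17.151.122", 62888)` on the OCP host (address and port taken from the error message above) should return False while the "Connection refused" error persists, and True once the agent process on that machine is listening again.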