OBserver fails to start after CPU/memory expansion, and reinstall fails

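[Environment] Production or test environment
[OB or other component] OBserver
[Version] 4.2.1.8
[Problem description]
The host's CPU and memory were expanded. The services were not stopped first; the OS was simply rebooted. OBserver failed to come back up automatically, and starting it from OCP also failed.
I am now trying to reinstall it through OCP, which reports this error: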

2024-09-29 10:37:02.817  WARN 8 --- [manual-subtask-executor15,6b1ac79c09c8ac26,b247d57100644110] c.o.ocp.obsdk.connector.ConnectTemplate  : [obsdk] update failed, sql:[alter system delete server ?], error message:[PreparedStatementCallback; SQL [alter system delete server ?]; (conn=2625562) Invalid argument; nested exception is java.sql.SQLTransientConnectionException: (conn=2625562) Invalid argument

Running the statement manually fails as well.

 alter system delete server '1xxx:2882';
ERROR 1210 (HY000): Invalid argument

1F-1F-1F architecture: 3 zones, with only one server in each zone.

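Were CPU and memory expanded online on all 3 hosts? Did all of them fail to restart?
Please download the complete task log from OCP and post it, along with observer.log.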

Only one host was changed; the other two have not been touched yet.

observer.log

[2024-09-29 10:16:55.979356] INFO  [SERVER.OMT] stop (ob_multi_tenant.cpp:593) [22553][observer][T0][YB420A3303AD-0006233874CF6D37-0-0] [lt=0] there're some tenants need destroy(count=1)
[2024-09-29 10:16:55.979519] INFO  [SERVER.OMT] remove_tenant (ob_multi_tenant.cpp:1678) [22553][observer][T0][YB420A3303AD-0006233874CF6D37-0-0] [lt=0] removed_tenant begin to kill tenant session(tenant_id=1046)
...
[2024-09-29 10:16:56.567364] INFO  [SERVER.OMT] remove_tenant (ob_multi_tenant.cpp:1678) [22553][observer][T0][YB420A3303AD-0006233874CF6D37-0-0] [lt=0] removed_tenant begin to kill tenant session(tenant_id=1046)
[2024-09-29 10:16:56.567527] INFO  [SERVER.OMT] remove_tenant (ob_multi_tenant.cpp:1672) [22553][observer][T0][YB420A3303AD-0006233874CF6D37-0-0] [lt=0] removed_tenant begin to stop(tenant_id=1046)
[2024-09-29 10:16:56.568015] INFO  [SERVER.OMT] stop (ob_multi_tenant.cpp:593) [22553][observer][T0][YB420A3303AD-0006233874CF6D37-0-0] [lt=1] there're some tenants need destroy(count=1)
[2024-09-29 10:16:56.568992] INFO  [SERVER.OMT] remove_tenant (ob_multi_tenant.cpp:1672) [22553][observer][T0][YB420A3303AD-0006233874CF6D37-0-0] [lt=0] removed_tenant begin to stop(tenant_id=1046)
[2024-09-29 10:16:56.569155] INFO  [SERVER.OMT] remove_tenant (ob_multi_tenant.cpp:1672) [22553][observer][T0][YB420A3303AD-0006233874CF6D37-0-0] [lt=0] removed_tenant begin to stop(tenant_id=1046)
[2024-09-29 10:16:56.569481] INFO  [SERVER.OMT] remove_tenant (ob_multi_tenant.cpp:1678) [22553][observer][T0][YB420A3303AD-0006233874CF6D37-0-0] [lt=0] removed_tenant begin to kill tenant session(tenant_id=1046)

observer.log.wf

[2024-09-29 10:13:42.132285] ERROR issue_dba_error (ob_log.cpp:1875) [22648][TbltTblUp4][T0][YB420A3303AD-000623389F5F6E82-0-0] [lt=0][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4019, file="ob_tablet_table_updater.cpp", line_no=310, info="fail to reput to queue")
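
When a full observer.log.wf has to be skimmed for lines like the one above, a small parser can pull out the fields that matter (timestamp, level, function, thread, and both error codes). The following is only a sketch: the regular expression is inferred from the log lines posted in this thread, not from any official format description, and `parse_wf_line` is a hypothetical helper name.

```python
import re

# Layout inferred from the lines in this thread (an assumption, not a spec):
# [timestamp] LEVEL [MODULE]? function (file:line) [tid][thread][tenant][trace_id] ...
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z]+)\s+"
    r"(?:\[(?P<module>[^\]]+)\]\s+)?"          # module tag is absent on some lines
    r"(?P<func>\S+)\s+\((?P<src>[^)]+)\)\s+"
    r"\[(?P<tid>\d+)\]\[(?P<thread>[^\]]*)\]"
    r"\[(?P<tenant>[^\]]*)\]\[(?P<trace>[^\]]*)\]"
)
ERRCODE_RE = re.compile(r"errcode=(-?\d+)")


def parse_wf_line(line):
    """Return a dict of header fields plus every errcode on the line, or None."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # A single line can carry an outer errcode and an inner one; keep both in order.
    record["errcodes"] = [int(code) for code in ERRCODE_RE.findall(line)]
    return record
```

Filtering the parsed records by `level == "ERROR"` and grouping them by `errcodes` gives a quick view of which errors repeat around the failed start, which is easier to share than the raw file.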

subtask_7039517.log (8.7 KB)

[2024-09-29 10:13:42.132285] ERROR issue_dba_error (ob_log.cpp:1875) [22648][TbltTblUp4][T0][YB420A3303AD-000623389F5F6E82-0-0] [lt=0][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4019, file="ob_tablet_table_updater.cpp", line_no=310, info="fail to reput to queue")

This is the log from the failed restart, right? Could you post the complete log, including the surrounding context?

After delete server failed, did you try it from the command line?
ALTER SYSTEM DELETE SERVER 'xxx.xx.xxx.xx1:xxxx' ZONE 'zone1';


Looks like it is fixed now :sweat:

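The reinstall failed, so I rolled it back. The OBserver then started on its own. I noticed an alert saying the time difference was too large, so I corrected the server's clock, and it recovered :joy: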
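
Since the root cause turned out to be clock drift on the rebooted host, a pre-check like the one below could be run before restarting the next two servers. It is a minimal sketch under assumptions: `find_clock_outliers` is a hypothetical helper, the per-host offsets (in seconds, against a common reference such as the NTP source) have to be collected separately, and the acceptable skew must be taken from the OceanBase documentation for your version rather than from this example.

```python
from statistics import median
from typing import Dict, List, Tuple


def find_clock_outliers(
    offsets_s: Dict[str, float], max_skew_s: float
) -> List[Tuple[str, float]]:
    """Return (host, deviation) for hosts whose clock is too far from the median.

    offsets_s  -- host name -> clock offset in seconds against one shared reference
    max_skew_s -- largest tolerated deviation in seconds (caller-supplied assumption)
    """
    if not offsets_s:
        return []
    # The median is used as the reference so that one badly drifted host
    # does not shift the baseline for the healthy ones.
    reference = median(offsets_s.values())
    outliers = []
    for host, offset in sorted(offsets_s.items()):
        deviation = offset - reference
        if abs(deviation) > max_skew_s:
            outliers.append((host, deviation))
    return outliers
```

For a 3-zone cluster with one server per zone, as in this thread, passing the three measured offsets and the documented tolerance returns the host that needs its time source fixed before OBserver is started.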
