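【 Environment 】Production
【 OB or other component 】
【 Version 】ocp: 4.3.2, oceanbase-ce: 4.3.3.1
【 Problem description 】After restarting the cluster with obd cluster restart xx, the OCP web console cannot be reached and OCP fails to start. This is the second time I have hit this; the first time the cluster held only test data, so I simply redeployed.
【 Steps to reproduce 】Before the restart, I had enabled the two parameters for full-link tracing.
obd log
ocp
The obd log and the OCP log are attached:
obd.txt (379.6 KB)
ocp-server.log (25.1 MB)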
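Please share the yaml configuration file under ~/.obd/cluster/xxxx/
What does your deployment architecture look like? Is OCP deployed on 153? The log shows a large number of failed data source connections: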
172.16.207.152:8883
172.16.207.153:8881
Can you connect to the ocp_meta and ocp_monitor tenants manually?
Please also send observer.log.
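Before trying a manual tenant login, it can help to confirm whether the two endpoints accept TCP connections at all. The following is a minimal sketch, not an official tool: the helper names `parse_endpoint`, `check_port` and `report` are made up for illustration, and the two addresses are simply the ones quoted in this thread.

```python
import socket

# Endpoints quoted in this thread; adjust to your own deployment.
ENDPOINTS = ["172.16.207.152:8883", "172.16.207.153:8881"]


def parse_endpoint(text):
    """Split a 'host:port' string into (host, port as int)."""
    host, _, port = text.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {text!r}")
    return host, int(port)


def check_port(host, port, timeout=2.0):
    """Return 'open' if a TCP connection succeeds, else a short failure reason."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"


def report(endpoints, checker=check_port):
    """Map each 'host:port' string to the result of the checker."""
    return {ep: checker(*parse_endpoint(ep)) for ep in endpoints}
```

Calling `report(ENDPOINTS)` on the OCP host returns one result per endpoint. Note that "open" only means the port is listening; a tenant can still reject logins (for example with "Server is initializing"), so a manual login test is still needed.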
2024-11-12 10:53:49.131 ERROR 49711 --- [Druid-ConnectionPool-Create-1400974072,,] com.alibaba.druid.pool.DruidDataSource : create connection SQLException, url: jdbc:oceanbase://172.16.207.153:8881/oceanbase?useUnicode=true&characterEncoding=UTF8&encloseParamInParentheses=false, errorCode -1, state 08000
java.sql.SQLNonTransientConnectionException: Could not connect to HostAddress{host='172.16.207.153', port=8881}. 拒绝连接 (Connection refused)
at com.oceanbase.jdbc.internal.util.exceptions.ExceptionFactory.createException(ExceptionFactory.java:122)
at com.oceanbase.jdbc.internal.util.exceptions.ExceptionFactory.create(ExceptionFactory.java:225)
at com.oceanbase.jdbc.internal.protocol.AbstractConnectProtocol.connectWithoutProxy(AbstractConnectProtocol.java:1735)
at com.oceanbase.jdbc.internal.util.Utils.retrieveProxy(Utils.java:1431)
at com.oceanbase.jdbc.OceanBaseConnection.newConnection(OceanBaseConnection.java:311)
at com.oceanbase.jdbc.Driver.connect(Driver.java:89)
at com.alibaba.druid.pool.DruidAbstractDataSource.createPhysicalConnection(DruidAbstractDataSource.java:1657)
at com.alibaba.druid.pool.DruidAbstractDataSource.createPhysicalConnection(DruidAbstractDataSource.java:1723)
at com.alibaba.druid.pool.DruidDataSource$CreateConnectionThread.run(DruidDataSource.java:2838)
Caused by: java.net.ConnectException: 拒绝连接 (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:607)
at com.oceanbase.jdbc.internal.protocol.AbstractConnectProtocol.createSocket(AbstractConnectProtocol.java:285)
at com.oceanbase.jdbc.internal.protocol.AbstractConnectProtocol.createConnection(AbstractConnectProtocol.java:560)
at com.oceanbase.jdbc.internal.protocol.AbstractConnectProtocol.connectWithoutProxy(AbstractConnectProtocol.java:1715)
... 6 common frames omitted
……
2024-11-12 10:59:36.774 ERROR 49711 --- [ocp-async-6,0c703dd89381762a,0516bff573ac5c96] o.h.engine.jdbc.spi.SqlExceptionHelper : Could not connect to 172.16.207.152:8883 : (conn=1210075265) Server is initializing
2024-11-12 10:59:36.775 WARN 49711 --- [ocp-async-2,b0439f6850cf1555,e8070a105b44bdb3] o.h.engine.jdbc.spi.SqlExceptionHelper : SQL Error: 0, SQLState: 08004
2024-11-12 10:59:36.775 ERROR 49711 --- [ocp-async-2,b0439f6850cf1555,e8070a105b44bdb3] o.h.engine.jdbc.spi.SqlExceptionHelper : metadb-connect-pool - Connection is not available, request timed out after 2000ms.
2024-11-12 10:59:36.775 WARN 49711 --- [ocp-async-2,b0439f6850cf1555,e8070a105b44bdb3] o.h.engine.jdbc.spi.SqlExceptionHelper : SQL Error: 8001, SQLState: 08004
2024-11-12 10:59:36.775 ERROR 49711 --- [ocp-async-2,b0439f6850cf1555,e8070a105b44bdb3] o.h.engine.jdbc.spi.SqlExceptionHelper : Could not connect to 172.16.207.152:8883 : (conn=1210075265) Server is initializing
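With a 25 MB ocp-server.log, it is easier to see which endpoints fail, and why, by counting the connection errors instead of reading them one by one. The sketch below assumes the log contains only the two message shapes shown in the excerpt above; the function name `summarize_failures` and the regular expressions are illustrative and may need adjusting for other message formats.

```python
import re
from collections import Counter

# Shape 1: "Could not connect to HostAddress{host='...', port=...}. <reason>"
HOST_ADDRESS = re.compile(
    r"Could not connect to HostAddress\{host='(?P<host>[^']+)', "
    r"port=(?P<port>\d+)\}\.\s*(?P<reason>.*)"
)
# Shape 2: "Could not connect to <ip>:<port> : (conn=...) <reason>"
PLAIN = re.compile(
    r"Could not connect to (?P<host>[\d.]+):(?P<port>\d+) : "
    r"(?:\(conn=\d+\) )?(?P<reason>.*)"
)


def summarize_failures(lines):
    """Count connection failures per (endpoint, reason) pair."""
    counts = Counter()
    for line in lines:
        match = HOST_ADDRESS.search(line) or PLAIN.search(line)
        if match:
            endpoint = f"{match['host']}:{match['port']}"
            counts[(endpoint, match["reason"].strip())] += 1
    return counts


# Usage (the file path is an assumption):
# with open("ocp-server.log", encoding="utf-8", errors="replace") as fh:
#     for (endpoint, reason), n in summarize_failures(fh).most_common():
#         print(n, endpoint, reason)
```

In the excerpt above, the two endpoints fail for different reasons: 172.16.207.153:8881 refuses the connection outright, while 172.16.207.152:8883 accepts it but reports "Server is initializing".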
Try starting that component on its own:
obd cluster start xxxx -c ocp-server-ce
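How much memory does each node have? I see you assigned 80G to the proxy, which is somewhat wasteful.
Restarting it on its own gives the same result. I read the earlier similar posts and already tried restarting it separately, but it still will not start. Each server has 185G of memory.
Should I reduce the memory setting and try restarting it on its own again?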
It looks normal now. Try restarting ocp-server on its own:
obd cluster start xxxx -c ocp-server-ce
Yes, it is configured. Nobody has touched hosts or changed the hostname in the past few days.
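I will ask the OCP team to analyze this and will reply when there is progress.
OK, thank you.
Thank you, it is resolved and the service has recovered. After restarting several times, I found that it now works normally.
Solution: I changed ocp.analyze.enabled and ocp.analyze.ob.trace.enabled. Both had been enabled; I turned them off, and after restarting the cluster OCP is accessible again.
We will keep analyzing this issue. Thanks for the feedback.
OK, thank you. Please also take a look at these two parameters; they appear to be required for enabling tenant full-link tracing.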