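obproxy crashed

[Environment] Production
[OB or other component] ocp obproxy
[Version]
[Problem description] obproxy went down. OCP shows that the last available time and the current time differ by 5 minutes.
[Steps to reproduce] Operations performed before and after the problem occurred
[Attachments and logs]

Full log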
2025-03-10 14:54:32.887 INFO 17347 --- [pool-manual-subtask-executor14,1b5431261f6a46d2,9a991663de50] c.o.o.c.t.e.runner.JavaSubtaskRunner : Run subtask, id=114215, context=Context{parallelIdx=0, stringMap={former_obproxy_cluster_status=UNLOCK, subtask_splitter=obproxy_server_ids, ocpagent_service_name=proxy_agent, target_obproxy_status=STOPPED, task_instance_id=109664, target_operate_status=NORMAL, task_operation=execute, latest_execution_start_time=2025-03-10T14:54:32.802+08:00, sub_task_instance_name=Inactive agent modules, sub_task_instance_id=114215, obproxy_cluster_id=2, target_obproxy_cluster_status=UNLOCK}, listMap={obproxy_server_ids=[6], host_ids=[2]}}, executor=10.38.36.237

2025-03-10 14:54:32.891 INFO 17347 --- [pool-manual-subtask-executor14,1b5431261f6a46d2,9a991663de50] c.o.o.s.t.b.o.InactiveAgentModuleTask : inactive agent modules begin

2025-03-10 14:54:32.952 INFO 17347 --- [pool-manual-subtask-executor14,1b5431261f6a46d2,9a991663de50] c.o.o.c.agent.HostAgentServiceImpl : Finding OCP agent: hostId=2

2025-03-10 14:54:32.969 INFO 17347 --- [pool-manual-subtask-executor14,1b5431261f6a46d2,9a991663de50] c.o.o.c.a.p.HostAgentProcessServiceImpl : Getting all OCP agent processes on host 2

2025-03-10 14:54:33.101 INFO 17347 --- [pool-manual-subtask-executor14,1b5431261f6a46d2,9a991663de50] c.o.o.c.agent.HostAgentServiceImpl : get hostServiceType OB_PROXY, exporters {http://10.38.36.239:62889/metrics/obproxy=1, http://10.38.36.239:62889/metrics/node/obproxy=1}

2025-03-10 14:54:33.129 INFO 17347 --- [pool-manual-subtask-executor14,1b5431261f6a46d2,9a991663de50] c.o.o.c.agent.HostAgentServiceImpl : Inactiving OCP monitor module, host=2, mode=OB_PROXY_DEPLOYED

2025-03-10 14:54:33.142 INFO 17347 --- [pool-manual-subtask-executor14,1b5431261f6a46d2,9a991663de50] c.o.o.c.agent.HostAgentServiceImpl : Finding OCP agent: hostId=2

2025-03-10 14:54:33.149 INFO 17347 --- [pool-manual-subtask-executor14,1b5431261f6a46d2,9a991663de50] c.o.o.c.a.p.HostAgentProcessServiceImpl : Getting all OCP agent processes on host 2

2025-03-10 14:54:33.198 INFO 17347 --- [pool-manual-subtask-executor14,1b5431261f6a46d2,9a991663de50] c.o.o.e.internal.template.HttpTemplate : POST request to agent, url:http://10.38.36.239:62888/api/v1/module/config/update, request body:UpdateAgentConfigRequest(agentHome=null, configs=[(monagent.pipeline.obproxy.status=inactive)]), params:null

2025-03-10 14:54:33.206 ERROR 17347 --- [pool-manual-subtask-executor14,1b5431261f6a46d2,9a991663de50] c.o.o.c.t.e.c.w.subtask.SubtaskExecutor : [AgentClient]:http request is failed, response:Authentication failed for wrong digest

com.oceanbase.ocp.executor.exception.HttpRequestFailedException: [AgentClient]:http request is failed, response:Authentication failed for wrong digest
at com.oceanbase.ocp.executor.internal.template.HttpTemplate.checkSuccess(HttpTemplate.java:476)
at com.oceanbase.ocp.executor.internal.template.HttpTemplate.doPost(HttpTemplate.java:286)
at com.oceanbase.ocp.executor.internal.template.HttpTemplate.post(HttpTemplate.java:114)
at com.oceanbase.ocp.executor.executor.AgentExecutor.updateAgentConfig(AgentExecutor.java:185)
at com.oceanbase.ocp.compute.agent.manager.HostAgentConfigManagerImpl.configAgent(HostAgentConfigManagerImpl.java:467)
at com.oceanbase.ocp.compute.agent.manager.HostAgentConfigManagerImpl.configAgent(HostAgentConfigManagerImpl.java:473)
at com.oceanbase.ocp.compute.agent.manager.HostAgentConfigManagerImpl.stopObProxyMonitor(HostAgentConfigManagerImpl.java:432)
at com.oceanbase.ocp.compute.agent.manager.HostAgentConfigManagerImpl$$FastClassBySpringCGLIB$$798eb4cd.invoke()
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)
at org.springframework.aop.framework.CglibAopProxy.invokeMethod(CglibAopProxy.java:386)
at org.springframework.aop.framework.CglibAopProxy.access$000(CglibAopProxy.java:85)
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:704)
at com.oceanbase.ocp.compute.agent.manager.HostAgentConfigManagerImpl$$EnhancerBySpringCGLIB$$a21ae10f.stopObProxyMonitor()
at com.oceanbase.ocp.compute.agent.HostAgentServiceImpl.inactiveObProxyModule(HostAgentServiceImpl.java:831)
at com.oceanbase.ocp.compute.agent.HostAgentServiceImpl.InactiveAgentModule(HostAgentServiceImpl.java:437)
at com.oceanbase.ocp.compute.agent.HostAgentServiceImpl$$FastClassBySpringCGLIB$$e7947c56.invoke()
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)
at org.springframework.aop.framework.CglibAopProxy.invokeMethod(CglibAopProxy.java:386)
at org.springframework.aop.framework.CglibAopProxy.access$000(CglibAopProxy.java:85)
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:704)
at com.oceanbase.ocp.compute.agent.HostAgentServiceImpl$$EnhancerBySpringCGLIB$$bcbcb54a.InactiveAgentModule()
at com.oceanbase.ocp.service.task.business.obagent.InactiveAgentModuleTask.run(InactiveAgentModuleTask.java:54)
at com.oceanbase.ocp.core.task.engine.runner.JavaSubtaskRunner.execute(JavaSubtaskRunner.java:64)
at com.oceanbase.ocp.core.task.engine.runner.JavaSubtaskRunner.doRun(JavaSubtaskRunner.java:32)
at com.oceanbase.ocp.core.task.engine.runner.JavaSubtaskRunner.run(JavaSubtaskRunner.java:26)
at com.oceanbase.ocp.core.task.engine.runner.RunnerFactory.doRun(RunnerFactory.java:76)
at com.oceanbase.ocp.core.task.engine.coordinator.worker.subtask.SubtaskExecutor.doRun(SubtaskExecutor.java:203)
at com.oceanbase.ocp.core.task.engine.coordinator.worker.subtask.SubtaskExecutor.redirectConsoleOutput(SubtaskExecutor.java:197)
at com.oceanbase.ocp.core.task.engine.coordinator.worker.subtask.SubtaskExecutor.lambda$submit$2(SubtaskExecutor.java:134)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Set state for subtask: 114215, operation:EXECUTE, state: FAILED

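An obproxy crash will affect OB workloads, so please check whether the business side is seeing problems. The node disconnection may have been caused by a problem with the agent.
Please run ps -ef to check whether the ocp_monagent process still exists. If it does not, try restarting the host agent.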
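
The process check suggested above can be scripted. The following is only a minimal sketch, not an official OCP tool: `has_process` is a hypothetical helper name, and it simply filters `ps -ef`-style output read from standard input for the given process name.

```shell
#!/bin/sh
# has_process NAME: reads `ps -ef`-style output on stdin and succeeds (exit 0)
# if some line other than a grep command mentions NAME.
# `has_process` is a hypothetical helper, not part of OCP or obproxy.
has_process() {
    # Drop lines for grep itself, then look for NAME as a fixed string.
    grep -v 'grep' | grep -qF -- "$1"
}
```

For example, `ps -ef | has_process ocp_monagent && echo "ocp_monagent is running" || echo "ocp_monagent is not running"` reports which case applies on the host.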

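What task is OCP running in the screenshot? It looks like you skipped this task.
[image]
What is the current state of obproxy? Is the message "the last available time and the current time differ by 5 minutes" shown in an alert?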

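1. What is the obproxy version?
2. Check the system information:
cat /etc/redhat-release

cat /etc/issue
3. Check whether any core files were generated in the obproxy directory.
4. Check the core dump settings:
[sudo] sysctl -p | grep core_pattern
ps -ef | grep obproxy    # check which user deployed it
su - <username>    # switch to that user
Then check the resource limits for that user:
ulimit -a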
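
The core dump checks in step 4 can be gathered in one pass. This is a sketch under assumptions: `describe_core_limit` is a hypothetical helper name, and it relies only on the shell builtin `ulimit -c` and on `/proc/sys/kernel/core_pattern`, the file behind the `kernel.core_pattern` sysctl on Linux. Reading that file shows the value currently in effect and does not reload `/etc/sysctl.conf` the way `sysctl -p` does.

```shell
#!/bin/sh
# describe_core_limit: report whether the current shell allows core files.
# `ulimit -c` prints the soft limit on core file size; 0 means no core file
# will be written. `describe_core_limit` is a hypothetical helper name.
describe_core_limit() {
    limit=$(ulimit -c)
    if [ "$limit" = "0" ]; then
        echo "core dumps disabled (ulimit -c is 0)"
    else
        echo "core dumps enabled (ulimit -c is $limit)"
    fi
}

describe_core_limit

# Show where the kernel writes core files, if the file is readable.
if [ -r /proc/sys/kernel/core_pattern ]; then
    echo "core_pattern: $(cat /proc/sys/kernel/core_pattern)"
fi
```

Run it after switching to the user that deployed obproxy, as in the steps above, so that the limit shown is the one that user's shell has.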

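It was originally in the "stopping" state. Later I saw that the process was already gone on the machine, so I skipped the task.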

4.2.1.0-11
Red Hat Enterprise Linux Server release 7.9 Beta (Maipo)



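The current start task is stuck here, right?

Let me reconstruct the scenario. Please correct or add anything I got wrong.
1. OCP showed the message "the last available time and the current time differ by more than 5 minutes".

2. obproxy was restarted from OCP.

3. The task got stuck, and the stop obproxy step was then skipped.

4. The Inactive agent step reported a failure.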

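The original task was called "stopping". It seems to have been started automatically.

Yes.

OK. Please download and send the complete task log, along with ocp-server.log and obproxy.log.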


After retrying, it is now in the "starting" state.

[AgentClient]:http request is failed, response:Authentication failed for wrong digest

2025-03-11 15:30:49.816 INFO 17347 --- [pool-manual-subtask-executor14,639d9938afc340f1,977a78353b99] c.o.o.c.t.e.runner.JavaSubtaskRunner : Retry subtask, id=172816, context=Context{parallelIdx=0, stringMap={former_obproxy_cluster_status=UNLOCK, subtask_splitter=obproxy_server_ids, ocpagent_service_name=proxy_agent, target_obproxy_status=RUNNING, task_instance_id=162719, target_operate_status=NORMAL, task_operation=retry, latest_execution_start_time=2025-03-11T15:30:49.400+08:00, sub_task_instance_name=Start obproxy process, sub_task_instance_id=172816, obproxy_cluster_id=2, target_obproxy_cluster_status=UNLOCK}, listMap={obproxy_server_ids=[6], cluster_ids=[2], host_ids=[2]}}, executor=10.38.36.237

2025-03-11 15:30:49.825 INFO 17347 --- [pool-manual-subtask-executor14,639d9938afc340f1,977a78353b99] c.o.o.o.internal.task.StartObproxyTask : Need to stop obproxy when rollback.

2025-03-11 15:30:49.848 INFO 17347 --- [pool-manual-subtask-executor14,639d9938afc340f1,977a78353b99] c.o.o.c.agent.HostAgentServiceImpl : Finding OCP agent: hostId=2

2025-03-11 15:30:49.863 INFO 17347 --- [pool-manual-subtask-executor14,639d9938afc340f1,977a78353b99] c.o.o.c.a.p.HostAgentProcessServiceImpl : Getting all OCP agent processes on host 2

2025-03-11 15:30:50.065 INFO 17347 --- [pool-manual-subtask-executor14,639d9938afc340f1,977a78353b99] c.o.o.e.internal.template.HttpTemplate : POST request to agent, url:http://10.38.36.239:62888/api/v1/process/stop, request body:StopProcessRequest(process=FindProcessParam(findType=BY_NAME, name=obproxy, keyword=null, pid=null), force=true), params:null

2025-03-11 15:30:50.073 ERROR 17347 --- [pool-manual-subtask-executor14,639d9938afc340f1,977a78353b99] c.o.o.c.t.e.c.w.subtask.SubtaskExecutor : [AgentClient]:http request is failed, response:Authentication failed for wrong digest

com.oceanbase.ocp.executor.exception.HttpRequestFailedException: [AgentClient]:http request is failed, response:Authentication failed for wrong digest
at com.oceanbase.ocp.executor.internal.template.HttpTemplate.checkSuccess(HttpTemplate.java:476)
at com.oceanbase.ocp.executor.internal.template.HttpTemplate.doPost(HttpTemplate.java:286)
at com.oceanbase.ocp.executor.internal.template.HttpTemplate.postForNone(HttpTemplate.java:135)
at com.oceanbase.ocp.executor.executor.AgentExecutor.stopProcess(AgentExecutor.java:255)
at com.oceanbase.ocp.compute.host.service.HostOperationServiceImpl.forceStopProcessByName(HostOperationServiceImpl.java:439)
at com.oceanbase.ocp.obproxyops.internal.task.StartObproxyTask.rollback(StartObproxyTask.java:107)
at com.oceanbase.ocp.core.task.runtime.Subtask.retry(Subtask.java:49)
at com.oceanbase.ocp.core.task.engine.runner.JavaSubtaskRunner.retry(JavaSubtaskRunner.java:76)
at com.oceanbase.ocp.core.task.engine.runner.JavaSubtaskRunner.doRun(JavaSubtaskRunner.java:35)
at com.oceanbase.ocp.core.task.engine.runner.JavaSubtaskRunner.run(JavaSubtaskRunner.java:26)
at com.oceanbase.ocp.core.task.engine.runner.RunnerFactory.doRun(RunnerFactory.java:76)
at com.oceanbase.ocp.core.task.engine.coordinator.worker.subtask.SubtaskExecutor.doRun(SubtaskExecutor.java:203)
at com.oceanbase.ocp.core.task.engine.coordinator.worker.subtask.SubtaskExecutor.redirectConsoleOutput(SubtaskExecutor.java:197)
at com.oceanbase.ocp.core.task.engine.coordinator.worker.subtask.SubtaskExecutor.lambda$submit$2(SubtaskExecutor.java:134)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Set state for subtask: 172816, operation:RETRY, state: FAILED

Please provide a copy of the OCP log.
The ocp-server log is usually ocp-server.log under /home/admin/ocp/log.
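
Before uploading the log, it can help to confirm that it actually contains the failing agent calls. A minimal sketch, assuming the default log path mentioned above: `count_auth_failures` is a hypothetical helper name, and the search string is the error text from the task log earlier in this thread.

```shell
#!/bin/sh
# count_auth_failures FILE: print how many lines in FILE contain the agent
# authentication error seen in the task log.
# Hypothetical helper, not an OCP tool.
count_auth_failures() {
    # -c counts matching lines, -F treats the pattern as a fixed string.
    # grep exits non-zero when the count is 0, so mask that with `|| true`.
    grep -c -F 'Authentication failed for wrong digest' "$1" || true
}
```

For example, `count_auth_failures /home/admin/ocp/log/ocp-server.log` prints the number of matching lines in the current log file.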

ocp-server.log.2025-03-11.60.7z (26.6 MB)

ocp-server.log.2025-03-11.61_2.7z (26.3 MB)

ocp-server.log.2025-03-11.65.7z (25.2 MB)
These are the OCP logs from the time of the task.

Please send the OCP version.

Version: 4.2.2-20240315150922