OMS task shows as stopped while the component is still running

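[Environment] Production or test environment
[OB or other component]
[Version] OMS 4.2
[Problem description] Describe the problem clearly and precisely
[Steps to reproduce] Synchronizing from OB 4.2 to MySQL with full + incremental synchronization; after the full synchronization finished, the task showed as stopped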


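[Attachments and logs] We recommend collecting diagnostic information with obdiag, the OceanBase agile diagnostic tool. For details, see the link:

[SOP Series 22] Fault diagnosis, step one (self-service diagnosis and collecting diagnostic information)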

Please pack up the logs under /home/run/ds/{incremental sync component id}/logs and upload them.


The logs total 600 MB, so there is no way to upload them.
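When the full log directory is too large to attach, one option is to upload only the tail of each log file. Below is a minimal sketch, assuming the component logs are plain-text files ending in .log (the thread does not list the file names, so this is an assumption). trim_logs and the output directory are hypothetical names chosen for illustration.

```shell
#!/usr/bin/env bash
# trim_logs <log_dir> <out_dir> [lines]
# Copies the last <lines> lines (default 20000) of every *.log file in
# <log_dir> into <out_dir>, so the result is small enough to attach.
# Assumption: the logs are plain text and end in .log.
trim_logs() {
    local log_dir="$1" out_dir="$2" lines="${3:-20000}"
    local f
    mkdir -p "$out_dir" || return 1
    for f in "$log_dir"/*.log; do
        # Skip the unexpanded pattern when no .log file exists.
        [ -f "$f" ] || continue
        tail -n "$lines" "$f" > "$out_dir/$(basename "$f")"
    done
}
```

For example, run trim_logs on the component's logs directory with /tmp/oms_logs_trimmed as the output directory, then compress the result with tar -czf /tmp/oms_logs_trimmed.tar.gz -C /tmp oms_logs_trimmed before uploading.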

1. Please run the following and post the diagnostic output:
cd /home/run/ds/{incremental sync component id}
./connector_utils.sh diagnose

2. Are there multiple containers on the OMS host?

java -cp /home/ds/plugins/jdbc_connector/tools/connector-command.jar com.oceanbase.oms.connector.command.Command diagnose -t 10.14.10.241-9000:connector_v2:np_5nucighylodc-incr_trans-1-0:0000000009
11:24:15,781 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Could NOT find resource [logback-test.xml]
11:24:15,781 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Found resource [logback.xml] at [jar:file:/u01/ds/plugins/jdbc_connector/tools/connector-command.jar!/logback.xml]
11:24:15,803 |-INFO in ch.qos.logback.core.joran.spi.ConfigurationWatchList@c39f790 - URL [jar:file:/u01/ds/plugins/jdbc_connector/tools/connector-command.jar!/logback.xml] is not of type file
11:24:15,873 |-INFO in ch.qos.logback.classic.joran.action.ConfigurationAction - debug attribute not set
11:24:15,884 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - About to instantiate appender of type [ch.qos.logback.core.ConsoleAppender]
11:24:15,886 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - Naming appender as [STDOUT]
11:24:15,892 |-INFO in ch.qos.logback.core.joran.action.NestedComplexPropertyIA - Assuming default type [ch.qos.logback.classic.encoder.PatternLayoutEncoder] for [encoder] property
11:24:15,944 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - About to instantiate appender of type [ch.qos.logback.core.rolling.RollingFileAppender]
11:24:15,950 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - Naming appender as [INFO]
11:24:15,958 |-INFO in c.q.l.core.rolling.TimeBasedRollingPolicy@1911006827 - Will use gz compression
11:24:15,959 |-INFO in c.q.l.core.rolling.TimeBasedRollingPolicy@1911006827 - Will use the pattern ./logs/connector-command.%d{yyyy-MM-dd}.log for the active file
11:24:15,962 |-INFO in c.q.l.core.rolling.DefaultTimeBasedFileNamingAndTriggeringPolicy - The date pattern is 'yyyy-MM-dd' from file name pattern './logs/connector-command.%d{yyyy-MM-dd}.log.gz'.
11:24:15,962 |-INFO in c.q.l.core.rolling.DefaultTimeBasedFileNamingAndTriggeringPolicy - Roll-over at midnight.
11:24:15,965 |-INFO in c.q.l.core.rolling.DefaultTimeBasedFileNamingAndTriggeringPolicy - Setting initial period to Fri Apr 19 11:24:15 CST 2024
11:24:15,968 |-WARN in ch.qos.logback.core.rolling.RollingFileAppender[INFO] - This appender no longer admits a layout as a sub-component, set an encoder instead.
11:24:15,968 |-WARN in ch.qos.logback.core.rolling.RollingFileAppender[INFO] - To ensure compatibility, wrapping your layout in LayoutWrappingEncoder.
11:24:15,968 |-WARN in ch.qos.logback.core.rolling.RollingFileAppender[INFO] - See also http://logback.qos.ch/codes.html#layoutInsteadOfEncoder for details
11:24:15,969 |-INFO in ch.qos.logback.core.rolling.RollingFileAppender[INFO] - Active log file name: ./logs/connector-command.log
11:24:15,969 |-INFO in ch.qos.logback.core.rolling.RollingFileAppender[INFO] - File property is set to [./logs/connector-command.log]
11:24:15,970 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [stdout] to INFO
11:24:15,970 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting additivity of logger [stdout] to false
11:24:15,970 |-INFO in ch.qos.logback.core.joran.action.AppenderRefAction - Attaching appender named [STDOUT] to Logger[stdout]
11:24:15,971 |-INFO in ch.qos.logback.classic.joran.action.RootLoggerAction - Setting level of ROOT logger to INFO
11:24:15,971 |-INFO in ch.qos.logback.core.joran.action.AppenderRefAction - Attaching appender named [INFO] to Logger[ROOT]
11:24:15,971 |-INFO in ch.qos.logback.classic.joran.action.ConfigurationAction - End of configuration.
11:24:15,971 |-INFO in ch.qos.logback.classic.joran.JoranConfigurator@2ac1fdc4 - Registering current configuration as safe fallback point

2024-04-19 11:24:20 INFO status:FINISHED
[Metrics]
[DataFlow]

[Delay]
delay:67334000ms,sourceDelay:67328095ms
sinkDelay:67328127ms
delayTrend:67236s->67244s->67252s->67259s->67266s->67273s->67280s->67287s->67294s->67303s->67312s->67319s->67328s

[ERROR_LOG]
STOPPED

[GcMetrics]
youngMem:16384M,fullMem:16384M,heap:20023M/32768M,noHeapUsed:65M
[youngGc-ParNew]costMsAvg:0.34,countPreSec:0.01
[fullGc-ConcurrentMarkSweep]costMsAvg:0.0,countPreSec:0.0

[Kafka]

[COORDINATOR_QUEUE]
waitRecords:404.0,readyBatch:364.0,totalRecordsInQueue:405.0,sourceRecordAccumulate:0.0
sourceBlock:true,sinkBlock:false

[RPS]
Rps[last:280.0,avg:680.0]
Iops[last:3.06M,avg:7.5M]
RPS_TREND[195->829->552->1383->835->831->1109->776->568->557->364->569->280]

[SinkMetrics]
execute/Record:6.0ms,commit/Batch:2.0ms,sinkBatchSizeAvg:1.11
execute(p99)/Record:6.0ms,commit(p99)/Batch:3.0ms
lastSinkThread:0/32
shardTime:0.0ms

[store]
OB_MYSQL_CE_np_5nucighylodc_5nucizi4apds-1-0:rps:5313.044,iops:63.759M,delay:80.621s,connNum:1

[Scene] StoreParserSlowScene
prev:source.useBetaListener=null
post:source.useBetaListener=true
affect:use LogMessage to speed up parsing and reduce intermediate objects
comment:speed up parsing
prev:source.useSchemaCache=null
post:source.useSchemaCache=true
affect:use the Schema cache to reduce intermediate objects
comment:use the Schema cache
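The values in the [Delay] section are in milliseconds, which is hard to judge at a glance. The sketch below extracts the overall delay from saved diagnose output and prints it in hours. delay_hours is a hypothetical helper name, and it assumes the line format shown above (a line starting with delay:<number>ms).

```shell
#!/usr/bin/env bash
# delay_hours: reads diagnose output on stdin and prints the overall
# delay in hours with one decimal place.
# Assumption: the delay appears as "delay:<digits>ms" at the start of a line.
delay_hours() {
    awk 'match($0, /^delay:[0-9]+ms/) {
        # "delay:" is 6 characters and "ms" is 2, so the digits span
        # positions 7 .. RLENGTH-2.
        ms = substr($0, 7, RLENGTH - 8)
        printf "%.1f\n", ms / 3600000
        exit
    }'
}
```

Applied to the output above, delay:67334000ms comes out as 18.7, so the incremental component is roughly 18.7 hours behind. The delayTrend values keep rising and sourceBlock is true, which fits the StoreParserSlowScene suggestion that the source side is the bottleneck.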

Set these parameters on the incremental component,
under source:
useBetaListener=true
useSchemaCache=true

Also increase the JVM memory as appropriate.

Error code:

CONNECTOR-ALAL02000500

Level:

ERROR

Error message:

com.oceanbase.oms.store.client.HttpBadResponseException: CmBaseException(oldErrorCode=0, errMsg=Found no active stores under topic [OB_MYSQL_CE_np_5nucighylodc_5nucizi4apds-1-0], reason=null, relativeDocs=null)

Cause:

Unknown

Solution:

Please contact technical support

After making the change and resuming the task, this is what I get.

Check the component monitoring: has the store component stopped?

These are all in an abnormal state and will not start.

It keeps reporting the error above.

First find out why the store will not start: check the error logs under /home/ds/store/store{port}/log. It looks quite likely that OMS has run out of resources.

There are only these few lines:
2024-04-18 14:05:57 (ConnectionManager.cpp:77): default buf size is: 1048576, Msg split size is: 131072, Force Use Encrypt: false, Using ThreadPool true
2024-04-18 14:05:57 (ConnectionManager.cpp:139): listen info is: 17004
2024-04-18 14:05:57 (Epoll.cpp:94): Epoll start, use FD: 10
2024-04-18 14:05:57 (Epoll.cpp:94): Epoll start, use FD: 11

Is the store process actually there? Check with: ps -ef|grep store{port}
And what about the CDC logs?
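A plain ps -ef|grep can also match the grep command itself, which may make it look as if the process exists when it does not. Here is a sketch of a check that avoids this. store_running is a hypothetical helper name, and it assumes, as in the command above, that the store's command line contains "store" followed by the port.

```shell
#!/usr/bin/env bash
# store_running <port>
# Returns 0 if some process has "store<port>" in its command line.
# Assumption: the store process can be identified this way.
store_running() {
    local port="$1"
    # The [s] bracket keeps grep from matching its own command line:
    # the regex matches "store<port>", but grep's arguments contain
    # "[s]tore<port>", which the regex does not match.
    ps -ef | grep "[s]tore${port}" > /dev/null
}
```

Call it with the store's port and test the exit status, for example in an if statement that prints whether a store process was found.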

The process is still there; the logs are too large to upload.