OMS reverse synchronization delay issue

【Environment】Production
【OB or other component】OMS
【Version】OB: Community Edition 4.2.5.5 and Community Edition 4.2.5.1; OMS: 4.2.11_CE
【Problem description】
Source: OB Community Edition 4.2.5.1
Target: OB Community Edition 4.2.5.5
A tenant on 4.2.5.1 has already been synchronized to 4.2.5.5 through OMS, and the forward sync switchover is complete. After enabling reverse synchronization, it keeps reporting a reverse sync delay.


Entering the incr-sync component and running ./connector_utils.sh diagnose gives the following:

2026-05-11 16:18:51 INFO status:FINISHED
[Metrics]
[DataFlow]

[Delay]
delay:4634000ms,sourceDelay:4131808ms
sinkDelay:4139673ms
delayTrend:4159s->4159s->4159s->4144s->4147s->4149s->4149s->4139s->4129s->4132s->4131s->4139s->4139s

[ERROR_LOG]
STOPPED

[GcMetrics]
youngMem:2048M,fullMem:2048M,heap:1652M/4096M,noHeapUsed:76M
[youngGc-ParNew]costMsAvg:0.24,countPreSec:0.02
[fullGc-ConcurrentMarkSweep]costMsAvg:0.0,countPreSec:0.0

[Kafka]

[COORDINATOR_QUEUE]
waitRecords:2.0,readyBatch:0.0,totalRecordsInQueue:2.0,sourceRecordAccumulate:0.0
sourceBlock:true,sinkBlock:false

[RPS]
Rps[last:0.0,avg:526.0]
Iops[last:0.0M,avg:1.05M]
RPS_TREND[401->401->0->1293->108->616->0->401->1219->201->94->0->0]

[SinkMetrics]
execute/Record:2.0ms,commit/Batch:2.0ms,sinkBatchSizeAvg:2.45
execute(p99)/Record:3.0ms,commit(p99)/Batch:4.0ms
lastSinkThread:0/64
shardTime:0.0ms

[store]
OB_MYSQL_CE_np_7qf49yeiiujk_reverse_7qhcsiveobxc-1-0:rps:139395.512,iops:217.354M,delay:4131.411s,connNum:1

[Scene] StoreParserSlowScene
There may be a source-side data delivery bottleneck

Running ./connector_utils.sh metrics gives the following:
2026-05-11 16:19:16 INFO
2026-05-11 16:18:56.919
SOURCE: [RPS:1733.35, IOPS:4.18M, delay:4117670ms]
SINK: [RPS:2788.72, TPS:1135.73, IOPS:5.807M, delay:4117695ms]
SINK_TIME: [execute_time:2.25ms/record, commit_time:3.93ms/batch]
SINK_SLOW_ROUTES:
SINK_THREAD: 13/64
DISPATCHER: wait record:19, ready batch:5, shardTime:nullms/record
forward_slot0 batchAccumulate: 0, recordAccumulate: 0
queue_slot1 batchAccumulate: 0, recordAccumulate: 0
heap:1557M/3891M, noHeap:76M/488M, threadCount:81, cpu:2.495, sysCpu:64.869
ParNew(count:51, cost:1199) ConcurrentMarkSweep(count:0, cost:0)

2026-05-11 16:19:16 INFO
2026-05-11 16:19:06.919
SOURCE: [RPS:446.2, IOPS:1.032M, delay:4112695ms]
SINK: [RPS:92.06, TPS:251.3, IOPS:0.193M, delay:4114940ms]
SINK_TIME: [execute_time:2.25ms/record, commit_time:3.21ms/batch]
SINK_SLOW_ROUTES:
SINK_THREAD: 0/64
DISPATCHER: wait record:0, ready batch:0, shardTime:nullms/record
forward_slot0 batchAccumulate: 0, recordAccumulate: 0
queue_slot1 batchAccumulate: 0, recordAccumulate: 0
heap:305M/3891M, noHeap:76M/488M, threadCount:81, cpu:0.889, sysCpu:39.848
ParNew(count:52, cost:1221) ConcurrentMarkSweep(count:0, cost:0)

2026-05-11 16:19:17 INFO
2026-05-11 16:19:16.920
SOURCE: [RPS:400.2, IOPS:0.739M, delay:4663920ms]
SINK: [RPS:400.16, TPS:6.4, IOPS:0.739M, delay:4109412ms]
SINK_TIME: [execute_time:1.52ms/record, commit_time:1.11ms/batch]
SINK_SLOW_ROUTES:
SINK_THREAD: 0/64
DISPATCHER: wait record:0, ready batch:0, shardTime:nullms/record
forward_slot0 batchAccumulate: 0, recordAccumulate: 0
queue_slot1 batchAccumulate: 0, recordAccumulate: 0
heap:399M/3891M, noHeap:76M/488M, threadCount:81, cpu:0.491, sysCpu:49.472
ParNew(count:52, cost:1221) ConcurrentMarkSweep(count:0, cost:0)

Also checked the corresponding Java process.

Could you also take a screenshot of the following and post it:

jstat -gc <pid> 3s
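
A minimal sketch of capturing that, assuming the incr-sync JVM can be found by grepping the process list for "incr-sync" (that match pattern is an assumption, not something OMS guarantees):

# Locate the incr-sync JVM and sample its GC stats every 3 seconds.
pid=$(ps -ef | grep java | grep incr-sync | grep -v grep | awk '{print $2}')
jstat -gc "$pid" 3s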

From these metrics, the cliff in downstream write throughput shows up together with SINK_THREAD=0 and a low SINK RPS. Judging from the upstream info, it should be the Source side being blocked (the diagnose output also shows sourceBlock:true). There doesn't appear to be any full GC either.

For the extraction delay, you can try adjusting the store parameters:
liboblog.working_mode=memory
ob2store.serialize_pool_size=16
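
For reference, a minimal sketch of checking the current values first; the config path below is hypothetical, and in practice these store parameters are usually edited on the component parameter page of the OMS console:

# Hypothetical store config location; adjust to your actual deployment layout.
grep -E 'liboblog.working_mode|ob2store.serialize_pool_size' /home/ds/store/store_*/etc/*.conf
# Suggested values from the reply above:
#   liboblog.working_mode=memory
#   ob2store.serialize_pool_size=16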

Please also provide these two logs: msg/connector_source_msg.log and msg/connector_sink_msg.log.

connector_sink_msg.log (50.4 KB)
connector_source_msg.log (70.1 KB)

Is there an error.log? Please provide it, along with connector.log and libobcdc.log.

There is no error log. connector.log and libobcdc.log are below:
libobcdc.log (248.3 KB)
connector.log (158.8 KB)

Judging from the CDC log, CDC has hit its memory usage limit. I suggest increasing the CDC memory.
[2026-05-11 17:30:03.963877] INFO [TLOG] global_flow_control_ (ob_log_instance.cpp:2727) [891900][][T0][Y0-0000000000000000-0-0] [lt=11] [STAT] [FLOW_CONTROL] NEED_SLOW_DOWN=1 PAUSED=0 MEM=6.80GB/8.00GB AVAIL_MEM=93.19GB/10.00GB READY_TO_SEQ=2/80000 PART_TRANS(TOTAL=80000, ACTIVE=34101/80000, REUSABLE=2326/80000) LOG_TASK(ACTIVE=2809) STORE(1/100) [FETCHER=11 DML_PARSER=0 DDL=0 COMMITER=2064 USER_QUEUE=251 OUT=11 RC=0] DIST_TRANS(SEQ_QUEUE=16618, SEQ=0, COMMITTED=263) NEED_PAUSE_DISPATCH=1 REASON=MEMORY_LIMIT_AND_DISPATCH_PAUSED
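
A quick way to confirm this flow control keeps firing, assuming you run it from the directory holding libobcdc.log:

# Show the most recent flow-control records with their memory readings.
grep 'FLOW_CONTROL' libobcdc.log | tail -n 5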

May I ask which parameter needs to be adjusted?

I suggest adjusting the parameter liboblog.memory_limit=16G. It should currently be 8G; that limit has been reached, which triggered the flow control.

liboblog.memory_usage_warn_threshold=85 is the ratio of the usage limit, so 8G * 0.85 is exactly 6.8G, which matches the MEM=6.80GB/8.00GB in the log.
You can refer to this document:
https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000004476766
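
A minimal sanity check of that threshold math against the flow-control line above (the numbers come from the log; the 16G value is the suggestion from this reply):

# 8G memory_limit * 85% warn threshold = 6.8G, matching MEM=6.80GB/8.00GB in the log.
awk 'BEGIN { printf "slow-down threshold: %.1fG\n", 8 * 0.85 }'
# After confirming, raise the limit on the store component and restart it for the change to take effect:
#   liboblog.memory_limit=16G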

The thread "OMS在源端出现大事务后store组件如何优化配置" (how to tune the store component when a large transaction appears on the source side) - #14, from 未来OB之神, covers the same problem as this one.
