oblogproxy synchronization issue

[Product Name]

oblogproxy-1.0.0-1.el7.x86_64

[Product Version]

1.0.0

[Problem Description]

Production environment: the obcluster runs on machines A/B/C, and oblogproxy + canal_for_ob are configured on machine D, but incremental sync from OB to MySQL never starts.

Question 1: The liboblog / canal_deployer / canal_adapter logs are shown below. What should I check next?

Question 2: The official documentation only covers observer + canal on the same machine for testing purposes, which differs from my setup. Is there any example of a production oblogproxy deployment?

Thanks.

The details are as follows:

1. run/…/liboblog.log

[2022-03-10 11:55:03.626372] INFO [COMMON] memory_dump.cpp:401 [15417][5][Y0-0000000000000000] [lt=21] handle dump task(task={type_:2, dump_all_:false, p_context_:null, slot_idx_:0, dump_tenant_ctx_:false, tenant_id_:0, ctx_id_:0, p_chunk_:null})

[2022-03-10 11:55:03.630917] INFO memory_dump.cpp:518 [15417][5][Y0-0000000000000000] [lt=37] statistics:

tenant_cnt: 2, max_chunk_cnt: 524288

tenant_id ctx_id chunk_cnt label_cnt segv_cnt

1 0 1 1 0

500 0 50 84 0

cost_time: 4494

[2022-03-10 11:55:09.002152] INFO [COMMON] ob_kvcache_store.cpp:799 [15415][1][Y0-0000000000000000] [lt=5] Wash compute wash size(sys_total_wash_size=-3873908326, global_cache_size=0, tenant_max_wash_size=0, tenant_min_wash_size=0, tenant_ids_=[1, 500])

[2022-03-10 11:55:09.002188] INFO [COMMON] ob_kvcache_store.cpp:323 [15415][1][Y0-0000000000000000] [lt=29] Wash time detail, (refresh_score_time=32, compute_wash_size_time=140, wash_sort_time=2, wash_time=1)

2. Latest lines of canal.log

2022-03-10 12:04:19.924 [New I/O server worker #1-1] INFO c.a.otter.canal.server.netty.handler.SessionHandler - message receives in session handler…

2022-03-10 12:04:20.424 [New I/O server worker #1-1] INFO c.a.otter.canal.server.netty.handler.SessionHandler - message receives in session handler…

2022-03-10 12:04:20.424 [New I/O server worker #1-1] INFO c.a.otter.canal.server.netty.handler.SessionHandler - message receives in session handler…

3. example log

2022-03-10 10:36:06.859 [Thread-5] WARN com.oceanbase.clogproxy.client.connection.ClientStream - start to reconnect…

2022-03-10 10:36:07.027 [Thread-5] WARN com.oceanbase.clogproxy.client.connection.ClientStream - reconnect SUCC

2022-03-10 10:44:23.754 [New I/O server worker #1-1] INFO c.a.otter.canal.server.embedded.CanalServerWithEmbedded - rollback successfully, clientId:1001

2022-03-10 10:44:23.760 [New I/O server worker #1-1] INFO c.a.otter.canal.server.embedded.CanalServerWithEmbedded - subscribe successfully, ClientIdentity[destination=example,clientId=1001,filter=] with first position:null

4. Watching adapter.log with tailf shows no messages at all

2022-03-10 10:44:23.645 [main] INFO c.a.otter.canal.adapter.launcher.CanalAdapterApplication - Started CanalAdapterApplication in 4.722 seconds (JVM running for 6.305)

2022-03-10 10:44:23.785 [Thread-4] INFO c.a.otter.canal.adapter.launcher.loader.AdapterProcessor - =============> Subscribe destination: example succeed <=============

Please upload the complete configuration files and logs of oblogproxy, canal deployer, and canal adapter as attachments so we can take a look.

Question 1:

The connections between canal deployer and oblogproxy, and between canal adapter and canal deployer, both look fine. First check whether the directory containing liboblog.log has any error/warn logs from liboblog or oblogreader, and whether the log directory under the oblogproxy root directory contains any error/warn logs.


If oblogproxy shows no error logs, try the Java client to check whether heartbeat records arrive (a minimal sketch follows the links below). If heartbeats do arrive, the problem is likely a configuration issue, so verify that tableWhiteList is written correctly. If there are no heartbeats, logproxy has failed to connect to the observer, which may be a deployment problem or a code bug; in that case, open an issue on GitHub and a developer will help you resolve it.


https://github.com/oceanbase/oblogclient/blob/master/logproxy-client/src/test/java/com/oceanbase/clogproxy/client/LogProxyClientTest.java

https://github.com/oceanbase/oblogproxy/issues
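
For reference, here is a minimal sketch of that heartbeat check, modeled on the LogProxyClientTest linked above. The RS list, user, password, oblogproxy address, and tableWhiteList are placeholders for this environment, and the exact package and class names can differ between oblogclient versions, so treat it as a starting point rather than a ready-made test.

```java
import com.oceanbase.clogproxy.client.LogProxyClient;
import com.oceanbase.clogproxy.client.config.ObReaderConfig;
import com.oceanbase.clogproxy.client.exception.LogProxyClientException;
import com.oceanbase.clogproxy.client.listener.RecordListener;
import com.oceanbase.oms.logmessage.DataMessage;
import com.oceanbase.oms.logmessage.LogMessage;

public class HeartbeatCheck {

    public static void main(String[] args) {
        // liboblog settings that oblogproxy forwards to the observer.
        // Every value below is a placeholder for this environment.
        ObReaderConfig config = new ObReaderConfig();
        config.setRsList("A_ip:2882:2881;B_ip:2882:2881;C_ip:2882:2881"); // obcluster rootservice list
        config.setUsername("user@tenant");        // user with read privileges on the synced tables
        config.setPassword("password");
        config.setStartTimestamp(0L);             // 0 = start from the current time
        config.setTableWhiteList("tenant.db.*");  // must cover the tables you expect to sync

        // Connect to oblogproxy on machine D (2983 is the default service port).
        LogProxyClient client = new LogProxyClient("D_ip", 2983, config);

        client.addListener(new RecordListener() {
            @Override
            public void notify(LogMessage message) {
                // HEARTBEAT records prove the oblogproxy -> observer link is alive.
                // Heartbeats without any DML usually point to a tableWhiteList mismatch.
                if (message.getOpt() == DataMessage.Record.Type.HEARTBEAT) {
                    System.out.println("heartbeat, timestamp=" + message.getTimestamp());
                } else {
                    System.out.println("record type=" + message.getOpt());
                }
            }

            @Override
            public void onException(LogProxyClientException e) {
                e.printStackTrace();
            }
        });

        client.start();
        client.join(); // block until the client is stopped
    }
}
```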


Question 2:

For production deployments, it is not recommended to run oblogproxy on the same machines as the OB cluster; deploying them separately reduces the impact on the OB cluster's service. How to deploy canal can be decided based on your data volume and hardware resources; see https://github.com/oceanbase/canal/wiki/%E7%94%9F%E4%BA%A7%E7%8E%AF%E5%A2%83%E5%AE%9E%E8%B7%B5 for reference.


Thanks for the feedback; the issue has been resolved.

Root cause: about three hours after canal-adapter was started, adapter.log finally began emitting messages, DML records were received, and synchronization proceeded normally. My understanding is that the obcluster holds 100 GB+ of clog, and scanning that much clog is slow, which caused the delay.

Remaining question: why did it still take about three hours even though start_timestamp was set?

Follow-up: the slow log reading was also related to the machine's relatively low hardware resources.