obbinlog sync aborts abnormally after importing a large number of tables and data into OB

Community edition

obbinlog 4.2.3

obproxy 4.3.3.0-5

observer 4.2.5.1

This is a business test environment. Downstream services depend on binlog, but obbinlog can no longer generate it normally.

Logging in to obbinlog on port 2983, convert_running stays at No:

mysql> SHOW BINLOG INSTANCES\G
*************************** 1. row ***************************
name: u6ravjo6cv
ob_cluster: obtest42
ob_tenant: mysql2
ip: xxxx.131.151
port: 8101
zone:
region:
group:
running: Yes
state: Running
obcdc_running: Yes
obcdc_state: Running
service_mode: enabled
convert_running: No
convert_delay: 4053891
convert_rps: 0
convert_eps: 0
convert_iops: 0
dumpers: 0
version: 4.2.3-1cf30a786ba8c4984ddb24d5273a1d826b2ab11e

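The obbinlog process stopped abnormally shortly after 1:00 AM, was restarted shortly after 10:00 AM, and then restarted itself abnormally at 12:21.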

Logging in through obproxy, the binlog position has not moved:

 show master status\G
*************************** 1. row ***************************
             File: mysql-bin.000255
         Position: 242
     Binlog_Do_DB: 
 Binlog_Ignore_DB: 
Executed_Gtid_Set: a2079d5e-d221-11ef-a33c-6c92bf54871c:1-2819233
1 row in set (0.307 sec)

The oldest clog is from 23:21 on the evening of the 25th, but obbinlog cannot pull it.

How to confirm the validity of the binlog fetch time - OceanBase Knowledge Base

SELECT CEIL(MAX(BEGIN_SCN)/1000) AS START_TS_US FROM oceanbase.GV$OB_LOG_STAT;
+------------------+
| START_TS_US      |
+------------------+
| 1740496919681735 |
+------------------+
1 row in set (0.053 sec)
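To cross-check this value against the 23:21 figure quoted above, the microsecond timestamp can be converted to wall-clock time. A minimal sketch, assuming the servers run in UTC+8 (the post does not state the timezone); the function name is only illustrative:

```python
from datetime import datetime, timedelta, timezone

# Assumption: the cluster clock is UTC+8; the post does not state the timezone.
ASSUMED_TZ = timezone(timedelta(hours=8))

def start_ts_us_to_datetime(start_ts_us: int, tz: timezone = ASSUMED_TZ) -> datetime:
    """Convert a microsecond Unix timestamp (START_TS_US) to an aware datetime."""
    seconds, micros = divmod(start_ts_us, 1_000_000)
    return datetime.fromtimestamp(seconds, tz=tz).replace(microsecond=micros)

oldest_clog = start_ts_us_to_datetime(1740496919681735)
print(oldest_clog.isoformat())  # 2025-02-25T23:21:59.681735+08:00
```

Under that timezone assumption the result agrees with the 23:21 quoted for the oldest clog.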


When the binlog obcdc stopped abnormally, it wrote error-related log entries:

egrep error libobcdc.log.20250226124542473
[2025-02-26 12:21:13.945413] INFO [TLOG] handle_error (ob_log_instance.cpp:2036) [1063950][][T0][Y3BF50A388397-0000000000000001-0-0] [lt=6] HANDLE_ERROR: err_cb=0x7f44373c46e0, errno=-4012, errmsg="meta_data_service exit on error(ret = 4294963284)"
[2025-02-26 12:21:13.945416] INFO [TLOG] handle_error (ob_log_instance.cpp:2044) [1063950][][T0][Y3BF50A388397-0000000000000001-0-0] [lt=3] ERROR_CALLBACK begin(err_cb_=0x7f44373c46e0)
[2025-02-26 12:21:13.945443] INFO [TLOG] handle_error (ob_log_instance.cpp:2046) [1063950][][T0][Y3BF50A388397-0000000000000001-0-0] [lt=3] ERROR_CALLBACK end(err_cb_=0x7f44373c46e0)
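The `ret = 4294963284` in the error message and `errno=-4012` look like two different numbers, but they are the same 32-bit value printed as unsigned and signed respectively. A small sketch of the conversion (the helper name is ours):

```python
def as_signed_32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed two's-complement integer."""
    value &= 0xFFFFFFFF
    # If the sign bit is set, subtract 2**32 to get the negative value.
    return value - 0x1_0000_0000 if value & 0x8000_0000 else value

print(as_signed_32(4294963284))  # -4012
```

So only one error code, -4012, needs to be looked up.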

The abnormal stops left core.* files in the obbinlog directory:

root root 8.9K Feb 26 12:21 binlog_instance.conf
root root    7 Feb 26 12:21 binlog_instance.pid
root root    0 Feb 26 12:21 binlog_instance.socket
root root 335K Feb 26 10:21 binlog_tenant_gtid_seq.meta
root root 1.6G Feb 26 12:21 core.1063925
root root 7.8G Feb 25 19:45 core.2160322
root root 4.0K Feb 26 12:21 data
root root   27 Feb 26 12:21 etc
root root  52K Feb 26 12:45 log
root root   26 Feb 17 18:03 run
root root  123 Feb 26 12:21 storage

Last night a large number of tables and data were imported, and the table import reported this error:

Too many partitions (including subpartitions) were defined

The tenant currently has about 13,000 tables. All of them are plain single tables; no partitions were created.

select count(table_name)  from information_schema.tables where TABLE_SCHEMA not in ('performance_schema','information_schema','sys','mysql','METRICS_SCHEMA','PERFORMANCE_SCHEMA','INFORMATION_SCHEMA','test')  limit 10;
+-------------------+
| count(table_name) |
+-------------------+
|             13754 |
+-------------------+
1 row in set (0.163 sec)

obbinlog server information
cat /etc/issue
\S
Kernel \r on an \m

uname -r
5.4.210-4.ve1.x86_64

cat /etc/redhat-release
CentOS Linux release 7.9.2009 (Core)


obproxy server information
uname -r
3.10.0-957.el7.x86_64

cat /etc/redhat-release
CentOS Linux release 7.6.1810 (Core)

cat /etc/issue
\S
Kernel \r on an \m

obproxy details
/home/admin/obproxy/bin/obproxy -V
obproxy (OceanBase 4.3.3.0 5)
REVISION: 1-local-60ff90081edf7829e4f6458adb6d1184c3344b80
BUILD_TIME: Jan 21 2025 17:32:47
BUILD_FLAGS: -g -O2 -D_OB_VERSION=1000 -D_NO_EXCEPTION -D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS -DNDEBUG -D__USE_LARGEFILE64 -D_FILE_OFFSET_BITS=64 -D_LARGE_FILE -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -Wformat -Wno-deprecated -Wno-invalid-offsetof -finline-functions -fno-strict-aliasing -mtune=core2 -Wno-psabi -Wno-address-of-packed-member -fno-omit-frame-pointer -Wl,-z,noexecstack,-z,relro,-z,now,-z,notext -fPIC -isystem /home/jenkins/agent/workspace/ob_artifacte_local_artifact/ob_source_code_dir/obproxy-tmp.140/BUILD/obproxy-ce-4.3.3.0/deps/3rd/usr/local/oceanbase/deps/devel/include -isystem /home/jenkins/agent/workspace/ob_artifacte_local_artifact/ob_source_code_dir/obproxy-tmp.140/BUILD/obproxy-ce-4.3.3.0/deps/3rd/usr/include -L/home/jenkins/agent/workspace/ob_artifacte_local_artifact/ob_source_code_dir/obproxy-tmp.140/BUILD/obproxy-ce-4.3.3.0/deps/3rd/usr/local/oceanbase/deps/devel/lib -D_GLIBCXX_USE_CXX11_ABI=0 -DBUILD_OPENSOURCE -DSUPPORT_SSE4_2 -DHAVE_SCHED_GETCPU -DHAVE_REALTIME_COARSE -DOB_HAVE_EVENTFD -DHAVE_FALLOCATE -fuse-ld=lld -Wall -Wextra -Wno-ignored-qualifiers -Wno-deprecated-copy -Wno-sign-compare -Wno-varargs -Wno-overloaded-virtual -Wno-sign-conversion -Wno-string-plus-int -Wno-shorten-64-to-32 -Wno-delete-non-abstract-non-virtual-dtor -Wno-overloaded-virtual -Wno-unused-command-line-argument -Wno-inconsistent-missing-override -Wno-mismatched-tags -Wno-dynamic-class-memaccess -Wno-format-security -Wno-reinterpret-base-class -Wl,--allow-multiple-definition --gcc-toolchain=/home/jenkins/agent/workspace/ob_artifacte_local_artifact/ob_source_code_dir/obproxy-tmp.140/BUILD/obproxy-ce-4.3.3.0/deps/3rd/usr/local/oceanbase/devtools

The gdb analysis of the core files is as follows:
gdb /data/binlogservice/bin/binlog_instance core.1063925

ob_binlog_gdb-output.log (14.0 KB)

gdb ./bin/logproxy /tmp/core.2160322
gdb ./bin/logproxy /tmp/core.1063925
ob_binlog_gdb-output2.log (26.9 KB)
ob_binlog_gdb-output1.log (8.2 KB)

Check whether the files under run/${instance_name}/log contain ERROR or EDIAG messages.

libobcdc.log.20250226124542473.crash.tar.gz (11.4 MB)

There are quite a lot of EDIAG log entries:
egrep -i ediag libobcdc.log.20250226124542473 | wc -l
57817

egrep -i error libobcdc.log.20250226124542473 | wc -l
5
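Note that `egrep -i error` matches the substring anywhere in a line, so a hit is not necessarily an ERROR-level entry: the excerpts shown earlier are all INFO lines that match because of `handle_error`. Tallying the level field separates the two cases. A minimal sketch, assuming the level is the first token after the bracketed timestamp as in those excerpts; `count_levels` is an illustrative helper, not part of any OceanBase tool:

```python
import re
from collections import Counter

# Assumed line layout: "[timestamp] LEVEL [MODULE] ..."
LEVEL_RE = re.compile(r"^\[[^\]]+\]\s+(\w+)\s")

def count_levels(lines):
    """Count log lines per level, skipping lines that do not fit the layout."""
    counts = Counter()
    for line in lines:
        match = LEVEL_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Two of the lines returned by `egrep error` in this thread.
sample = [
    "[2025-02-26 12:21:13.945416] INFO [TLOG] handle_error (ob_log_instance.cpp:2044) "
    "[1063950][][T0][Y3BF50A388397-0000000000000001-0-0] [lt=3] "
    "ERROR_CALLBACK begin(err_cb_=0x7f44373c46e0)",
    "[2025-02-26 12:21:13.945443] INFO [TLOG] handle_error (ob_log_instance.cpp:2046) "
    "[1063950][][T0][Y3BF50A388397-0000000000000001-0-0] [lt=3] "
    "ERROR_CALLBACK end(err_cb_=0x7f44373c46e0)",
]
print(count_levels(sample))  # Counter({'INFO': 2})
```

To run it over the real file, pass an open file object, for example `count_levels(open("libobcdc.log.20250226124542473", errors="replace"))`.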


Judging from the error messages, this was caused by the clog having been recycled.

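It may well have been recycled. The last normal binlog file was generated at 19:45 on the 25th. With the bulk import, the clog space was reused. Does that mean obbinlog never got to pull the oldest logs?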

514M Feb 25 17:13 mysql-bin.000244
513M Feb 25 17:14 mysql-bin.000245
513M Feb 25 17:15 mysql-bin.000246
513M Feb 25 17:16 mysql-bin.000247
514M Feb 25 17:17 mysql-bin.000248
513M Feb 25 19:30 mysql-bin.000249
513M Feb 25 19:31 mysql-bin.000250
175M Feb 25 19:45 mysql-bin.000251
242 Feb 25 21:45 mysql-bin.000252
242 Feb 25 23:46 mysql-bin.000253
242 Feb 26 10:21 mysql-bin.000254
242 Feb 26 12:21 mysql-bin.000255
242 Feb 26 14:21 mysql-bin.000256
195 Feb 26 14:21 mysql-bin.000257
1.4K Feb 26 14:21 mysql-bin.index
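In this listing mysql-bin.000251 is the last file with real content; every later binlog file is only a couple of hundred bytes. A minimal sketch that picks this out automatically, assuming `ls -lh`-style size suffixes and an arbitrary 1 KiB threshold for "near-empty" (the threshold and the helper names are ours):

```python
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}

def parse_size(text: str) -> float:
    """Parse a human-readable size such as '513M', '1.4K' or '242' into bytes."""
    if text[-1] in UNITS:
        return float(text[:-1]) * UNITS[text[-1]]
    return float(text)

def last_substantial(listing: str, min_bytes: int = 1024):
    """Return (name, timestamp) of the last binlog file of at least min_bytes."""
    last = None
    for line in listing.strip().splitlines():
        size, month, day, clock, name = line.split()
        is_binlog = name.startswith("mysql-bin.") and not name.endswith(".index")
        if is_binlog and parse_size(size) >= min_bytes:
            last = (name, f"{month} {day} {clock}")
    return last

listing = """
513M Feb 25 19:31 mysql-bin.000250
175M Feb 25 19:45 mysql-bin.000251
242 Feb 25 21:45 mysql-bin.000252
242 Feb 25 23:46 mysql-bin.000253
1.4K Feb 26 14:21 mysql-bin.index
"""
print(last_substantial(listing))  # ('mysql-bin.000251', 'Feb 25 19:45')
```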

Yes, it was caused by the earlier clog having been recycled.
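The gap behind this conclusion can be made explicit. A rough sketch, assuming the year is 2025, that both times are in the same timezone, and that the modification time of the last non-empty binlog file approximates the point obbinlog would need to resume from (none of this is confirmed in the thread):

```python
from datetime import datetime, timedelta

# mtime of mysql-bin.000251, the last binlog file with real content
last_good_binlog = datetime(2025, 2, 25, 19, 45)
# oldest clog still available, from START_TS_US (assumed to be UTC+8 local time)
oldest_clog = datetime(2025, 2, 25, 23, 21, 59)

def missing_window(resume_from: datetime, oldest_available: datetime) -> timedelta:
    """Return how much log is no longer available, or zero if nothing is missing."""
    return max(oldest_available - resume_from, timedelta(0))

print(missing_window(last_good_binlog, oldest_clog))  # 3:36:59
```

A non-zero window means the logs obbinlog would need in order to continue are gone, which is consistent with the diagnosis above.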

The analysis of the core file from the obbinlog restart has been uploaded.
@淇铭