版本
生产obproxy版本
4.2.3.0-3
生产obbinlog版本
4.0.1
生产observer版本
4.2.2.0
3个obbinlog下游订阅消费,无业务方使用,还在内部验证使用
curl http://10.xxxx:2984/metrics | grep binlog_instance_dump_count
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 15606 100 15606 # TYPE binlog_instance_dump_count gauge
0 binlog_instance_dump_count{host_name=“96c285571219690edfeeb3c1205319”,instance_id=“f3tgkavjbx”,ob_cluster_id=“1”,ob_cluster_name=“oceanbase41”,ob_tenant_id=“1004”,tenant_name=“mysql”,trace_id=""} 3
问题
最近1天两个obbinlog实例进程内存增长从30GB到35GB,监控数据来源obbinlog暴露的prometheus协议端口http://10.xxxx:2984/metrics
两个obbinlog实例进程内存已经增长到是observer内存2倍了!
最近几天服务器可用内存一直在减少,31号的突刺是操作系统自动把另外一个进程OOM了
操作
30号下午重启obbinlog server内存未释放
cd /home/ds/oblogproxy
./run.sh stop
./run.sh start
obproxy中查看obbinlog位点正常
mysql> show master status ;
±-----------------±---------±-------------±-----------------±------------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
±-----------------±---------±-------------±-----------------±------------------------------------------------+
| mysql-bin.001039 | 60366382 | | | 09ba19f5-1a50-11ee-83dc-b4055d38dc24:1-58115929 |
±-----------------±---------±-------------±-----------------±------------------------------------------------+
obbinlog logs查看正常
mysql> show binary logs;
±-----------------±----------+
| Log_name | File_size |
±-----------------±----------+
| mysql-bin.000940 | 536872850 |
| mysql-bin.000941 | 536974573 |
| mysql-bin.000942 | 536885007 |
| mysql-bin.000943 | 537763850 |
| mysql-bin.000944 | 536873835 |
| mysql-bin.000945 | 536877624 |
| mysql-bin.000946 | 536878936 |
| mysql-bin.000947 | 536884007 |
| mysql-bin.000948 | 536891003 |
| mysql-bin.000949 | 536919860 |
| mysql-bin.000950 | 536944399 |
| mysql-bin.000951 | 536873692 |
| mysql-bin.000952 | 536871423 |
| mysql-bin.000953 | 537197432 |
| mysql-bin.000954 | 536875222 |
| mysql-bin.000955 | 536874367 |
| mysql-bin.000956 | 537023818 |
| mysql-bin.000957 | 536881310 |
| mysql-bin.000958 | 537050661 |
| mysql-bin.000959 | 536943389 |
| mysql-bin.000960 | 536979141 |
| mysql-bin.000961 | 537294282 |
| mysql-bin.000962 | 537861817 |
| mysql-bin.000963 | 536882843 …
obbinlog自身端口2983查看实例信息正常
mysql> SHOW BINLOG INSTANCES;
±-----------±------------±----------±--------------±-----±-----±-------±------±--------±--------±--------------±------------±-------------±----------------±--------------±------------±------------±-------------±--------±-----------------------------------------------±---------+
| name | ob_cluster | ob_tenant | ip | port | zone | region | group | running | state | obcdc_running | obcdc_state | service_mode | convert_running | convert_delay | convert_rps | convert_eps | convert_iops | dumpers | version | odp_addr |
±-----------±------------±----------±--------------±-----±-----±-------±------±--------±--------±--------------±------------±-------------±----------------±--------------±------------±------------±-------------±--------±-----------------------------------------------±---------+
| c4lnlfhls9 | oceanbase41 | mysql | 10.xxxx | 8101 | | | | Yes | Running | Yes | Running | enabled | Yes | 65 | 10 | 15 | 11762 | 0 | 4.0.1-92748c6c59534426d187977b13b61b7cdf4fa200 | NULL |
| f3tgkavjbx | oceanbase41 | mysql | 10.xxxx | 8100 | | | | Yes | Running | Yes | Running | enabled | Yes | 196 | 10 | 15 | 11762 | 3 | 4.0.1-92748c6c59534426d187977b13b61b7cdf4fa200 | NULL |
±-----------±------------±----------±--------------±-----±-----±-------±------±--------±--------±--------------±------------±-------------±----------------±--------------±------------±------------±-------------±--------±-----------------------------------------------±---------+
日志
同一台服务器两个obbinlog实例部署
/home/ds/oblogproxy 03:12 PM Tue Dec 31
!1449 # ll run
total 4
drwxr-xr-x 7 root root 186 Dec 31 15:10 c4lnlfhls9
drwxr-xr-x 7 root root 186 Dec 31 15:10 f3tgkavjbx
-rw-r--r-- 1 root root 30 Dec 24 15:32 node.local
obbinlog日志中有error相关日志,看起来貌似和内存无关
!1448 # ll -h /home/ds/oblogproxy/log/logproxy.log
-rw-r–r-- 1 root root 102M Dec 31 15:12 /home/ds/oblogproxy/log/logproxy.log
# grep -v 'info' /home/ds/oblogproxy/log/logproxy.log | wc -l 1466
error日志
[2024-12-31 15:10:04.526335] [error] fs_util.cpp(472): Failed to calculate file size No such file or directory
[2024-12-31 15:10:40.964266] [error] database_protocol.cpp(1923): Failed to update instance: {"cluster":"oceanbase41","cluster_id":"1","delay":1433,"heartbeat":1735629040,"instance_name":"f3tgkavjbx"
....
[2024-12-31 14:36:26.205758] [error] database_protocol.cpp(1923): Failed to update instance: {"cluster":"oceanbase41","cluster_id":"1","delay":1619,"heartbeat":1735626986,"instance_name":"c4lnlfhls9","ip":"1
0.27.143.209","min_dump_checkpoint":0,"node_id":"96c285571219690edfeeb3c1205319","pid":44301,"port":8101,"state":2,"tenant":"mysql","tenant_id":"1004","work_path":"./run/c4lnlfhls9","config":{"cluster":"ocea
...skipping..
info日志看起无异常
[2024-12-31 15:20:45.786114] [info] instance_client.cpp(69): Successfully login binlog instance, is_local: true, addr: ./run/c4lnlfhls9/binlog_instance.socket
[2024-12-31 15:20:45.786229] [info] mysql_protocol.cpp(70): Connect to server through unix domain socket success: ./run/f3tgkavjbx/binlog_instance.socket, user: OBM
[2024-12-31 15:20:45.787168] [info] mysql_protocol.cpp(129): Auth user success of server: ./run/f3tgkavjbx/binlog_instance.socket, user: OBM
[2024-12-31 15:20:45.787190] [info] instance_client.cpp(69): Successfully login binlog instance, is_local: true, addr: ./run/f3tgkavjbx/binlog_instance.socket
[2024-12-31 15:20:46.796036] [info] mysql_protocol.cpp(70): Connect to server through unix domain socket success: ./run/c4lnlfhls9/binlog_instance.socket, user: OBM
[2024-12-31 15:20:46.796696] [info] mysql_protocol.cpp(129): Auth user success of server: ./run/c4lnlfhls9/binlog_instance.socket, user: OBM
[2024-12-31 15:20:46.796713] [info] instance_client.cpp(69): Successfully login binlog instance, is_local: true, addr: ./run/c4lnlfhls9/binlog_instance.socket
[2024-12-31 15:20:46.797880] [info] mysql_protocol.cpp(70): Connect to server through unix domain socket success: ./run/f3tgkavjbx/binlog_instance.socket, user: OBM
[2024-12-31 15:20:46.798565] [info] mysql_protocol.cpp(129): Auth user success of server: ./run/f3tgkavjbx/binlog_instance.socket, user: OBM
[2024-12-31 15:20:46.798592] [info] instance_client.cpp(69): Successfully login binlog instance, is_local: true, addr: ./run/f3tgkavjbx/binlog_instance.socket
[2024-12-31 15:20:47.807426] [info] mysql_protocol.cpp(70): Connect to server through unix domain socket success: ./run/c4lnlfhls9/binlog_instance.socket, user: OBM
[2024-12-31 15:20:47.809551] [info] mysql_protocol.cpp(70): Connect to server through unix domain socket success: ./run/f3tgkavjbx/binlog_instance.socket, user: OBM
[2024-12-31 15:20:47.810242] [info] mysql_protocol.cpp(129): Auth user success of server: ./run/f3tgkavjbx/binlog_instance.socket, user: OBM
binlog_instance无error相关日志
!1467 # grep error ../run/c4lnlfhls9/log/binlog_instance.log
#无非info级别相关日志
root@xxxxxx:/home/ds/oblogproxy/log 03:56 PM Tue Dec 31
!1468 # grep -V 'info' ../run/c4lnlfhls9/log/binlog_instance.log | wc -l
7
root@xxxxxx:/home/ds/oblogproxy/log 03:56 PM Tue Dec 31
!1469 # grep -V 'info' ../run/c4lnlfhls9/log/binlog_instance.log
grep (GNU grep) 2.20
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Mike Haertel and others, see <http://git.sv.gnu.org/cgit/grep.git/tree/AUTHORS>.
root@xxxxxx:/home/ds/oblogproxy/log 03:57 PM Tue Dec 31
!1470 # wc -l ../run/c4lnlfhls9/log/binlog_instance.log
1717437 ../run/c4lnlfhls9/log/binlog_instance.log
#info级别日志无异常
tail -fn10 ../run/c4lnlfhls9/log/binlog_instance.log
[2024-12-31 16:01:31.924327] [info] event_dispatch.cpp(117): Succeeded do_cmd on connection [24243dabc41ccc7db431ea865b48addc][OBI][OBM,,]10.27.143.209:0-10.27.143.209:0/98
[2024-12-31 16:01:31.924500] [info] connection.cpp(139): Client [[OBI][OBM,,]10.27.143.209:0-10.27.143.209:0/98] has been closed.
[2024-12-31 16:01:31.924534] [info] connection.cpp(89): Closed connection [OBI][OBM,,]10.27.143.209:0-10.27.143.209:0/98
[2024-12-31 16:01:31.984655] [info] clog_reader_routine.cpp(68): Fetch libobcdc timeout, nothing coming...
[2024-12-31 16:01:31.984667] [info] binlog_convert.cpp(237): Empty log Record queue put by clog reader routine , retry...
[2024-12-31 16:01:31.984816] [info] binlog_storage.cpp(83): Empty log event queue put by convert thread, retry...
[2024-12-31 16:01:32.171656] [info] clog_reader_routine.cpp(68): Fetch libobcdc timeout, nothing coming...
[2024-12-31 16:01:32.171882] [info] binlog_convert.cpp(237): Empty log Record queue put by clog reader routine , retry...
[2024-12-31 16:01:32.173294] [info] binlog_storage.cpp(83): Empty log event queue put by convert thread, retry...
[2024-12-31 16:01:32.271765] [info] clog_reader_routine.cpp(68): Fetch libobcdc timeout, nothing coming...
[2024-12-31 16:01:32.356453] [info] counter.cpp(69): Counter:[Span:2000ms][Delay:192087,977259][RCNT:107][RRPS:53][RIOS:9897][WCNT:182][WRPS:91][WIOS:35297,AVG:387][XWIOS:0,AVG:0][RFETCH:209281][ROFFER:101][SPOLL:0][SENCODE:2216968][SSEND:2216692][BINLOG_DELAY_US:0][NDUMPER:0][NEventQ:0][RecordQueueSize:0]
[2024-12-31 16:01:32.473156] [info] clog_reader_routine.cpp(68): Fetch libobcdc timeout, nothing coming...
[2024-12-31 16:01:32.477067] [info] binlog_convert.cpp(237): Empty log Record queue put by clog reader routine , retry...
[2024-12-31 16:01:32.477977] [info] binlog_storage.cpp(83): Empty log event queue put by convert thread, retry...
[2024-12-31 16:01:32.874701] [info] clog_reader_routine.cpp(68): Fetch libobcdc timeout, nothing coming...
[2024-12-31 16:01:32.878433] [info] binlog_convert.cpp(237): Empty log Record queue put by clog reader routine , retry...
[2024-12-31 16:01:32.879698] [info] binlog_storage.cpp(83): Empty log event queue put by convert thread, retry...
[2024-12-31 16:01:32.931283] [info] event_dispatch.cpp(52): On connect from [OBI][,,]10.27.143.209:0-10.27.143.209:0/98
binlog_instance中libobcdc.log日志
/home/ds/oblogproxy/run/c4lnlfhls9/log
#包含NEED_SLOW_DOWN日志有较多
!1430 # grep NEED_SLOW libobcdc.log | wc -l
288
root@xxxxxx:/home/ds/oblogproxy/run/c4lnlfhls9/log 05:32 PM Tue Dec 31
!1431 # wc -l libobcdc.log
127739 libobcdc.log
root@xxxxxx:/home/ds/oblogproxy/run/c4lnlfhls9/log 05:32 PM Tue Dec 31
!1432 # ll -h libobcdc.log
-rw-r--r-- 1 root root 32M Dec 31 17:32 libobcdc.log
#包含NEED_SLOW_DOWN关键字的日志
[2024-12-31 17:31:12.071322] INFO [TLOG] global_flow_control_ (ob_log_instance.cpp:2615) [48671][][T0][Y0-0000000000000000-0-0] [lt=21] [STAT] [FLOW_CONTROL] NEED_SLOW_DOWN=0 PAUSED=0 MEM=1.79GB/4.00GB AVAIL_MEM=199.31GB/20.11GB READY_TO_SEQ=0/40000 PART_TRANS(TOTAL=40000, ACTIVE=161/40000, REUSABLE=0/40000) LOG_TASK(ACTIVE=0) STORE(0/100) [FETCHER=2 DML_PARSER=0 COMMITER=0 USER_QUEUE=0 OUT=0 RC=0] DIST_TRANS(SEQ_QUEUE=0, SEQ=0, COMMITTED=0) NEED_PAUSE_DISPATCH=0 REASON=
[2024-12-31 17:31:17.075411] INFO [TLOG] global_flow_control_ (ob_log_instance.cpp:2615) [48671][][T0][Y0-0000000000000000-0-0] [lt=8] [STAT] [FLOW_CONTROL] NEED_SLOW_DOWN=0 PAUSED=0 MEM=1.80GB/4.00GB AVAIL_MEM=199.30GB/20.11GB READY_TO_SEQ=0/40000 PART_TRANS(TOTAL=40000, ACTIVE=82/40000, REUSABLE=28/40000) LOG_TASK(ACTIVE=28) STORE(0/100) [FETCHER=0 DML_PARSER=0 COMMITER=0 USER_QUEUE=0 OUT=28 RC=0] DIST_TRANS(SEQ_QUEUE=0, SEQ=0, COMMITTED=28) NEED_PAUSE_DISPATCH=0 REASON=
[2024-12-31 17:31:22.079678] INFO [TLOG] global_flow_control_ (ob_log_instance.cpp:2615) [48671][][T0][Y0-0000000000000000-0-0] [lt=9] [STAT] [FLOW_CONTROL] NEED_SLOW_DOWN=0 PAUSED=0 MEM=1.78GB/4.00GB AVAIL_MEM=199.32GB/20.11GB READY_TO_SEQ=0/40000 PART_TRANS(TOTAL=40000, ACTIVE=163/40000, REUSABLE=0/40000) LOG_TASK(ACTIVE=0) STORE(0/100) [FETCHER=0 DML_PARSER=0 COMMITER=0 USER_QUEUE=0 OUT=0 RC=0] DIST_TRANS(SEQ_QUEUE=50, SEQ=0, COMMITTED=0) NEED_PAUSE_DISPATCH=0 REASON=
[2024-12-31 17:31:27.084023] INFO [TLOG] global_flow_control_ (ob_log_instance.cpp:2615) [48671][][T0][Y0-0000000000000000-0-0] [lt=6] [STAT] [FLOW_CONTROL] NEED_SLOW_DOWN=0 PAUSED=0 MEM=1.78GB/4.00GB AVAIL_MEM=199.32GB/20.11GB READY_TO_SEQ=0/40000 PART_TRANS(TOTAL=40000, ACTIVE=19/40000, REUSABLE=0/40000) LOG_TASK(ACTIVE=0) STORE(0/100) [FETCHER=1 DML_PARSER=0 COMMITER=0 USER_QUEUE=0 OUT=0 RC=0] DIST_TRANS(SEQ_QUEUE=0, SEQ=0, COMMITTED=0) NEED_PAUSE_DISPATCH=0 REASON=
[2024-12-31 17:31:32.088269] INFO [TLOG] global_flow_control_ (ob_log_instance.cpp:2615) [48671][][T0][Y0-0000000000000000-0-0] [lt=9] [STAT] [FLOW_CONTROL] NEED_SLOW_DOWN=0 PAUSED=0 MEM=1.78GB/4.00GB AVAIL_MEM=199.32GB/20.11GB READY_TO_SEQ=0/40000 PART_TRANS(TOTAL=40000, ACTIVE=50/40000, REUSABLE=0/40000) LOG_TASK(ACTIVE=0) STORE(0/100) [FETCHER=2 DML_PARSER=0 COMMITER=0 USER_QUEUE=0 OUT=0 RC=0] DIST_TRANS(SEQ_QUEUE=0, SEQ=0, COMMITTED=0) NEED_PAUSE_DISPATCH=0 REASON=
[2024-12-31 17:31:37.092552] INFO [TLOG] global_flow_control_ (ob_log_instance.cpp:2615) [48671][][T0][Y0-0000000000000000-0-0] [lt=9] [STAT] [FLOW_CONTROL] NEED_SLOW_DOWN=0 PAUSED=0 MEM=1.78GB/4.00GB AVAIL_MEM=199.32GB/20.11GB READY_TO_SEQ=0/40000 PART_TRANS(TOTAL=40000, ACTIVE=54/40000, REUSABLE=0/40000) LOG_TASK(ACTIVE=0) STORE(0/100) [FETCHER=2 DML_PARSER=0 COMMITER=0 USER_QUEUE=0 OUT=0 RC=0] DIST_TRANS(SEQ_QUEUE=0, SEQ=0, COMMITTED=0) NEED_PAUSE_DISPATCH=0 REASON=
[2024-12-31 17:31:42.097353] INFO [TLOG] global_flow_control_ (ob_log_instance.cpp:2615) [48671][][T0][Y0-0000000000000000-0-0] [lt=8] [STAT] [FLOW_CONTROL] NEED_SLOW_DOWN=0 PAUSED=0 MEM=1.78GB/4.00GB AVAIL_MEM=199.32GB/20.11GB READY_TO_SEQ=0/40000 PART_TRANS(TOTAL=40000, ACTIVE=120/40000, REUSABLE=0/40000) LOG_TASK(ACTIVE=0) STORE(0/100) [FETCHER=2 DML_PARSER=0 COMMITER=0 USER_QUEUE=0 OUT=0 RC=0] DIST_TRANS(SEQ_QUEUE=0, SEQ=0, COMMITTED=0) NEED_PAUSE_DISPATCH=0 REASON=
[2024-12-31 17:31:47.101590] INFO [TLOG] global_flow_control_ (ob_log_instance.cpp:2615) [48671][][T0][Y0-0000000000000000-0-0] [lt=8] [STAT] [FLOW_CONTROL] NEED_SLOW_DOWN=0 PAUSED=0 MEM=1.80GB/4.00GB AVAIL_MEM=199.31GB/20.11GB READY_TO_SEQ=0/40000 PART_TRANS(TOTAL=40000, ACTIVE=114/40000, REUSABLE=38/40000) LOG_TASK(ACTIVE=12) STORE(0/100) [FETCHER=2 DML_PARSER=0 COMMITER=38 USER_QUEUE=0 OUT=0 RC=0] DIST_TRANS(SEQ_QUEUE=0, SEQ=0, COMMITTED=1) NEED_PAUSE_DISPATCH=0 REASON=
[2024-12-31 17:31:52.106019] INFO [TLOG] global_flow_control_ (ob_log_instance.cpp:2615) [48671][][T0][Y0-0000000000000000-0-0] [lt=8] [STAT] [FLOW_CONTROL] NEED_SLOW_DOWN=0 PAUSED=0 MEM=1.78GB/4.00GB AVAIL_MEM=199.32GB/20.11GB READY_TO_SEQ=0/40000 PART_TRANS(TOTAL=40000, ACTIVE=68/40000, REUSABLE=0/40000) LOG_TASK(ACTIVE=0) STORE(0/100) [FETCHER=2 DML_PARSER=0 COMMITER=0 USER_QUEUE=0 OUT=0 RC=0] DIST_TRANS(SEQ_QUEUE=0, SEQ=0, COMMITTED=0) NEED_PAUSE_DISPATCH=0 REASON=
......