【 Environment 】Production
【 OB or other component 】OB
【 Version 】3.1.2
【Problem description】
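Three-node environment. At 11:40 (midday), memory usage on one observer node reached 100%, so the OS killed the observer process. After a restart the process was running, but the node was in an abnormal state. Then at around 13:50 a second node also triggered the OOM killer, and its observer process was killed.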
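After that the whole observer cluster could no longer serve reads or writes. OCP kept reporting that an O&M task was in progress and would not allow any operation, so I restarted the observer processes from the operating system: killed them and then started /home/admin/oceanbase/bin/observer again. Since then, two of the nodes return: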
ERROR 8001 (08004): Server is initializing
The other node reports: ERROR 2013 (HY000): Lost connection to MySQL server at 'reading authorization packet', system error: 0
observer.log keeps printing the following entries:
[2024-11-28 15:32:56.502795] WARN [STORAGE.TRANS] do_cluster_heartbeat_ (ob_tenant_weak_read_service.cpp:591) [34502][3291][Y0-0000000000000000] [lt=4] [dc=0] tenant weak read service do cluster heartbeat fail(ret=-5019, ret="OB_TABLE_NOT_EXIST", tenant_id_=1009, last_post_cluster_heartbeat_tstamp_=1732779176442541, cluster_heartbeat_interval_=1000000, cluster_service_pkey={tid:1109407232426210, partition_id:0, part_cnt:0}, cluster_service_master="0.0.0.0")
[2024-11-28 15:32:56.521005] WARN [SERVER] get_master_root_server (ob_service.cpp:3592) [33462][1468][YB42C0A8AA0D-000627F3C322BBE9] [lt=4] [dc=0] not master rootserver(ret=-4638, master_rs="192.168.170.8:2882")
[2024-11-28 15:32:56.521013] WARN [SERVER] process (ob_rpc_processor_simple.cpp:1957) [33462][1468][YB42C0A8AA0D-000627F3C322BBE9] [lt=8] [dc=0] failed to get master root server(ret=-4638)
[2024-11-28 15:32:56.521302] WARN [SERVER] fill_partition_replica (ob_service.cpp:638) [33462][1468][YB42C0A8AA0D-000627F3C322BBE9] [lt=2] [dc=0] invalid partition(ret=-4251, part_key={tid:1099511627777, partition_id:0, part_cnt:1})
[2024-11-28 15:32:56.521310] WARN [SERVER] fill_partition_replica (ob_service.cpp:609) [33462][1468][YB42C0A8AA0D-000627F3C322BBE9] [lt=7] [dc=0] failed to fill_partition_replica(ret=-4251, pg_key={tid:1099511627777, partition_id:0, part_cnt:1})
[2024-11-28 15:32:56.521313] WARN [SERVER] get_root_server_status (ob_service.cpp:3544) [33462][1468][YB42C0A8AA0D-000627F3C322BBE9] [lt=2] [dc=0] fail to fill partition replica(ret=-4251, partition_key={tid:1099511627777, partition_id:0, part_cnt:1})
[2024-11-28 15:32:56.521663] WARN [SERVER] fill_partition_replica (ob_service.cpp:638) [33462][1468][YB42C0A8AA0D-000627F3C322BBE9] [lt=3] [dc=0] invalid partition(ret=-4251, part_key={tid:1099511627777, partition_id:0, part_cnt:1})
[2024-11-28 15:32:56.521669] WARN [SERVER] fill_partition_replica (ob_service.cpp:609) [33462][1468][YB42C0A8AA0D-000627F3C322BBE9] [lt=5] [dc=0] failed to fill_partition_replica(ret=-4251, pg_key={tid:1099511627777, partition_id:0, part_cnt:1})
[2024-11-28 15:32:56.521672] WARN [SERVER] get_root_server_status (ob_service.cpp:3544) [33462][1468][YB42C0A8AA0D-000627F3C322BBE9] [lt=2] [dc=0] fail to fill partition replica(ret=-4251, partition_key={tid:1099511627777, partition_id:0, part_cnt:1})
[2024-11-28 15:32:56.522272] WARN [SERVER] get_master_root_server (ob_service.cpp:3592) [33462][1468][YB42C0A8AA0D-000627F3C322BBEB] [lt=3] [dc=0] not master rootserver(ret=-4638, master_rs="192.168.170.8:2882")
[2024-11-28 15:32:56.522278] WARN [SERVER] process (ob_rpc_processor_simple.cpp:1957) [33462][1468][YB42C0A8AA0D-000627F3C322BBEB] [lt=6] [dc=0] failed to get master root server(ret=-4638)
[2024-11-28 15:32:56.522594] WARN [SERVER] fill_partition_replica (ob_service.cpp:638) [33462][1468][YB42C0A8AA0D-000627F3C322BBEB] [lt=2] [dc=0] invalid partition(ret=-4251, part_key={tid:1099511627777, partition_id:0, part_cnt:1})
[2024-11-28 15:32:56.522602] WARN [SERVER] fill_partition_replica (ob_service.cpp:609) [33462][1468][YB42C0A8AA0D-000627F3C322BBEB] [lt=7] [dc=0] failed to fill_partition_replica(ret=-4251, pg_key={tid:1099511627777, partition_id:0, part_cnt:1})
[2024-11-28 15:32:56.522605] WARN [SERVER] get_root_server_status (ob_service.cpp:3544) [33462][1468][YB42C0A8AA0D-000627F3C322BBEB] [lt=2] [dc=0] fail to fill partition replica(ret=-4251, partition_key={tid:1099511627777, partition_id:0, part_cnt:1})
[2024-11-28 15:32:56.523016] WARN [SERVER] fill_partition_replica (ob_service.cpp:638) [33462][1468][YB42C0A8AA0D-000627F3C322BBEB] [lt=3] [dc=0] invalid partition(ret=-4251, part_key={tid:1099511627777, partition_id:0, part_cnt:1})
[2024-11-28 15:32:56.523022] WARN [SERVER] fill_partition_replica (ob_service.cpp:609) [33462][1468][YB42C0A8AA0D-000627F3C322BBEB] [lt=5] [dc=0] failed to fill_partition_replica(ret=-4251, pg_key={tid:1099511627777, partition_id:0, part_cnt:1})
[2024-11-28 15:32:56.523025] WARN [SERVER] get_root_server_status (ob_service.cpp:3544) [33462][1468][YB42C0A8AA0D-000627F3C322BBEB] [lt=2] [dc=0] fail to fill partition replica(ret=-4251, partition_key={tid:1099511627777, partition_id:0, part_cnt:1})
rootserver.log.wf keeps printing the following entries:
[2024-11-28 15:33:37.569986] WARN [RS] follower_process (ob_rs_rpc_processor.h:253) [33559][1654][YB42C0A8AA0A-000627F3A0FD4B4A] [lt=5] [dc=0] not master rootserver
[2024-11-28 15:33:37.569995] WARN [RS] process_ (ob_rs_rpc_processor.h:233) [33559][1654][YB42C0A8AA0A-000627F3A0FD4B4A] [lt=8] [dc=0] follower process failed(ret=-4638, pcode=1030)
[2024-11-28 15:33:37.591584] WARN [RS] follower_process (ob_rs_rpc_processor.h:253) [33559][1654][YB42C0A8AA08-000627F39A0B0632] [lt=2] [dc=0] not master rootserver
[2024-11-28 15:33:37.591592] WARN [RS] process_ (ob_rs_rpc_processor.h:233) [33559][1654][YB42C0A8AA08-000627F39A0B0632] [lt=8] [dc=0] follower process failed(ret=-4638, pcode=1030)
[2024-11-28 15:33:37.592323] WARN [RS] follower_process (ob_rs_rpc_processor.h:253) [33559][1654][YB42C0A8AA08-000627F39A0B0634] [lt=3] [dc=0] not master rootserver
[2024-11-28 15:33:37.592333] WARN [RS] process_ (ob_rs_rpc_processor.h:233) [33559][1654][YB42C0A8AA08-000627F39A0B0634] [lt=9] [dc=0] follower process failed(ret=-4638, pcode=1030)
[2024-11-28 15:33:37.638208] WARN [RS] follower_process (ob_rs_rpc_processor.h:253) [33559][1654][YB42C0A8AA0D-000627F3C322C14F] [lt=3] [dc=0] not master rootserver
[2024-11-28 15:33:37.638215] WARN [RS] process_ (ob_rs_rpc_processor.h:233) [33559][1654][YB42C0A8AA0D-000627F3C322C14F] [lt=6] [dc=0] follower process failed(ret=-4638, pcode=1030)
[2024-11-28 15:33:37.639541] WARN [RS] follower_process (ob_rs_rpc_processor.h:253) [33559][1654][YB42C0A8AA0D-000627F3C322C151] [lt=4] [dc=0] not master rootserver
[2024-11-28 15:33:37.639549] WARN [RS] process_ (ob_rs_rpc_processor.h:233) [33559][1654][YB42C0A8AA0D-000627F3C322C151] [lt=7] [dc=0] follower process failed(ret=-4638, pcode=1030)
[2024-11-28 15:33:37.669241] WARN [RS] follower_process (ob_rs_rpc_processor.h:253) [33559][1654][YB42C0A8AA0A-000627F3A0FD4B4C] [lt=4] [dc=0] not master rootserver
[2024-11-28 15:33:37.669250] WARN [RS] process_ (ob_rs_rpc_processor.h:233) [33559][1654][YB42C0A8AA0A-000627F3A0FD4B4C] [lt=8] [dc=0] follower process failed(ret=-4638, pcode=1030)
[2024-11-28 15:33:37.670190] WARN [RS] follower_process (ob_rs_rpc_processor.h:253) [33559][1654][YB42C0A8AA0A-000627F3A0FD4B4E] [lt=2] [dc=0] not master rootserver
[2024-11-28 15:33:37.670199] WARN [RS] process_ (ob_rs_rpc_processor.h:233) [33559][1654][YB42C0A8AA0A-000627F3A0FD4B4E] [lt=8] [dc=0] follower process failed(ret=-4638, pcode=1030)
[2024-11-28 15:33:37.691966] WARN [RS] follower_process (ob_rs_rpc_processor.h:253) [33559][1654][YB42C0A8AA08-000627F39A0B0636] [lt=2] [dc=0] not master rootserver
[2024-11-28 15:33:37.691977] WARN [RS] process_ (ob_rs_rpc_processor.h:233) [33559][1654][YB42C0A8AA08-000627F39A0B0636] [lt=9] [dc=0] follower process failed(ret=-4638, pcode=1030)
[2024-11-28 15:33:37.692786] WARN [RS] follower_process (ob_rs_rpc_processor.h:253) [33559][1654][YB42C0A8AA08-000627F39A0B0638] [lt=3] [dc=0] not master rootserver
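Because both logs repeat the same handful of warnings, a tally by module, function, and return code may make the pattern easier to read. The following is only a minimal sketch, assuming every entry follows the layout of the lines pasted above; the names `LINE_RE`, `RET_RE`, and `summarize` are hypothetical helpers for illustration and are not part of any OceanBase tooling.

```python
import re
from collections import Counter

# Assumed layout, taken from the pasted entries:
# [timestamp] LEVEL [MODULE] function (file:line) [tid][...][trace] ... message(ret=..., ...)
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z]+)\s+\[(?P<module>[^\]]+)\]\s+"
    r"(?P<func>\w+)\s+\((?P<file>[^:()]+):(?P<line>\d+)\)"
)
# First numeric ret=... in the entry; some entries carry no ret at all.
RET_RE = re.compile(r"\bret=(-?\d+)")


def summarize(lines):
    """Count log entries grouped by (module, function, ret code or None)."""
    counts = Counter()
    for raw in lines:
        m = LINE_RE.match(raw.strip())
        if m is None:
            continue  # skip lines that do not look like log entries
        ret = RET_RE.search(raw)
        key = (m["module"], m["func"], int(ret.group(1)) if ret else None)
        counts[key] += 1
    return counts


if __name__ == "__main__":
    # Two entries copied from the excerpts above, used only as a demo input.
    demo = [
        '[2024-11-28 15:32:56.521005] WARN [SERVER] get_master_root_server '
        '(ob_service.cpp:3592) [33462][1468][YB42C0A8AA0D-000627F3C322BBE9] '
        '[lt=4] [dc=0] not master rootserver(ret=-4638, master_rs="192.168.170.8:2882")',
        '[2024-11-28 15:33:37.569986] WARN [RS] follower_process '
        '(ob_rs_rpc_processor.h:253) [33559][1654][YB42C0A8AA0A-000627F3A0FD4B4A] '
        '[lt=5] [dc=0] not master rootserver',
    ]
    for (module, func, ret), n in summarize(demo).most_common():
        print(module, func, ret, n)
```

To run it against a real file, pass an open file object, for example `summarize(open("observer.log", errors="replace"))`; the path here is a placeholder.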