Cluster unavailable, log reports "CRASH ERROR"

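[Environment] Production
[OB or other component] OB
[Version] 4.2.1.3-103000032023122818
[Problem description] The cluster is unavailable, and the log contains the error "CRASH ERROR!!!"
[Steps to reproduce] Started OBServer manually and analyzed the core dump file with gdb
[Attachments and logs]
gdb-output.zip (21.0 KB)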

Please post the error log as well.

Please provide the following information:
1. Log in to the cluster as root@sys, run these SQL statements, and post the results:
   1) select svr_ip,zone,with_rootserver,status,block_migrate_in_time,start_service_time,stop_time,build_version from oceanbase.__all_server order by zone;
   2) select tenant_id,tenant_name,primary_zone,compatibility_mode from oceanbase.__all_tenant;
   3) show parameters like '%syslog_level%';
   4) show parameters like '%syslog_io_bandwidth_limit%';
   5) select count(*),tenant_id,zone_list,unit_count from oceanbase.__all_resource_pool group by tenant_id,zone_list,unit_count;
2. Log in to any OB host, run lsblk, and post the output.
3. Log in to any OB host, run lscpu | grep Architecture, and post the output.
4. From the node that dumped core, provide the observer.log that contains the keyword "CRASH ERROR" (please provide the original file).

MySQL [OCEANBASE]> select svr_ip,zone,with_rootserver,status,block_migrate_in_time,start_service_time,stop_time,build_version from oceanbase.__all_server order by zone;
+---------------+-------+-----------------+--------+-----------------------+--------------------+-----------+-------------------------------------------------------------------------------------------+
| svr_ip        | zone  | with_rootserver | status | block_migrate_in_time | start_service_time | stop_time | build_version                                                                             |
+---------------+-------+-----------------+--------+-----------------------+--------------------+-----------+-------------------------------------------------------------------------------------------+
| 192.169.4.161 | zone1 |               1 | ACTIVE |                     0 |   1713842467720453 |         0 | 4.2.1.3_103000032023122818-8fe69c2056b07154bbd1ebd2c26e818ee0d5c56f(Jan 15 2024 07:17:12) |
| 192.169.4.162 | zone2 |               0 | ACTIVE |                     0 |   1713842501001404 |         0 | 4.2.1.3_103000032023122818-8fe69c2056b07154bbd1ebd2c26e818ee0d5c56f(Jan 15 2024 07:17:12) |
| 192.169.4.163 | zone3 |               0 | ACTIVE |                     0 |   1713842465554546 |         0 | 4.2.1.3_103000032023122818-8fe69c2056b07154bbd1ebd2c26e818ee0d5c56f(Jan 15 2024 07:17:12) |
+---------------+-------+-----------------+--------+-----------------------+--------------------+-----------+-------------------------------------------------------------------------------------------+
3 rows in set (0.006 sec)
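
The start_service_time values above are hard to read as raw integers. A minimal Python sketch for converting them follows, assuming they are microseconds since the Unix epoch and that the hosts run at UTC+8 (the observer startup log in this thread reports "tz GMT offset: 08:00"). The helper name `usec_to_datetime` is made up for illustration and is not part of any OceanBase tool.

```python
from datetime import datetime, timedelta, timezone


def usec_to_datetime(usec, utc_offset_hours=8):
    """Convert a microsecond epoch value to a timezone-aware datetime.

    Assumption: the value counts microseconds since 1970-01-01 UTC.
    Integer arithmetic is used so the microsecond part is not rounded.
    """
    secs, micros = divmod(int(usec), 1_000_000)
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(secs, tz=tz) + timedelta(microseconds=micros)


# Values copied from the __all_server output in this thread.
START_SERVICE_TIME = {
    "192.169.4.161": 1713842467720453,
    "192.169.4.162": 1713842501001404,
    "192.169.4.163": 1713842465554546,
}

for ip, usec in START_SERVICE_TIME.items():
    print(ip, usec_to_datetime(usec).isoformat())
```

Under those assumptions all three servers report a start_service_time of about 11:21 on 2024-04-23 (UTC+8), which is after the crash recorded in the log.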

MySQL [OCEANBASE]> select tenant_id,tenant_name,primary_zone,compatibility_mode from oceanbase.__all_tenant;
+-----------+-------------+-------------------+--------------------+
| tenant_id | tenant_name | primary_zone      | compatibility_mode |
+-----------+-------------+-------------------+--------------------+
|         1 | sys         | zone1;zone2;zone3 |                  0 |
|      1001 | META$1002   | zone1;zone2,zone3 |                  0 |
|      1002 | obtest      | zone1;zone2,zone3 |                  0 |
+-----------+-------------+-------------------+--------------------+
3 rows in set (0.005 sec)

MySQL [OCEANBASE]> show parameters like '%syslog_level%';
+-------+----------+---------------+----------+--------------+-----------+-------+------------------------------------------------------------------------------------------------------------------------------+----------+---------+---------+-------------------+
| zone  | svr_type | svr_ip        | svr_port | name         | data_type | value | info                                                                                                                         | section  | scope   | source  | edit_level        |
+-------+----------+---------------+----------+--------------+-----------+-------+------------------------------------------------------------------------------------------------------------------------------+----------+---------+---------+-------------------+
| zone2 | observer | 192.169.4.162 |     2882 | syslog_level | NULL      | WDIAG | specifies the current level of logging. There are DEBUG, TRACE, WDIAG, EDIAG, INFO, WARN, ERROR, seven different log levels. | OBSERVER | CLUSTER | DEFAULT | DYNAMIC_EFFECTIVE |
| zone3 | observer | 192.169.4.163 |     2882 | syslog_level | NULL      | WDIAG | specifies the current level of logging. There are DEBUG, TRACE, WDIAG, EDIAG, INFO, WARN, ERROR, seven different log levels. | OBSERVER | CLUSTER | DEFAULT | DYNAMIC_EFFECTIVE |
| zone1 | observer | 192.169.4.161 |     2882 | syslog_level | NULL      | WDIAG | specifies the current level of logging. There are DEBUG, TRACE, WDIAG, EDIAG, INFO, WARN, ERROR, seven different log levels. | OBSERVER | CLUSTER | DEFAULT | DYNAMIC_EFFECTIVE |
+-------+----------+---------------+----------+--------------+-----------+-------+------------------------------------------------------------------------------------------------------------------------------+----------+---------+---------+-------------------+
3 rows in set (0.008 sec)

MySQL [OCEANBASE]> show parameters like '%syslog_io_bandwidth_limit%';
+-------+----------+---------------+----------+---------------------------+-----------+-------+--------------------------------------------------------------------------------------------------+----------+---------+---------+-------------------+
| zone  | svr_type | svr_ip        | svr_port | name                      | data_type | value | info                                                                                             | section  | scope   | source  | edit_level        |
+-------+----------+---------------+----------+---------------------------+-----------+-------+--------------------------------------------------------------------------------------------------+----------+---------+---------+-------------------+
| zone1 | observer | 192.169.4.161 |     2882 | syslog_io_bandwidth_limit | NULL      | 30MB  | Syslog IO bandwidth limitation, exceeding syslog would be truncated. Use 0 to disable ERROR log. | OBSERVER | CLUSTER | DEFAULT | DYNAMIC_EFFECTIVE |
| zone2 | observer | 192.169.4.162 |     2882 | syslog_io_bandwidth_limit | NULL      | 30MB  | Syslog IO bandwidth limitation, exceeding syslog would be truncated. Use 0 to disable ERROR log. | OBSERVER | CLUSTER | DEFAULT | DYNAMIC_EFFECTIVE |
| zone3 | observer | 192.169.4.163 |     2882 | syslog_io_bandwidth_limit | NULL      | 30MB  | Syslog IO bandwidth limitation, exceeding syslog would be truncated. Use 0 to disable ERROR log. | OBSERVER | CLUSTER | DEFAULT | DYNAMIC_EFFECTIVE |
+-------+----------+---------------+----------+---------------------------+-----------+-------+--------------------------------------------------------------------------------------------------+----------+---------+---------+-------------------+
3 rows in set (0.007 sec)

MySQL [oceanbase]> show parameters like '%syslog_io_bandwidth_limit%';
+-------+----------+---------------+----------+---------------------------+-----------+-------+--------------------------------------------------------------------------------------------------+----------+---------+---------+-------------------+
| zone  | svr_type | svr_ip        | svr_port | name                      | data_type | value | info                                                                                             | section  | scope   | source  | edit_level        |
+-------+----------+---------------+----------+---------------------------+-----------+-------+--------------------------------------------------------------------------------------------------+----------+---------+---------+-------------------+
| zone1 | observer | 192.169.4.161 |     2882 | syslog_io_bandwidth_limit | NULL      | 30MB  | Syslog IO bandwidth limitation, exceeding syslog would be truncated. Use 0 to disable ERROR log. | OBSERVER | CLUSTER | DEFAULT | DYNAMIC_EFFECTIVE |
| zone3 | observer | 192.169.4.163 |     2882 | syslog_io_bandwidth_limit | NULL      | 30MB  | Syslog IO bandwidth limitation, exceeding syslog would be truncated. Use 0 to disable ERROR log. | OBSERVER | CLUSTER | DEFAULT | DYNAMIC_EFFECTIVE |
| zone2 | observer | 192.169.4.162 |     2882 | syslog_io_bandwidth_limit | NULL      | 30MB  | Syslog IO bandwidth limitation, exceeding syslog would be truncated. Use 0 to disable ERROR log. | OBSERVER | CLUSTER | DEFAULT | DYNAMIC_EFFECTIVE |
+-------+----------+---------------+----------+---------------------------+-----------+-------+--------------------------------------------------------------------------------------------------+----------+---------+---------+-------------------+
3 rows in set (0.007 sec)

MySQL [oceanbase]> select count(*),tenant_id,zone_list,unit_count from oceanbase.__all_resource_pool group by tenant_id,zone_list,unit_count;
+----------+-----------+-------------------+------------+
| count(*) | tenant_id | zone_list         | unit_count |
+----------+-----------+-------------------+------------+
|        1 |         1 | zone1;zone2;zone3 |          1 |
|        1 |      1002 | zone3             |          1 |
|        1 |      1002 | zone2             |          1 |
|        1 |      1002 | zone1             |          1 |
+----------+-----------+-------------------+------------+
[root@192-169-4-161 ~]# lsblk
NAME                    MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
loop0                     7:0    0   4.1G  0 loop /mnt/localrepo
sda                       8:0    0   5.5T  0 disk
sdb                       8:16   0   5.5T  0 disk
sdc                       8:32   0   5.5T  0 disk
sdd                       8:48   0   5.5T  0 disk
sde                       8:64   0   5.5T  0 disk
sdf                       8:80   0   3.5T  0 disk
└─sdf1                    8:81   0   3.5T  0 part
  ├─vg_ob_data-ob_log1  252:3    0     2T  0 lvm  /data/log1
  └─vg_ob_data-ob_data  252:4    0    10T  0 lvm  /data/1
sdg                       8:96   0   3.5T  0 disk
└─sdg1                    8:97   0   3.5T  0 part
  ├─vg_ob_data-ob_log1  252:3    0     2T  0 lvm  /data/log1
  └─vg_ob_data-ob_data  252:4    0    10T  0 lvm  /data/1
sdh                       8:112  0   3.5T  0 disk
└─sdh1                    8:113  0   3.5T  0 part
  ├─vg_ob_data-ob_log1  252:3    0     2T  0 lvm  /data/log1
  └─vg_ob_data-ob_data  252:4    0    10T  0 lvm  /data/1
sdi                       8:128  0   3.5T  0 disk
└─sdi1                    8:129  0   3.5T  0 part
  ├─vg_ob_data-ob_log1  252:3    0     2T  0 lvm  /data/log1
  └─vg_ob_data-ob_data  252:4    0    10T  0 lvm  /data/1
sdj                       8:144  0   3.5T  0 disk
└─sdj1                    8:145  0   3.5T  0 part
  ├─vg_ob_data-ob_log1  252:3    0     2T  0 lvm  /data/log1
  └─vg_ob_data-ob_data  252:4    0    10T  0 lvm  /data/1
sdk                       8:160  0 558.4G  0 disk
├─sdk1                    8:161  0   200M  0 part /boot/efi
├─sdk2                    8:162  0     1G  0 part /boot
└─sdk3                    8:163  0 557.2G  0 part
  ├─klas00-root         252:0    0 503.2G  0 lvm  /
  ├─klas00-swap         252:1    0     4G  0 lvm
  └─klas00-backup       252:2    0    50G  0 lvm
[root@192-169-4-161 ~]# lscpu | grep Architecture
Architecture: aarch64

The log is too large to upload; even compressed it is over 10 MB.

[2024-04-22 16:35:34.276206] INFO [RPC.OBMYSQL] do_accept_one (ob_sql_nio.cpp:942) [10250][sql_nio3][T0][Y0-0000000000000000-0-0] [lt=10] accept one succ(*s={this:0xfffcba4bedf0, session_id:3221558147, trace_id:Y0-0000000000000000-0-0, sql_handling_stage:-1, sql_initiative_shutdown:false, fd:663, err:0, last_decode_time:0, last_write_time:1713774934276188, pending_write_task:{buf:null, sz:0}, need_epoll_trigger_write:false, consume_size:0, pending_flag:0, may_handling_flag:true, handler_close_flag:false})
[2024-04-22 16:35:34.277017] INFO [SERVER] extract_user_tenant (obmp_connect.cpp:78) [10250][sql_nio3][T0][Y0-0000000000000000-0-0] [lt=16] username and tenantname(user_name=nessusXXXXXXXXXXXXXXXXXXXXmysql_anonymous_login_handshake_info_leakage.naslXXXXXXXXXXXXXXXXXXXX, tenant_name=)
[2024-04-22 16:35:34.277038] INFO [SERVER] dispatch_req (ob_srv_deliver.cpp:267) [10250][sql_nio3][T1][Y0-0000000000000000-0-0] [lt=9] succeed to dispatch to tenant mysql queue(tenant_id=1)
[2024-04-22 16:35:34.277122] INFO load_privilege_info (obmp_connect.cpp:536) [10495][T1_MysqlQueueTh][T1][Y0-00061657FD9FF52C-0-0] [lt=18] no tenant name set, use default tenant name(tenant_name=sys)
[2024-04-22 16:35:34.277136] WDIAG [SERVER] load_privilege_info (obmp_connect.cpp:550) [10495][T1_MysqlQueueTh][T1][Y0-00061657FD9FF52C-0-0] [lt=13][errcode=-5108] invalid length for db_name or user_name(db_name_=, user_name_=XXXXXXXXXXXmysql_anonymous_login_handshake_info_leakage.naslXXXXXXXXXXXXXXXXXXXX, ret=-5108)
[2024-04-22 16:35:34.277156] WDIAG [SERVER] verify_identify (obmp_connect.cpp:1931) [10495][T1_MysqlQueueTh][T1][Y0-00061657FD9FF52C-0-0] [lt=19][errcode=-5108] load privilege info fail(pre_ret=-5108, ret=-5108, GCTX.status_=2)
[2024-04-22 16:35:34.277164] WDIAG [SERVER] process (obmp_connect.cpp:344) [10495][T1_MysqlQueueTh][T1][Y0-00061657FD9FF52C-0-0] [lt=8][errcode=-5108] fail to verify_identify(ret=-5108)
[2024-04-22 16:35:34.277197] INFO [SERVER] send_error_packet (obmp_packet_sender.cpp:319) [10495][T1_MysqlQueueTh][T1][Y0-00061657FD9FF52C-0-0] [lt=7] sending error packet(ob_error=-5108, client error=-5108, extra_err_info=NULL, lbt()="0xfcbfa34 0x7db9ee0 0x7d6a59c 0x7d9b5dc 0x10a80bb8 0x10a7b07c 0x10a7bf68 0x10a7c4dc 0x102692ac 0x10265560 0xfffed34a88cc 0xfffed33ea1ec")
[2024-04-22 16:35:34.277237] INFO [SERVER] free_session (obmp_base.cpp:317) [10495][T1_MysqlQueueTh][T1][Y0-00061657FD9FF52C-0-0] [lt=9] free session successfully(ctx={has_inc_active_num:false, tenant_id:1, sessid:3221558147, proxy_sessid:1})
[2024-04-22 16:35:34.277248] INFO [SERVER] free_session (obmp_base.cpp:325) [10495][T1_MysqlQueueTh][T1][Y0-00061657FD9FF52C-0-0] [lt=10] mark session id unused(sessid=3221558147)
[2024-04-22 16:35:34.277267] WDIAG [SERVER] disconnect (obmp_packet_sender.cpp:788) [10495][T1_MysqlQueueTh][T1][Y0-00061657FD9FF52C-0-0] [lt=5][errcode=0] server close connection(sessid=3221558147, proxy_sessid=1, stack="0xfcbfa34 0x7dbdabc 0x7d8b0b8 0x7d9b198 0x10a80bb8 0x10a7b07c 0x10a7bf68 0x10a7c4dc 0x102692ac 0x10265560 0xfffed34a88cc 0xfffed33ea1ec")
[2024-04-22 16:35:34.277276] WDIAG [SERVER] get_session (obmp_packet_sender.cpp:545) [10495][T1_MysqlQueueTh][T1][Y0-00061657FD9FF52C-0-0] [lt=8][errcode=-4018] get session fail(ret=-4018, sessid=3221558147, proxy_sessid=1)
[2024-04-22 16:35:34.277282] WDIAG [SERVER] disconnect (obmp_packet_sender.cpp:792) [10495][T1_MysqlQueueTh][T1][Y0-00061657FD9FF52C-0-0] [lt=5][errcode=-4016] session is null
[2024-04-22 16:35:34.277291] INFO [SERVER] process (obmp_connect.cpp:479) [10495][T1_MysqlQueueTh][T1][Y0-00061657FD9FF52C-0-0] [lt=7] MySQL LOGIN(direct_client_ip="192.169.4.100", client_ip=192.169.4.100, tenant_name=sys, tenant_id=1, user_name=XXXXXXXXXXXmysql_anonymous_login_handshake_info_leakage.naslXXXXXXXXXXXXXXXXXXXX, host_name=xxx.xxx.xxx.xxx, sessid=3221558147, proxy_sessid=1, sess_create_time=0, from_proxy=false, from_java_client=false, from_oci_client=false, from_jdbc_client=false, capability=128991, proxy_capability=0, use_ssl=false, c/s protocol="OB_MYSQL_CS_TYPE", autocommit=false, proc_ret=-5108, ret=0, conn->client_type_=3, conn->client_version_=0)
[2024-04-22 16:35:34.277310] WDIAG [SERVER] get_session (obmp_packet_sender.cpp:545) [10495][T1_MysqlQueueTh][T1][Y0-00061657FD9FF52C-0-0] [lt=17][errcode=-4018] get session fail(ret=-4018, sessid=3221558147, proxy_sessid=1)
[2024-04-22 16:35:34.277315] WDIAG [SERVER] flush_buffer (obmp_packet_sender.cpp:1003) [10495][T1_MysqlQueueTh][T1][Y0-00061657FD9FF52C-0-0] [lt=4][errcode=-4018] fail to get session info(ret=-4018)
[2024-04-22 16:35:34.277325] WDIAG [SERVER] response (obmp_base.cpp:89) [10495][T1_MysqlQueueTh][T1][Y0-00061657FD9FF52C-0-0] [lt=7][errcode=-4018] failed to flush_buffer(ret=-4018)
[2024-04-22 16:35:34.277332] WDIAG [RPC.FRAME] run (ob_sql_processor.cpp:48) [10495][T1_MysqlQueueTh][T1][Y0-00061657FD9FF52C-0-0] [lt=7][errcode=-4018] response rpc result fail(ret=-4018)
[2024-04-22 16:35:34.278534] WDIAG pkts_sk_consume (handle_io.t.h:53) [10043][pnio1][T0][Y0-0000000000000000-0-0] [lt=16][errcode=0] PNIO do_decode fail: 61
[2024-04-22 16:35:34.278553] INFO eloop_handle_sock_event (eloop.c:95) [10043][pnio1][T0][Y0-0000000000000000-0-0] [lt=12] PNIO sock destroy: sock=0xfffe00688990, connection=fd:300:local:192.169.4.161:2882:remote:192.169.4.161:30816, err=61
[2024-04-22 16:35:34.278580] WDIAG pkts_sk_consume (handle_io.t.h:53) [10039][pnio1][T0][Y0-0000000000000000-0-0] [lt=2][errcode=0] PNIO do_decode fail: 61
[2024-04-22 16:35:34.278595] INFO eloop_handle_sock_event (eloop.c:95) [10039][pnio1][T0][Y0-0000000000000000-0-0] [lt=11] PNIO sock destroy: sock=0xfffe601a8990, connection=fd:281:local:0.0.0.0:0:remote:0.0.0.0:0, err=61
[2024-04-22 16:35:34.278602] WDIAG sock_destroy (eloop.c:66) [10039][pnio1][T0][Y0-0000000000000000-0-0] [lt=6][errcode=0] PNIO epoll_ctl delete fd faild, s=0xfffe601a8990, s->fd=281, errno=9
[2024-04-22 16:35:34.278601] INFO pkts_sk_delete (pkts_sk_factory.h:41) [10043][pnio1][T0][Y0-0000000000000000-0-0] [lt=6] PNIO sk_destroy: s=0xfffe00688990 io=0xfffe5e004410
[2024-04-22 16:35:34.278608] WDIAG sock_destroy (eloop.c:72) [10039][pnio1][T0][Y0-0000000000000000-0-0] [lt=5][errcode=0] PNIO close sock fd faild, s=0xfffe601a8990, s->fd=281, errno=9
[2024-04-22 16:35:34.278614] INFO pkts_sk_delete (pkts_sk_factory.h:41) [10039][pnio1][T0][Y0-0000000000000000-0-0] [lt=5] PNIO sk_destroy: s=0xfffe601a8990 io=0xfffe5f404410
[2024-04-22 16:35:34.278644] INFO [RPC.OBRPC] do_server_loop (ob_net_keepalive.cpp:480) [10245][KeepAliveServer][T0][Y0-0000000000000000-0-0] [lt=11] socket need_disconn(n=0, errno=11)
[2024-04-22 16:35:34.278660] INFO [RPC.OBRPC] do_server_loop (ob_net_keepalive.cpp:508) [10245][KeepAliveServer][T0][Y0-0000000000000000-0-0] [lt=12] server connection closed, fd: 287, addr: "192.169.4.161:30800"
[2024-04-22 16:35:34.278702] WDIAG pkts_sk_consume (handle_io.t.h:53) [10042][pnio1][T0][Y0-0000000000000000-0-0] [lt=13][errcode=0] PNIO do_decode fail: 61
[2024-04-22 16:35:34.278717] INFO eloop_handle_sock_event (eloop.c:95) [10042][pnio1][T0][Y0-0000000000000000-0-0] [lt=11] PNIO sock destroy: sock=0xfffe2a2d4050, connection=fd:295:local:0.0.0.0:0:remote:0.0.0.0:0, err=61
[2024-04-22 16:35:34.278724] WDIAG sock_destroy (eloop.c:66) [10042][pnio1][T0][Y0-0000000000000000-0-0] [lt=6][errcode=0] PNIO epoll_ctl delete fd faild, s=0xfffe2a2d4050, s->fd=295, errno=9
[2024-04-22 16:35:34.278731] WDIAG sock_destroy (eloop.c:72) [10042][pnio1][T0][Y0-0000000000000000-0-0] [lt=6][errcode=0] PNIO close sock fd faild, s=0xfffe2a2d4050, s->fd=295, errno=9
[2024-04-22 16:35:34.278736] INFO pkts_sk_delete (pkts_sk_factory.h:41) [10042][pnio1][T0][Y0-0000000000000000-0-0] [lt=5] PNIO sk_destroy: s=0xfffe2a2d4050 io=0xfffe5e604410
[2024-04-22 16:35:34.278762] WDIAG pkts_sk_consume (handle_io.t.h:53) [10038][pnio1][T0][Y0-0000000000000000-0-0] [lt=15][errcode=0] PNIO do_decode fail: 61
[2024-04-22 16:35:34.278777] INFO eloop_handle_sock_event (eloop.c:95) [10038][pnio1][T0][Y0-0000000000000000-0-0] [lt=11] PNIO sock destroy: sock=0xfffe2a3572d0, connection=fd:305:local:0.0.0.0:0:remote:0.0.0.0:0, err=61
[2024-04-22 16:35:34.278784] WDIAG sock_destroy (eloop.c:66) [10038][pnio1][T0][Y0-0000000000000000-0-0] [lt=6][errcode=0] PNIO epoll_ctl delete fd faild, s=0xfffe2a3572d0, s->fd=305, errno=9
[2024-04-22 16:35:34.278790] WDIAG sock_destroy (eloop.c:72) [10038][pnio1][T0][Y0-0000000000000000-0-0] [lt=5][errcode=0] PNIO close sock fd faild, s=0xfffe2a3572d0, s->fd=305, errno=9
[2024-04-22 16:35:34.278795] INFO pkts_sk_delete (pkts_sk_factory.h:41) [10038][pnio1][T0][Y0-0000000000000000-0-0] [lt=5] PNIO sk_destroy: s=0xfffe2a3572d0 io=0xfffe5f804410
[2024-04-22 16:35:34.278821] WDIAG pkts_sk_consume (handle_io.t.h:53) [10046][pnio1][T0][Y0-0000000000000000-0-0] [lt=1][errcode=0] PNIO do_decode fail: 61
[2024-04-22 16:35:34.278841] INFO eloop_handle_sock_event (eloop.c:95) [10046][pnio1][T0][Y0-0000000000000000-0-0] [lt=15] PNIO sock destroy: sock=0xfffe2a3b8990, connection=fd:309:local:0.0.0.0:0:remote:0.0.0.0:0, err=61
[2024-04-22 16:35:34.278850] WDIAG sock_destroy (eloop.c:66) [10046][pnio1][T0][Y0-0000000000000000-0-0] [lt=8][errcode=0] PNIO epoll_ctl delete fd faild, s=0xfffe2a3b8990, s->fd=309, errno=9
[2024-04-22 16:35:34.278864] WDIAG sock_destroy (eloop.c:72) [10046][pnio1][T0][Y0-0000000000000000-0-0] [lt=13][errcode=0] PNIO close sock fd faild, s=0xfffe2a3b8990, s->fd=309, errno=9
[2024-04-22 16:35:34.278870] INFO pkts_sk_delete (pkts_sk_factory.h:41) [10046][pnio1][T0][Y0-0000000000000000-0-0] [lt=5] PNIO sk_destroy: s=0xfffe2a3b8990 io=0xfffe5d204410
[2024-04-22 16:35:34.278929] WDIAG pkts_sk_consume (handle_io.t.h:53) [10045][pnio1][T0][Y0-0000000000000000-0-0] [lt=20][errcode=0] PNIO do_decode fail: 61
[2024-04-22 16:35:34.278944] INFO eloop_handle_sock_event (eloop.c:95) [10045][pnio1][T0][Y0-0000000000000000-0-0] [lt=11] PNIO sock destroy: sock=0xfffe60086050, connection=fd:319:local:0.0.0.0:0:remote:0.0.0.0:0, err=61
[2024-04-22 16:35:34.278952] WDIAG sock_destroy (eloop.c:66) [10045][pnio1][T0][Y0-0000000000000000-0-0] [lt=7][errcode=0] PNIO epoll_ctl delete fd faild, s=0xfffe60086050, s->fd=319, errno=9
[2024-04-22 16:35:34.278962] WDIAG sock_destroy (eloop.c:72) [10045][pnio1][T0][Y0-0000000000000000-0-0] [lt=6][errcode=0] PNIO close sock fd faild, s=0xfffe60086050, s->fd=319, errno=9
[2024-04-22 16:35:34.278968] INFO pkts_sk_delete (pkts_sk_factory.h:41) [10045][pnio1][T0][Y0-0000000000000000-0-0] [lt=5] PNIO sk_destroy: s=0xfffe60086050 io=0xfffe5d804410
[2024-04-22 16:35:34.278998] WDIAG pktc_sk_consume (handle_io.t.h:53) [10037][pnio1][T0][Y0-0000000000000000-0-0] [lt=19][errcode=0] PNIO do_decode fail: 61
[2024-04-22 16:35:34.279012] INFO eloop_handle_sock_event (eloop.c:95) [10037][pnio1][T0][Y0-0000000000000000-0-0] [lt=9] PNIO sock destroy: sock=0xfffe60086960, connection=fd:323:local:0.0.0.0:0:remote:0.0.0.0:0, err=61
[2024-04-22 16:35:34.279020] WDIAG sock_destroy (eloop.c:66) [10037][pnio1][T0][Y0-0000000000000000-0-0] [lt=7][errcode=0] PNIO epoll_ctl delete fd faild, s=0xfffe60086960, s->fd=323, errno=9
[2024-04-22 16:35:34.279026] WDIAG sock_destroy (eloop.c:72) [10037][pnio1][T0][Y0-0000000000000000-0-0] [lt=5][errcode=0] PNIO close sock fd faild, s=0xfffe60086960, s->fd=323, errno=9
[2024-04-22 16:35:34.279033] INFO pktc_sk_delete (pktc_sk_factory.h:69) [10037][pnio1][T0][Y0-0000000000000000-0-0] [lt=6] PNIO sk_destroy: s=0xfffe60086960 io=0xfffe5fd84610
[2024-04-22 16:35:34.279065] WDIAG pkts_sk_consume (handle_io.t.h:53) [10040][pnio1][T0][Y0-0000000000000000-0-0] [lt=1][errcode=0] PNIO do_decode fail: 61
[2024-04-22 16:35:34.279082] INFO eloop_handle_sock_event (eloop.c:95) [10040][pnio1][T0][Y0-0000000000000000-0-0] [lt=11] PNIO sock destroy: sock=0xfffe2a296990, connection=fd:328:local:192.169.4.161:2882:remote:192.169.4.161:30898, err=61
[2024-04-22 16:35:34.279089] WDIAG sock_destroy (eloop.c:66) [10040][pnio1][T0][Y0-0000000000000000-0-0] [lt=5][errcode=0] PNIO epoll_ctl delete fd faild, s=0xfffe2a296990, s->fd=328, errno=9
[2024-04-22 16:35:34.279095] WDIAG sock_destroy (eloop.c:72) [10040][pnio1][T0][Y0-0000000000000000-0-0] [lt=5][errcode=0] PNIO close sock fd faild, s=0xfffe2a296990, s->fd=328, errno=9
[2024-04-22 16:35:34.279101] INFO pkts_sk_delete (pkts_sk_factory.h:41) [10040][pnio1][T0][Y0-0000000000000000-0-0] [lt=5] PNIO sk_destroy: s=0xfffe2a296990 io=0xfffe5ee04410
[2024-04-22 16:35:34.279119] WDIAG pkts_sk_consume (handle_io.t.h:53) [10041][pnio1][T0][Y0-0000000000000000-0-0] [lt=13][errcode=0] PNIO do_decode fail: 61
[2024-04-22 16:35:34.279135] INFO eloop_handle_sock_event (eloop.c:95) [10041][pnio1][T0][Y0-0000000000000000-0-0] [lt=12] PNIO sock destroy: sock=0xfffe007ed2d0, connection=fd:330:local:0.0.0.0:0:remote:0.0.0.0:0, err=61
[2024-04-22 16:35:34.279141] WDIAG sock_destroy (eloop.c:66) [10041][pnio1][T0][Y0-0000000000000000-0-0] [lt=5][errcode=0] PNIO epoll_ctl delete fd faild, s=0xfffe007ed2d0, s->fd=330, errno=9
[2024-04-22 16:35:34.279148] WDIAG sock_destroy (eloop.c:72) [10041][pnio1][T0][Y0-0000000000000000-0-0] [lt=6][errcode=0] PNIO close sock fd faild, s=0xfffe007ed2d0, s->fd=330, errno=9
[2024-04-22 16:35:34.279156] INFO pkts_sk_delete (pkts_sk_factory.h:41) [10041][pnio1][T0][Y0-0000000000000000-0-0] [lt=7] PNIO sk_destroy: s=0xfffe007ed2d0 io=0xfffe5ea04410
[2024-04-22 16:35:34.279177] WDIAG pkts_sk_consume (handle_io.t.h:53) [10044][pnio1][T0][Y0-0000000000000000-0-0] [lt=2][errcode=0] PNIO do_decode fail: 61
[2024-04-22 16:35:34.279190] INFO eloop_handle_sock_event (eloop.c:95) [10044][pnio1][T0][Y0-0000000000000000-0-0] [lt=10] PNIO sock destroy: sock=0xfffe2a2ac990, connection=fd:335:local:0.0.0.0:0:remote:0.0.0.0:0, err=61
[2024-04-22 16:35:34.279197] WDIAG sock_destroy (eloop.c:66) [10044][pnio1][T0][Y0-0000000000000000-0-0] [lt=6][errcode=0] PNIO epoll_ctl delete fd faild, s=0xfffe2a2ac990, s->fd=335, errno=9
[2024-04-22 16:35:34.279203] WDIAG sock_destroy (eloop.c:72) [10044][pnio1][T0][Y0-0000000000000000-0-0] [lt=6][errcode=0] PNIO close sock fd faild, s=0xfffe2a2ac990, s->fd=335, errno=9
[2024-04-22 16:35:34.279207] INFO pkts_sk_delete (pkts_sk_factory.h:41) [10044][pnio1][T0][Y0-0000000000000000-0-0] [lt=4] PNIO sk_destroy: s=0xfffe2a2ac990 io=0xfffe5dc04410
CRASH ERROR!!! IP=ffffffffffffffff, RBP=ffffffffffffffff, sig=11, sig_code=1, sig_addr=58585858598930, RLIMIT_CORE=unlimited, timestamp=1713774934282065, tid=10250, tname=sql_nio3, trace_id=0-0-0-0, extra_info=((null)), lbt=, SQL=
[2024-04-23 11:14:51.793123] INFO [SERVER] main (main.cpp:551) [2397370][observer][T0][Y0-0000000000000000-0-0] [lt=0] succ to init logger(default file="log/observer.log", rs file="log/rootservice.log", election file="log/election.log", trace file="log/trace.log", audit_file="audit/observer_2397369_202404231114514096.aud", max_log_file_size=268435456, enable_async_log=true)
[2024-04-23 11:14:51.793175] INFO [SERVER] main (main.cpp:555) [2397370][observer][T0][Y0-0000000000000000-0-0] [lt=51] Virtual memory : 1,004,994,560 byte
[2024-04-23 11:14:51.793184] INFO [SERVER] main (main.cpp:558) [2397370][observer][T0][Y0-0000000000000000-0-0] [lt=6] Build basic information for each syslog file(info="address: , observer version: OceanBase_CE 4.2.1.3, revision: 103000032023122818-8fe69c2056b07154bbd1ebd2c26e818ee0d5c56f, sysname: Linux, os release: 4.19.90-23.8.v2101.ky10.aarch64, machine: aarch64, tz GMT offset: 08:00")
./bin/observer
observer (OceanBase_CE 4.2.1.3)
REVISION: 103000032023122818-8fe69c2056b07154bbd1ebd2c26e818ee0d5c56f
BUILD_BRANCH: HEAD
BUILD_TIME: Jan 15 2024 07:17:12
BUILD_FLAGS: RelWithDebInfo
BUILD_INFO:

Copyright (c) 2011-present OceanBase Inc.
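
For anyone reading the "CRASH ERROR!!!" line in the excerpt above, here is a rough Python sketch that splits it into fields and decodes a few of them. The field names come from the log line itself. The helper names (`parse_crash_line`, `summarize`) are hypothetical, and treating `timestamp` as microseconds since the Unix epoch is an assumption, although it does line up with the log timestamps just before the crash.

```python
import re
import signal
from datetime import datetime, timedelta, timezone

# The crash line exactly as it appears in the posted observer.log excerpt.
CRASH_LINE = (
    "CRASH ERROR!!! IP=ffffffffffffffff, RBP=ffffffffffffffff, sig=11, "
    "sig_code=1, sig_addr=58585858598930, RLIMIT_CORE=unlimited, "
    "timestamp=1713774934282065, tid=10250, tname=sql_nio3, "
    "trace_id=0-0-0-0, extra_info=((null)), lbt=, SQL="
)


def parse_crash_line(line):
    """Split the comma-separated key=value pairs after 'CRASH ERROR!!!'."""
    body = line.split("CRASH ERROR!!!", 1)[1]
    return dict(re.findall(r"(\w+)=([^,]*)", body))


def summarize(fields):
    """Decode the signal name, crash time (UTC) and the faulting address bytes."""
    # Assumption: timestamp is microseconds since the Unix epoch.
    secs, micros = divmod(int(fields["timestamp"]), 1_000_000)
    when = datetime.fromtimestamp(secs, tz=timezone.utc) + timedelta(microseconds=micros)
    # Pad the hex address to 8 bytes and show printable ASCII, '.' otherwise.
    addr = bytes.fromhex(fields["sig_addr"].zfill(16))
    return {
        "signal": signal.Signals(int(fields["sig"])).name,
        "thread": f'{fields["tname"]} (tid {fields["tid"]})',
        "time_utc": when.isoformat(),
        "addr_ascii": "".join(chr(b) if 32 <= b < 127 else "." for b in addr),
    }


for key, value in summarize(parse_crash_line(CRASH_LINE)).items():
    print(f"{key}: {value}")
```

On Linux, sig=11 is SIGSEGV, and sig_code=1 for that signal is SEGV_MAPERR (address not mapped). The printable bytes of sig_addr include a run of 0x58 ('X'), the same character that pads the user_name in the login attempt logged on the same thread (sql_nio3, tid 10250) just before the crash. That is only an observation worth passing to whoever analyzes the core dump, not a confirmed cause.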

Link: Baidu Netdisk, extraction code: v9y3

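For this kind of templated information gathering, I recommend using obdiag directly.

For example: obdiag gather scene run --scene=observer.unknown

Note: obdiag gather scene list shows a description of each scene that can be collected.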

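I see this issue has been open for a few days. You could follow the three self-service analysis steps to cut down on back-and-forth and locate the problem sooner: [SOP Series 22] Troubleshooting, Step One (Self-Service Diagnosis and Diagnostic Information Collection)

The cluster has already recovered. If I install this now, can it still collect the relevant information? I do have the logs backed up. Or could I install it in my own test environment and use it to parse the backed-up logs?

Yes. The obdiag log analysis tool supports offline analysis of log files. Take the logs you backed up and run obdiag analyze log --files <directory or file>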
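
Separately from obdiag, when a backed-up observer.log is too large to attach, a small script can cut out just the lines around each crash. The following is a minimal Python sketch of that idea, not a replacement for obdiag analyze log. The function names, the default of 20 context lines, and the `observer.log*` file pattern are all assumptions made for illustration.

```python
from collections import deque
from pathlib import Path


def crash_context(log_path, keyword="CRASH ERROR", before=20):
    """Yield (line_number, lines) for every line containing keyword.

    'lines' holds up to `before` preceding lines plus the matching line,
    so the excerpt shows what the server logged right before the crash.
    """
    recent = deque(maxlen=before)
    with open(log_path, "r", encoding="utf-8", errors="replace") as fh:
        for lineno, line in enumerate(fh, start=1):
            line = line.rstrip("\n")
            if keyword in line:
                yield lineno, list(recent) + [line]
            recent.append(line)


def scan_directory(log_dir, pattern="observer.log*", **kwargs):
    """Run crash_context over every matching file in a backup directory."""
    for path in sorted(Path(log_dir).glob(pattern)):
        for lineno, lines in crash_context(path, **kwargs):
            yield path.name, lineno, lines
```

A possible use: loop over `scan_directory("/path/to/backup")`, write each excerpt to a small text file, and attach that file instead of the full log. The path is a placeholder for wherever the logs were backed up.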