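ob-ce-4.0.0 stability is worrying: a newly installed cluster turned abnormal for no apparent reason, and nothing in the logs points to the cause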

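【 Environment 】Test environment
【 OB or other component 】
【 Version 】ob-ce-4.0.0
【 Problem description 】Describe the problem clearly
【 Steps to reproduce 】Operations performed before and after the problem appeared
【 Symptoms and impact 】

【 Attachments 】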

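This is a newly deployed 3-node environment, and it had already become abnormal before this.
After running obd cluster restart obtest and logging in again with obclient, the session looks like this: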

$ obclient -hmgr4 -P3306 -uroot@sys -A
\s
--------------
obclient  Ver  Distrib 10.4.18-MariaDB, for Linux (x86_64) using readline 5.1

Connection id:          3222011915
Current database:       test
Current user:           root@172.16.16.7
SSL:                    Not in use
Current pager:          stdout
Using outfile:          ''
Using delimiter:        ;
Server version:         OceanBase_CE 4.0.0.0 (r100000272022110114-6af7f9ae79cd0ecbafd4b1b88e2886ccdba0c3be) (Built Nov  1 2022 14:57:18)
Protocol version:       10
Connection:             mgr4 via TCP/IP
Server characterset:    utf8mb4
Db     characterset:    utf8mb4
Client characterset:    utf8mb4
Conn.  characterset:    utf8mb4
TCP port:               3306
Protocol:               Compressed
Active                  --------------

# Abnormal right after connecting: the database list is not displayed
obclient [test]> show databases;
Query OK, 0 rows affected (0.000 sec)

obclient [test]> \s
ERROR 2013 (HY000): Lost connection to MySQL server during query
obclient [test]> \r
Connection id:    3222011925
Current database: test

# Back to normal after reconnecting
obclient [test]> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| LBACSYS            |
| mysql              |
| oceanbase          |
| ORAAUDITOR         |
| SYS                |
| test               |
+--------------------+
7 rows in set (0.018 sec)

obclient [test]> select count(*) from sbtest1;
+----------+
| count(*) |
+----------+
|    10000 |
+----------+
1 row in set (0.025 sec)

obclient [test]> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| LBACSYS            |
| mysql              |
| oceanbase          |
| ORAAUDITOR         |
| SYS                |
| test               |
+--------------------+
7 rows in set (0.018 sec)

# After running \s, it becomes abnormal again
obclient [test]> \s
--------------
obclient  Ver  Distrib 10.4.18-MariaDB, for Linux (x86_64) using readline 5.1

Connection id:          3222011925
Current database:       test
Current user:           root@172.16.16.7
SSL:                    Not in use
Current pager:          stdout
Using outfile:          ''
Using delimiter:        ;
Server version:         OceanBase_CE 4.0.0.0 (r100000272022110114-6af7f9ae79cd0ecbafd4b1b88e2886ccdba0c3be) (Built Nov  1 2022 14:57:18)
Protocol version:       10
Connection:             mgr4 via TCP/IP
Server characterset:    utf8mb4
Db     characterset:    utf8mb4
Client characterset:    utf8mb4
Conn.  characterset:    utf8mb4
TCP port:               3306
Protocol:               Compressed
Active                  --------------

obclient [test]> show databases;
Query OK, 0 rows affected (0.000 sec)

obclient [test]>

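I checked observer.log on one of the nodes and searched for the keyword fail, which turned up the entries below, but I cannot pick out anything significant from them and do not know how to continue troubleshooting: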

[2022-11-05 19:54:09.187880] WARN  [SERVER] runTimerTask (ob_server.cpp:2628) [15019][ServerGTimer][T0][Y0-0000000000000000-0-0] [lt=6] ObRefreshNetworkSpeedTask reload bandwidth throttle limit failed(ret=-4000, ret="OB_ERROR")
[2022-11-05 19:54:09.940307] INFO  [STORAGE.TRANS] rollback (ob_memtable_context.cpp:918) [15117][DiskUseReport][T1][YCEBAC10100A-0005ECB7C94FC2D2-0-0] [lt=9] memtable handle rollback to successfuly(from_seq_no=1667649249935576, to_seq_no=1667649249935574, *this={ObIMvccCtx={alloc_type=0 ctx_descriptor=0 min_table_version=9223372036854775807 max_table_version=0 trans_version=9223372036854775807 commit_version=0 row_purge_version=0 lock_wait_start_ts=0 replay_compact_version=0} end_code=0 is_readonly=false ref=0 trans_id={txid:2001865} ls_id=1 callback_alloc_count=0 callback_free_count=0 checksum=0 tmp_checksum=0 checksum_log_ts=0 redo_filled_count=0 redo_sync_succ_count=0 redo_sync_fail_count=0 main_list_length=0 unsynced_cnt=0 unsubmitted_cnt_=0 cb_statistics:[main=1, slave=0, merge=0, tx_end=0, rollback_to=1, fast_commit=0, remove_memtable=0]})
[2022-11-05 19:54:09.943062] INFO  [STORAGE.TRANS] rollback (ob_memtable_context.cpp:918) [15117][DiskUseReport][T1][YCEBAC10100A-0005ECB7C94FC2D3-0-0] [lt=12] memtable handle rollback to successfuly(from_seq_no=1667649249935581, to_seq_no=1667649249935579, *this={ObIMvccCtx={alloc_type=0 ctx_descriptor=0 min_table_version=9223372036854775807 max_table_version=0 trans_version=9223372036854775807 commit_version=0 row_purge_version=0 lock_wait_start_ts=0 replay_compact_version=0} end_code=0 is_readonly=false ref=0 trans_id={txid:2001866} ls_id=1 callback_alloc_count=0 callback_free_count=0 checksum=0 tmp_checksum=0 checksum_log_ts=0 redo_filled_count=0 redo_sync_succ_count=0 redo_sync_fail_count=0 main_list_length=0 unsynced_cnt=0 unsubmitted_cnt_=0 cb_statistics:[main=1, slave=0, merge=0, tx_end=0, rollback_to=1, fast_commit=0, remove_memtable=0]})
[2022-11-05 19:54:09.950303] INFO  [STORAGE.TRANS] rollback (ob_memtable_context.cpp:918) [15117][DiskUseReport][T1][YCEBAC10100A-0005ECB7C94FC2D5-0-0] [lt=11] memtable handle rollback to successfuly(from_seq_no=1667649249935590, to_seq_no=1667649249935588, *this={ObIMvccCtx={alloc_type=0 ctx_descriptor=0 min_table_version=9223372036854775807 max_table_version=0 trans_version=9223372036854775807 commit_version=0 row_purge_version=0 lock_wait_start_ts=0 replay_compact_version=0} end_code=0 is_readonly=false ref=0 trans_id={txid:2001868} ls_id=1 callback_alloc_count=0 callback_free_count=0 checksum=0 tmp_checksum=0 checksum_log_ts=0 redo_filled_count=0 redo_sync_succ_count=0 redo_sync_fail_count=0 main_list_length=0 unsynced_cnt=0 unsubmitted_cnt_=0 cb_statistics:[main=1, slave=0, merge=0, tx_end=0, rollback_to=1, fast_commit=0, remove_memtable=0]})
[2022-11-05 19:54:09.955563] INFO  [STORAGE.TRANS] rollback (ob_memtable_context.cpp:918) [15117][DiskUseReport][T1][YCEBAC10100A-0005ECB7C94FC2D7-0-0] [lt=11] memtable handle rollback to successfuly(from_seq_no=1667649249935598, to_seq_no=1667649249935596, *this={ObIMvccCtx={alloc_type=0 ctx_descriptor=0 min_table_version=9223372036854775807 max_table_version=0 trans_version=9223372036854775807 commit_version=0 row_purge_version=0 lock_wait_start_ts=0 replay_compact_version=0} end_code=0 is_readonly=false ref=0 trans_id={txid:2001870} ls_id=1 callback_alloc_count=0 callback_free_count=0 checksum=0 tmp_checksum=0 checksum_log_ts=0 redo_filled_count=0 redo_sync_succ_count=0 redo_sync_fail_count=0 main_list_length=0 unsynced_cnt=0 unsubmitted_cnt_=0 cb_statistics:[main=1, slave=0, merge=0, tx_end=0, rollback_to=1, fast_commit=0, remove_memtable=0]})
[2022-11-05 19:54:10.012669] INFO  [SHARE.LOCATION] renew_vtable_location_ (ob_vtable_location_service.cpp:207) [15114][][T0][YCEBAC10100A-0005ECB7C9CFC1CF-0-0] [lt=18] renew vtable location success(ret=0, ret="OB_SUCCESS", table_id=12208, locations=[{table_id:12208, partition_id:12208, partition_cnt:0, replica_locations:[{server:"172.16.16.10:3307", role:1, sql_port:3306, replica_type:0, reserved:0, property:{memstore_percent_:100}}], renew_time:1667649250012655, sql_renew_time:1667649250012655, is_mark_fail:false}])
[2022-11-05 19:54:10.012695] INFO  [SHARE.LOCATION] batch_process_tasks (ob_vtable_location_service.cpp:414) [15114][][T0][YCEBAC10100A-0005ECB7C9CFC1CF-0-0] [lt=26] success to process renew task(task={tenant_id:1, table_id:12208, add_timestamp:1667649250012631}, locations=[{table_id:12208, partition_id:12208, partition_cnt:0, replica_locations:[{server:"172.16.16.10:3307", role:1, sql_port:3306, replica_type:0, reserved:0, property:{memstore_percent_:100}}], renew_time:1667649250012655, sql_renew_time:1667649250012655, is_mark_fail:false}])
[2022-11-05 19:54:10.188006] WARN  load_file_to_string (utility.h:638) [15019][ServerGTimer][T0][Y0-0000000000000000-0-0] [lt=4] read /sys/class/net/eth0/speed failed, errno 22
[2022-11-05 19:54:10.188021] WARN  get_ethernet_speed (utility.cpp:624) [15019][ServerGTimer][T0][Y0-0000000000000000-0-0] [lt=12] load file /sys/class/net/eth0/speed failed, rc -4000
[2022-11-05 19:54:10.188045] WARN  [SERVER] runTimerTask (ob_server.cpp:2628) [15019][ServerGTimer][T0][Y0-0000000000000000-0-0] [lt=17] ObRefreshNetworkSpeedTask reload bandwidth throttle limit failed(ret=-4000, ret="OB_ERROR")
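
Searching for the bare keyword fail also matches INFO lines that merely contain field names such as redo_sync_fail_count or is_mark_fail, which buries the real warnings. A minimal Python sketch that instead keeps only WARN/ERROR lines and counts them per source location may make the excerpt easier to read. The regular expression is an assumption based only on the line layout visible above, not on any documented log format, and the names summarize and print_summary are made up for illustration.

```python
import re
from collections import Counter

# Assumed layout, inferred from the excerpt above:
#   [timestamp] LEVEL [MODULE] function (file:line) [tid][thread]... message
# The [MODULE] part is optional (the utility.h / utility.cpp lines lack it).
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"(?:\[(?P<module>[^\]]+)\]\s+)?"
    r"(?P<func>\S+)\s+"
    r"\((?P<src>[^)]+)\)"
)


def summarize(lines, levels=("WARN", "ERROR")):
    """Count log lines at the given levels, keyed by (level, module, func, src)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m is None or m.group("level") not in levels:
            continue  # skip unparsable lines and lower-severity levels
        key = (
            m.group("level"),
            m.group("module") or "-",  # "-" marks lines without a [MODULE] tag
            m.group("func"),
            m.group("src"),
        )
        counts[key] += 1
    return counts


def print_summary(path):
    """Print the most frequent WARN/ERROR sources in a log file."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        counts = summarize(fh)
    for (level, module, func, src), n in counts.most_common():
        print(f"{n:6d}  {level:5s}  {module:16s}  {func}  ({src})")
```

Calling print_summary with the path to observer.log (the path depends on the deployment and is not shown in this post) lists each warning source once with its count. In the excerpt above, the only WARN entries come from the ServerGTimer thread failing to read /sys/class/net/eth0/speed (errno 22) and the resulting ObRefreshNetworkSpeedTask failure; whether those are related to the lost connection is not established here.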

This is probably related to \s. We will contact the relevant colleagues on a working day to continue troubleshooting.