OB information
- Production environment, version 4.2.2.1, package: oceanbase-ce-4.2.2.1-101000012024030709.el7.x86_64.rpm.
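- The primary cluster topology is 1-1-1, and the standby cluster is a single-node cluster (strictly speaking, 4.2 no longer has the concept of a standby cluster; the term is used here for convenience). The primary cluster servers' CPU, memory, and NVMe SSDs meet OB production requirements. The standby cluster has the same CPU and memory, but its disks are SATA SSDs, whose I/O performance is somewhat lower than the primary cluster's, though still much better than SAS disks.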
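- Some tenants on the primary cluster have standby tenants on the standby cluster. All of them are normal except one standby tenant whose lag has grown.
- The problem tenant holds less than 100 GB of data. The primary tenant receives only writes, and the write volume is very low.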
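Symptoms
- Alert
Standby tenant synchronization lag alert.
- Primary tenant performance information.
Resource specification of the primary tenant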
+-----------------------+-------------------------------+---------+---------+-------------+------------------+---------------------+---------------------+---------+-------+------------------+-----------+-------------+
| resource_pool_name | unit_config_name | max_cpu | min_cpu | mem_size_gb | log_disk_size_gb | max_iops | min_iops | unit_id | zone | observer | tenant_id | tenant_name |
+-----------------------+-------------------------------+---------+---------+-------------+------------------+---------------------+---------------------+---------+-------+------------------+-----------+-------------+
| pool_ten005_zone1_gtu | config_ten005_zone1_U4C4G_gtu | 4 | 4 | 4.00 | 12.00 | 9223372036854775807 | 9223372036854775807 | 1022 | zone1 | 10.0.0.36:2882 | 1016 | ten005 |
| pool_ten005_zone3_xld | config_ten005_zone3_U4C4G_xld | 4 | 4 | 4.00 | 12.00 | 9223372036854775807 | 9223372036854775807 | 1023 | zone3 | 10.0.0.38:2882 | 1016 | ten005 |
| pool_ten005_zone2_tqe | config_ten005_zone2_U4C4G_tqe | 4 | 4 | 4.00 | 12.00 | 9223372036854775807 | 9223372036854775807 | 1024 | zone2 | 10.0.0.37:2882 | 1016 | ten005 |
+-----------------------+-------------------------------+---------+---------+-------------+------------------+---------------------+---------------------+---------+-------+------------------+-----------+-------------+
3 rows in set (0.00 sec)
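- Standby tenant performance information.
The standby tenant's resource specification was adjusted in the afternoon: memory was reduced from 8 GB to 2 GB and later raised to 4 GB.
Resource specification of the standby tenant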
+-----------------------+-------------------------------+---------+---------+-------------+------------------+---------------------+---------------------+---------+-------+------------------+-----------+-------------+
| resource_pool_name | unit_config_name | max_cpu | min_cpu | mem_size_gb | log_disk_size_gb | max_iops | min_iops | unit_id | zone | observer | tenant_id | tenant_name |
+-----------------------+-------------------------------+---------+---------+-------------+------------------+---------------------+---------------------+---------+-------+------------------+-----------+-------------+
| pool_ten005_zone1_gfj | config_ten005_zone1_U4C4G_hkk | 4 | 4 | 4.00 | 12.00 | 9223372036854775807 | 9223372036854775807 | 1003 | zone1 | 10.0.0.41:2882 | 1006 | ten005 |
+-----------------------+-------------------------------+---------+---------+-------------+------------------+---------------------+---------------------+---------+-------+------------------+-----------+-------------+
1 row in set (0.02 sec)
- Primary tenant information.
MySQL [oceanbase]> select tenant_id, tenant_name, tenant_type, primary_zone,tenant_role, scn_to_timestamp(sync_scn) sync_ts, scn_to_timestamp(replayable_scn) replayable_ts, scn_to_timestamp(readable_scn) readable_ts, scn_to_timestamp(recovery_until_scn) recovery_until_ts, log_mode,max_ls_id from oceanbase.dba_ob_tenants where tenant_type='USER' and tenant_id in (1016);
+-----------+-------------+-------------+-------------------+-------------+----------------------------+----------------------------+----------------------------+----------------------------+------------+-----------+
| tenant_id | tenant_name | tenant_type | primary_zone | tenant_role | sync_ts | replayable_ts | readable_ts | recovery_until_ts | log_mode | max_ls_id |
+-----------+-------------+-------------+-------------------+-------------+----------------------------+----------------------------+----------------------------+----------------------------+------------+-----------+
| 1016 | ten005 | USER | zone2;zone1;zone3 | PRIMARY | 2024-05-22 18:31:58.495893 | 2024-05-22 18:31:58.495893 | 2024-05-22 18:31:58.495893 | 2116-02-21 07:53:38.427387 | ARCHIVELOG | 1003 |
+-----------+-------------+-------------+-------------------+-------------+----------------------------+----------------------------+----------------------------+----------------------------+------------+-----------+
1 row in set (0.01 sec)
MySQL [oceanbase]> select tenant_id,ls_id,svr_ip,role,access_mode,in_sync, scn_to_timestamp(begin_scn) begin_timestamp,scn_to_timestamp(end_scn) end_timestamp,scn_to_timestamp(max_scn) max_timestamp from oceanbase.gv$ob_log_stat where role='LEADER' and tenant_id in (1016);
+-----------+-------+-------------+--------+-------------+---------+----------------------------+----------------------------+----------------------------+
| tenant_id | ls_id | svr_ip | role | access_mode | in_sync | begin_timestamp | end_timestamp | max_timestamp |
+-----------+-------+-------------+--------+-------------+---------+----------------------------+----------------------------+----------------------------+
| 1016 | 1 | 10.0.0.37 | LEADER | APPEND | YES | 2024-05-15 19:05:47.830719 | 2024-05-22 18:27:58.041650 | 2024-05-22 18:27:58.041650 |
| 1016 | 1001 | 10.0.0.37 | LEADER | APPEND | YES | 2024-05-15 18:05:00.916525 | 2024-05-22 18:27:58.041650 | 2024-05-22 18:27:58.041650 |
+-----------+-------+-------------+--------+-------------+---------+----------------------------+----------------------------+----------------------------+
2 rows in set (0.00 sec)
- Standby tenant information.
MySQL [oceanbase]> select tenant_id, tenant_name, tenant_type, primary_zone,tenant_role, scn_to_timestamp(sync_scn) sync_ts, scn_to_timestamp(replayable_scn) replayable_ts, scn_to_timestamp(readable_scn) readable_ts, scn_to_timestamp(recovery_until_scn) recovery_until_ts, log_mode,max_ls_id from oceanbase.dba_ob_tenants where tenant_type='USER' and tenant_id in (1006);
+-----------+-------------+-------------+--------------+-------------+----------------------------+----------------------------+----------------------------+----------------------------+--------------+-----------+
| tenant_id | tenant_name | tenant_type | primary_zone | tenant_role | sync_ts | replayable_ts | readable_ts | recovery_until_ts | log_mode | max_ls_id |
+-----------+-------------+-------------+--------------+-------------+----------------------------+----------------------------+----------------------------+----------------------------+--------------+-----------+
| 1006 | ten005 | USER | RANDOM | STANDBY | 2024-05-22 12:48:11.425191 | 2024-05-22 12:48:11.425191 | 2024-05-22 12:48:11.425191 | 2116-02-21 07:53:38.427387 | NOARCHIVELOG | 1001 |
+-----------+-------------+-------------+--------------+-------------+----------------------------+----------------------------+----------------------------+----------------------------+--------------+-----------+
1 row in set (0.02 sec)
MySQL [oceanbase]> select tenant_id,ls_id,svr_ip,role,access_mode,in_sync, scn_to_timestamp(begin_scn) begin_timestamp,scn_to_timestamp(end_scn) end_timestamp,scn_to_timestamp(max_scn) max_timestamp from oceanbase.gv$ob_log_stat where role='LEADER' and tenant_id in (1006);
+-----------+-------+-------------+--------+-------------+---------+----------------------------+----------------------------+----------------------------+
| tenant_id | ls_id | svr_ip | role | access_mode | in_sync | begin_timestamp | end_timestamp | max_timestamp |
+-----------+-------+-------------+--------+-------------+---------+----------------------------+----------------------------+----------------------------+
| 1006 | 1 | 10.0.0.41 | LEADER | RAW_WRITE | YES | 2024-05-21 20:45:47.970381 | 2024-05-22 18:33:34.639532 | 2024-05-22 18:33:34.639532 |
| 1006 | 1001 | 10.0.0.41 | LEADER | RAW_WRITE | YES | 2024-05-22 03:33:03.317788 | 2024-05-22 12:48:11.425191 | 2024-05-22 12:48:11.425191 |
+-----------+-------+-------------+--------+-------------+---------+----------------------------+----------------------------+----------------------------+
2 rows in set (0.01 sec)
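The standby tenant's business data synchronization appears to be stuck at 2024-05-22 12:48:11.425191, which is the end timestamp of log stream 1001; log stream 1 is still advancing. The cause is unknown.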
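To put a number on the lag, the timestamps from the query results above can be compared directly. The following is a minimal sketch, not part of the original investigation: the variable and function names are illustrative, the values are copied by hand from the outputs, and the primary and standby queries were run at slightly different moments, so the result is only approximate.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Values copied by hand from the query results above.
primary_sync_ts = "2024-05-22 18:31:58.495893"   # primary tenant, sync_ts
standby_ls_end_ts = {                            # standby tenant, end_timestamp per LS
    1: "2024-05-22 18:33:34.639532",
    1001: "2024-05-22 12:48:11.425191",
}


def parse(ts: str) -> datetime:
    """Parse a timestamp as printed by scn_to_timestamp()."""
    return datetime.strptime(ts, FMT)


def slowest_log_stream(end_ts_by_ls):
    """Return (ls_id, end timestamp) of the log stream with the oldest end timestamp."""
    ls_id = min(end_ts_by_ls, key=lambda k: parse(end_ts_by_ls[k]))
    return ls_id, parse(end_ts_by_ls[ls_id])


ls_id, oldest = slowest_log_stream(standby_ls_end_ts)
lag = parse(primary_sync_ts) - oldest
print(f"slowest log stream: {ls_id}, lag behind primary sync_ts: {lag}")
# prints: slowest log stream: 1001, lag behind primary sync_ts: 5:43:47.070702
```

The slowest log stream's end timestamp matches the standby tenant's sync_ts, which suggests that log stream 1001 alone is holding the tenant back by roughly 5 hours 44 minutes.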
Check the standby tenant's log restore source.
MySQL [oceanbase]> select tenant_id, id, type, substr(value,1,30) value_, scn_to_timestamp(recovery_until_scn) from oceanbase.cdb_ob_log_restore_source where tenant_id=1006;
+-----------+----+---------+--------------------------------+--------------------------------------+
| tenant_id | id | type | value_ | scn_to_timestamp(recovery_until_scn) |
+-----------+----+---------+--------------------------------+--------------------------------------+
| 1006 | 1 | SERVICE | IP_LIST=10.0.0.36:2881;10.10 | 2116-02-21 07:53:38.427387 |
+-----------+----+---------+--------------------------------+--------------------------------------+
1 row in set (0.01 sec)
Network-based log replication is in use. Part of the IP list string is truncated in the output above.
Attempted fixes
Pause and then resume log synchronization for the standby tenant on the standby cluster.
MySQL [oceanbase]> SELECT TENANT_NAME, TENANT_ID, TENANT_ROLE, SCN_TO_TIMESTAMP(SYNC_SCN)
-> FROM oceanbase.DBA_OB_TENANTS WHERE TENANT_NAME ='ten005';
+-------------+-----------+-------------+----------------------------+
| TENANT_NAME | TENANT_ID | TENANT_ROLE | SCN_TO_TIMESTAMP(SYNC_SCN) |
+-------------+-----------+-------------+----------------------------+
| ten005 | 1006 | STANDBY | 2024-05-22 12:48:11.425191 |
+-------------+-----------+-------------+----------------------------+
1 row in set (0.04 sec)
MySQL [oceanbase]> SELECT LS_ID, SCN_TO_TIMESTAMP(END_SCN) FROM oceanbase.GV$OB_LOG_STAT WHERE TENANT_ID =1006 and role='LEADER';
+-------+----------------------------+
| LS_ID | SCN_TO_TIMESTAMP(END_SCN) |
+-------+----------------------------+
| 1 | 2024-05-22 17:35:32.897059 |
| 1001 | 2024-05-22 12:48:11.425191 |
+-------+----------------------------+
2 rows in set (0.01 sec)
MySQL [oceanbase]> alter system recover standby tenant=ten005 cancel;
Query OK, 0 rows affected (0.16 sec)
MySQL [oceanbase]> SELECT TENANT_NAME, TENANT_ID, TENANT_ROLE, SCN_TO_TIMESTAMP(SYNC_SCN) FROM oceanbase.DBA_OB_TENANTS WHERE TENANT_NAME ='ten005';
+-------------+-----------+-------------+----------------------------+
| TENANT_NAME | TENANT_ID | TENANT_ROLE | SCN_TO_TIMESTAMP(SYNC_SCN) |
+-------------+-----------+-------------+----------------------------+
| ten005 | 1006 | STANDBY | 2024-05-22 12:48:11.425191 |
+-------------+-----------+-------------+----------------------------+
1 row in set (0.04 sec)
MySQL [oceanbase]> SELECT LS_ID, SCN_TO_TIMESTAMP(END_SCN) FROM oceanbase.GV$OB_LOG_STAT WHERE TENANT_ID =1006 and role='LEADER';
+-------+----------------------------+
| LS_ID | SCN_TO_TIMESTAMP(END_SCN) |
+-------+----------------------------+
| 1 | 2024-05-22 17:39:33.258304 |
| 1001 | 2024-05-22 12:48:11.425191 |
+-------+----------------------------+
2 rows in set (0.00 sec)
MySQL [oceanbase]> alter system recover standby tenant=ten005 until unlimited;
Query OK, 0 rows affected (0.05 sec)
MySQL [oceanbase]> SELECT LS_ID, SCN_TO_TIMESTAMP(END_SCN) FROM oceanbase.GV$OB_LOG_STAT WHERE TENANT_ID =1006 and role='LEADER';
+-------+----------------------------+
| LS_ID | SCN_TO_TIMESTAMP(END_SCN) |
+-------+----------------------------+
| 1 | 2024-05-22 17:39:33.258304 |
| 1001 | 2024-05-22 12:48:11.425191 |
+-------+----------------------------+
2 rows in set (0.00 sec)
The time shown in the end_scn column still does not move.
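To rule out the possibility that the source simply had no business data changes, a new table was created in the primary tenant's test database and one row was inserted. Checking the standby tenant's replication time again showed that it was still unchanged.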
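The before/after comparison done by eye above can also be expressed as a small check over repeated samples. This is a hedged sketch under stated assumptions: the snapshots are the three GV$OB_LOG_STAT outputs from this section copied by hand, and `never_advanced` is a hypothetical helper name, not an OceanBase tool.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# end_scn timestamps per log stream, copied from the three outputs above:
# before the cancel, after the cancel, and after "until unlimited".
snapshots = [
    {1: "2024-05-22 17:35:32.897059", 1001: "2024-05-22 12:48:11.425191"},
    {1: "2024-05-22 17:39:33.258304", 1001: "2024-05-22 12:48:11.425191"},
    {1: "2024-05-22 17:39:33.258304", 1001: "2024-05-22 12:48:11.425191"},
]


def never_advanced(snaps):
    """Return sorted log stream IDs whose end timestamp is identical in every snapshot."""
    first = snaps[0]
    return sorted(
        ls_id
        for ls_id in first
        if all(
            datetime.strptime(s[ls_id], FMT) == datetime.strptime(first[ls_id], FMT)
            for s in snaps
        )
    )


stuck = never_advanced(snapshots)
print(stuck)  # prints: [1001]
```

Note that if only the last two snapshots are compared, log stream 1 also looks unchanged, so samples should be spaced far enough apart before concluding that a log stream is stuck.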