使用场景说明:
1、使用obd部署的ob集群且没有接管到ocp。
2、无空闲的机器可以用来替换故障节点,希望通过下线故障节点处理后重新加入到ob集群。
故障前节点分布
MySQL [oceanbase]> select * from dba_ob_servers;
+----------------+----------+----+-------+----------+-----------------+--------+----------------------------+-----------+-----------------------+----------------------------+----------------------------+--------------------------------------------------------------------------+-------------------+
| SVR_IP | SVR_PORT | ID | ZONE | SQL_PORT | WITH_ROOTSERVER | STATUS | START_SERVICE_TIME | STOP_TIME | BLOCK_MIGRATE_IN_TIME | CREATE_TIME | MODIFY_TIME | BUILD_VERSION | LAST_OFFLINE_TIME |
+----------------+----------+----+-------+----------+-----------------+--------+----------------------------+-----------+-----------------------+----------------------------+----------------------------+--------------------------------------------------------------------------+-------------------+
| 11.xxx.xxx.191 | 12882 | 1 | zone1 | 12881 | YES | ACTIVE | 2024-07-31 08:14:53.119695 | NULL | NULL | 2024-07-24 10:50:42.478616 | 2024-07-31 08:14:55.115175 | 4.3.1.0_1-bad90e897a7f0f56b8ce5c43e186aa8f4bab03c4(Jun 28 2024 23:14:03) | NULL |
| 11.xxx.xxx.191 | 22882 | 4 | zone1 | 22881 | NO | ACTIVE | 2024-07-31 08:14:07.606076 | NULL | NULL | 2024-07-24 10:51:01.744417 | 2024-07-31 08:14:10.278981 | 4.3.1.0_1-bad90e897a7f0f56b8ce5c43e186aa8f4bab03c4(Jun 28 2024 23:14:03) | NULL |
| 11.xxx.xxx.191 | 32882 | 2 | zone2 | 32881 | NO | ACTIVE | 2024-08-05 10:06:37.915423 | NULL | NULL | 2024-07-24 10:50:42.523764 | 2024-08-05 10:06:39.509455 | 4.3.1.0_1-bad90e897a7f0f56b8ce5c43e186aa8f4bab03c4(Jun 28 2024 23:14:03) | NULL |
| 11.xxx.xxx.192 | 33332 | 7 | zone3 | 33331 | NO | ACTIVE | 2024-08-05 17:23:09.258153 | NULL | NULL | 2024-08-05 17:22:59.938657 | 2024-08-05 17:23:11.080485 | 4.3.1.0_1-bad90e897a7f0f56b8ce5c43e186aa8f4bab03c4(Jun 28 2024 23:14:03) | NULL |
| 11.xxx.xxx.192 | 42882 | 5 | zone2 | 42881 | NO | ACTIVE | 2024-07-31 08:14:03.193183 | NULL | NULL | 2024-07-24 10:51:01.792832 | 2024-07-31 08:14:06.075483 | 4.3.1.0_1-bad90e897a7f0f56b8ce5c43e186aa8f4bab03c4(Jun 28 2024 23:14:03) | NULL |
| 11.xxx.xxx.192 | 52882 | 3 | zone3 | 52881 | NO | ACTIVE | 2024-07-31 08:14:57.594518 | NULL | NULL | 2024-07-24 10:50:42.562099 | 2024-07-31 08:14:59.182698 | 4.3.1.0_1-bad90e897a7f0f56b8ce5c43e186aa8f4bab03c4(Jun 28 2024 23:14:03) | NULL |
+----------------+----------+----+-------+----------+-----------------+--------+----------------------------+-----------+-----------------------+----------------------------+----------------------------+--------------------------------------------------------------------------+-------------------+
6 rows in set (0.00 sec)
模拟故障
本次模拟的故障节点 11.xxx.xxx.192:32882
[heshun.lxd@sqaobnoxdn011162217192.sa128 /home/heshun.lxd]
$ps -ef | grep observer7
heshun.+ 86129 1 80 Aug05 ? 1-09:22:50 /home/heshun.lxd/observer7/bin/observer -r 11.162.217.191:12882:12881;11.162.217.191:32882:32881;11.162.217.192:52882:52881 -p 33331 -P 33332 -z zone3 -n obcluster -c 1 -d /obdata/data/data7 -I 11.162.217.192 -o __min_full_resource_pool_memory=2147483648,memory_limit=12G,system_memory=2G,datafile_size=10G,log_disk_size=36G,cpu_count=8,enable_syslog_wf=False,enable_syslog_recycle=True,max_syslog_file_count=30
heshun.+ 87058 1 0 Aug05 ? 00:00:16 /home/heshun.lxd/observer7/bin/obshell daemon --ip 11.162.217.192 --port 33333
heshun.+ 87097 87058 0 Aug05 ? 00:01:34 /home/heshun.lxd/observer7/bin/obshell server --ip 11.162.217.192 --port 33333
heshun.+ 93073 92972 0 10:46 pts/0 00:00:00 grep --color=auto observer7
[heshun.lxd@sqaobnoxdn011162217192.sa128 /home/heshun.lxd]
$
[heshun.lxd@sqaobnoxdn011162217192.sa128 /home/heshun.lxd]
$kill -9 86129 87058 87097
[heshun.lxd@sqaobnoxdn011162217192.sa128 /home/heshun.lxd]
$ps -ef | grep observer7
heshun.+ 93636 92972 0 10:47 pts/0 00:00:00 grep --color=auto observer7
MySQL [oceanbase]> select * from dba_ob_servers
where svr_ip='11.xxx.xxx.192' and svr_port=33332 \G
*************************** 1. row ***************************
SVR_IP: 11.xxx.xxx.192
SVR_PORT: 33332
ID: 7
ZONE: zone3
SQL_PORT: 33331
WITH_ROOTSERVER: NO
STATUS: INACTIVE
START_SERVICE_TIME: NULL
STOP_TIME: NULL
BLOCK_MIGRATE_IN_TIME: NULL
CREATE_TIME: 2024-08-05 17:22:59.938657
MODIFY_TIME: 2024-08-07 12:10:19.770275
BUILD_VERSION: 4.3.1.0_1-bad90e897a7f0f56b8ce5c43e186aa8f4bab03c4(Jun 28 2024 23:14:03)
LAST_OFFLINE_TIME: 2024-08-07 10:47:46.794891
1 row in set (0.00 sec)
调小永久下线时间
alter system set server_permanent_offline_time = '60s';
MySQL [oceanbase]> select * from DBA_OB_ROOTSERVICE_EVENT_HISTORY
where event like "%permanent_offline%" order by 1 desc limit 1 \G
*************************** 1. row ***************************
TIMESTAMP: 2024-08-07 12:06:14.704733
MODULE: server
EVENT: permanent_offline
NAME1: server
VALUE1: "11.xxx.xxx.192:33332"
NAME2:
VALUE2:
NAME3:
VALUE3:
NAME4:
VALUE4:
NAME5:
VALUE5:
NAME6:
VALUE6:
EXTRA_INFO:
RS_SVR_IP: 11.xxx.xxx.191
RS_SVR_PORT: 12882
1 row in set (0.00 sec)
确认故障节点涉及的租户和zone信息
select a.tenant_name,b.zone
from dba_ob_tenants a,dba_ob_units b
where a.tenant_id=b.tenant_id
and (b.svr_ip='11.xxx.xxx.192' and b.svr_port=33332);
+-------------+-------+
| tenant_name | zone |
+-------------+-------+
| test1 | zone3 |
| test2 | zone3 |
+-------------+-------+
2 rows in set (0.14 sec)
确认选择哪种方式处理
ob从 4.0 版本开始,要求每个租户在各 zone 的 unit_num 必须保持一致,从下面的sql 可以看出来涉及的2个租户的unit_num等于单zone内的节点数。
MySQL [oceanbase]> select tenant_name,tenant_id,locality,unit_num
from dba_ob_tenants where tenant_name in ('test1','test2');
+-------------+-----------+---------------------------------------------+----------+
| tenant_name | tenant_id | locality | unit_num |
+-------------+-----------+---------------------------------------------+----------+
| test1 | 1002 | FULL{1}@zone1, FULL{1}@zone2, FULL{1}@zone3 | 2 |
| test2 | 1004 | FULL{1}@zone1, FULL{1}@zone2, FULL{1}@zone3 | 2 |
+-------------+-----------+---------------------------------------------+----------+
2 rows in set (0.08 sec)
这里有2种思路:
调小unit_num
将涉及的租户 unit_num大小减去故障节点数,本次是unit_num从2调整为1。
优先评估磁盘空间是否够用:从下面的sql查看结果来看各zone内的磁盘评估使用率超过50%了,unit_num从2调整为1,单个节点无法容纳当前的数据量
select b.zone,a.svr_ip,a.svr_port,
round(total_size/1024/1024/1024,2) total_size,
round(used_size/1024/1024/1024,2) used_size,
round(free_size/1024/1024/1024,2) free_size,
round(used_size/total_size,2)*100 used_pct
from __all_virtual_disk_stat a, dba_ob_servers b
where a.svr_ip=b.svr_ip
and a.svr_port=b.svr_port
order by zone desc;
+-------+----------------+----------+------------+-----------+-----------+----------+
| zone | svr_ip | svr_port | total_size | used_size | free_size | used_pct |
+-------+----------------+----------+------------+-----------+-----------+----------+
| zone3 | 11.xxx.xxx.192 | 52882 | 10.00 | 7.21 | 2.79 | 72.00 |
| zone2 | 11.xxx.xxx.191 | 32882 | 10.00 | 4.12 | 5.88 | 41.00 |
| zone2 | 11.xxx.xxx.192 | 42882 | 10.00 | 7.95 | 2.04 | 80.00 |
| zone1 | 11.xxx.xxx.191 | 12882 | 10.00 | 7.05 | 2.95 | 71.00 |
| zone1 | 11.xxx.xxx.191 | 22882 | 10.00 | 5.03 | 4.96 | 50.00 |
+-------+----------------+----------+------------+-----------+-----------+----------+
5 rows in set (0.11 sec)
如果在我们实际的环境中磁盘容量满足,在没有ocp express等有效监控评估对应的租户在各节点的cpu/memory/io 使用情况,避免因 unit_num 缩容后造成节点负载高等其他异常。
如果评估unit_num 缩容之后资源够用:
# 1、关闭涉及租户的 enable_transfer
alter system set enable_transfer='false' tenant='test1';
alter system set enable_transfer='false' tenant='test2';
# 2、查询故障节点涉及的租户及对应的 unit_group_id
select a.tenant_name,b.unit_group_id
from dba_ob_tenants a,dba_ob_units b
where a.tenant_id=b.tenant_id
and (svr_ip='11.xxx.xxx.192' and svr_port=33332);
+-------------+---------------+
| tenant_name | unit_group_id |
+-------------+---------------+
| test1 | 1007 |
| test2 | 1009 |
+-------------+---------------+
2 rows in set (0.15 sec)
# 3、缩小租户的unit_num 并删除故障节点对应的unit_group_id
alter resource tenant test1 unit_num=1 DELETE unit_group = ( 1007 );
alter resource tenant test2 unit_num=1 DELETE unit_group = ( 1009 );
等待机器修复好了之后再把unit_num改回来,这样只需要每个租户个数个日志流的两次迁移操作即可。
调小副本数
将涉及的租户的副本从3副本调整为2副本。
1、调整 locality
从每个租户的 locality 里剔除 zone3
MySQL [oceanbase]> alter tenant test1 locality='f@zone1,f@zone2';
Query OK, 0 rows affected (1.33 sec)
MySQL [oceanbase]> alter tenant test2 locality='f@zone1,f@zone2';
Query OK, 0 rows affected (1.17 sec)
# 对应的记录 JOB_STATUS 为success 说明locality变更完成
select * from DBA_OB_TENANT_JOBS
where job_type='alter_tenant_locality';
MySQL [oceanbase]> select * from DBA_OB_TENANT_JOBS where job_type='alter_tenant_locality' \G
*************************** 1. row ***************************
JOB_ID: 14
JOB_TYPE: ALTER_TENANT_LOCALITY
JOB_STATUS: SUCCESS
RESULT_CODE: 0
PROGRESS: 100
START_TIME: 2024-08-07 18:40:08.206965
MODIFY_TIME: 2024-08-07 18:47:12.775644
TENANT_ID: 1002
SQL_TEXT: alter tenant test1 locality='f@zone1,f@zone2'
EXTRA_INFO: FROM: 'FULL{1}@zone1, FULL{1}@zone2, FULL{1}@zone3', TO: 'FULL{1}@zone1, FULL{1}@zone2'
RS_SVR_IP: 11.xxx.xxx.191
RS_SVR_PORT: 12882
*************************** 2. row ***************************
JOB_ID: 15
JOB_TYPE: ALTER_TENANT_LOCALITY
JOB_STATUS: SUCCESS
RESULT_CODE: 0
PROGRESS: 100
START_TIME: 2024-08-07 18:40:22.269937
MODIFY_TIME: 2024-08-07 18:40:55.905170
TENANT_ID: 1004
SQL_TEXT: alter tenant test2 locality='f@zone1,f@zone2'
EXTRA_INFO: FROM: 'FULL{1}@zone1, FULL{1}@zone2, FULL{1}@zone3', TO: 'FULL{1}@zone1, FULL{1}@zone2'
RS_SVR_IP: 11.xxx.xxx.191
RS_SVR_PORT: 12882
2 rows in set (0.01 sec)
2、调整 resource_pool_list
从每个租户的 resource_pool_list 里剔除zone3
# 确认涉及租户的在各zone种的 resource pool name
select * from dba_ob_resource_pools where tenant_id in (1002,1004);
+------------------+------+-----------+----------------------------+----------------------------+------------+----------------+-------------------+--------------+
| RESOURCE_POOL_ID | NAME | TENANT_ID | CREATE_TIME | MODIFY_TIME | UNIT_COUNT | UNIT_CONFIG_ID | ZONE_LIST | REPLICA_TYPE |
+------------------+------+-----------+----------------------------+----------------------------+------------+----------------+-------------------+--------------+
| 1001 | p1 | 1002 | 2024-07-24 17:03:25.453796 | 2024-08-01 10:35:15.150609 | 2 | 1001 | zone1;zone2;zone3 | FULL |
| 1002 | p2_1 | 1004 | 2024-08-06 20:11:07.067122 | 2024-08-07 10:30:55.220775 | 2 | 1002 | zone1 | FULL |
| 1003 | p2_2 | 1004 | 2024-08-06 20:11:28.524490 | 2024-08-07 10:30:56.059745 | 2 | 1002 | zone2 | FULL |
| 1004 | p2_3 | 1004 | 2024-08-06 20:11:29.307956 | 2024-08-07 10:30:56.801430 | 2 | 1002 | zone3 | FULL |
+------------------+------+-----------+----------------------------+----------------------------+------------+----------------+-------------------+--------------+
4 rows in set (0.02 sec)
# 删除 test2 租户 在zone3上的resource_pool,记录删除的为 p2_3
MySQL [oceanbase]> alter tenant test2 resource_pool_list=('p2_1', 'p2_2');
Query OK, 0 rows affected (0.59 sec)
#租户 test1 的 resource pool 在所有的3个 zone 上存在复用的情况,需要先执行split 操作。
MySQL [oceanbase]> alter resource pool p1 SPLIT INTO ('p1_1','p1_2','p1_3') ON ('zone1','zone2','zone3');
Query OK, 0 rows affected (0.10 sec)
# 删除 test1 租户 在zone3上的resource_pool,记录删除的为 p1_3
alter tenant test1 resource_pool_list=('p1_1', 'p1_2');
#删除后确认
MySQL [oceanbase]> select * from dba_ob_resource_pools where tenant_id in (1002,1004);
+------------------+------+-----------+----------------------------+----------------------------+------------+----------------+-----------+--------------+
| RESOURCE_POOL_ID | NAME | TENANT_ID | CREATE_TIME | MODIFY_TIME | UNIT_COUNT | UNIT_CONFIG_ID | ZONE_LIST | REPLICA_TYPE |
+------------------+------+-----------+----------------------------+----------------------------+------------+----------------+-----------+--------------+
| 1002 | p2_1 | 1004 | 2024-08-06 20:11:07.067122 | 2024-08-07 10:30:55.220775 | 2 | 1002 | zone1 | FULL |
| 1003 | p2_2 | 1004 | 2024-08-06 20:11:28.524490 | 2024-08-07 10:30:56.059745 | 2 | 1002 | zone2 | FULL |
| 1005 | p1_1 | 1002 | 2024-08-07 18:57:10.471703 | 2024-08-07 18:57:10.471703 | 2 | 1001 | zone1 | FULL |
| 1006 | p1_2 | 1002 | 2024-08-07 18:57:10.481192 | 2024-08-07 18:57:10.481192 | 2 | 1001 | zone2 | FULL |
+------------------+------+-----------+----------------------------+----------------------------+------------+----------------+-----------+--------------+
4 rows in set (0.00 sec)
3、删除 resource_pool
删除每个租户在 zone3 里的 resource_pool
MySQL [oceanbase]> select * from dba_ob_resource_pools where tenant_id is null;
+------------------+------+-----------+----------------------------+----------------------------+------------+----------------+-----------+--------------+
| RESOURCE_POOL_ID | NAME | TENANT_ID | CREATE_TIME | MODIFY_TIME | UNIT_COUNT | UNIT_CONFIG_ID | ZONE_LIST | REPLICA_TYPE |
+------------------+------+-----------+----------------------------+----------------------------+------------+----------------+-----------+--------------+
| 1004 | p2_3 | NULL | 2024-08-06 20:11:29.307956 | 2024-08-07 18:53:48.789520 | 2 | 1002 | zone3 | FULL |
| 1007 | p1_3 | NULL | 2024-08-07 18:57:10.488961 | 2024-08-07 18:58:58.294668 | 2 | 1001 | zone3 | FULL |
+------------------+------+-----------+----------------------------+----------------------------+------------+----------------+-----------+--------------+
2 rows in set (0.01 sec)
MySQL [oceanbase]> drop resource pool p1_3;
Query OK, 0 rows affected (0.02 sec)
MySQL [oceanbase]> drop resource pool p2_3;
Query OK, 0 rows affected (0.01 sec)
MySQL [oceanbase]> select * from dba_ob_resource_pools where tenant_id is null;
Empty set (0.01 sec)
4、删除故障 observer
MySQL [oceanbase]> alter system delete server '11.xxx.xxx.192:33332' zone='zone3';
Query OK, 0 rows affected (0.24 sec)
MySQL [oceanbase]> select * from dba_ob_servers order by zone;
+----------------+----------+----+-------+----------+-----------------+--------+----------------------------+-----------+-----------------------+----------------------------+----------------------------+--------------------------------------------------------------------------+-------------------+
| SVR_IP | SVR_PORT | ID | ZONE | SQL_PORT | WITH_ROOTSERVER | STATUS | START_SERVICE_TIME | STOP_TIME | BLOCK_MIGRATE_IN_TIME | CREATE_TIME | MODIFY_TIME | BUILD_VERSION | LAST_OFFLINE_TIME |
+----------------+----------+----+-------+----------+-----------------+--------+----------------------------+-----------+-----------------------+----------------------------+----------------------------+--------------------------------------------------------------------------+-------------------+
| 11.xxx.xxx.191 | 12882 | 1 | zone1 | 12881 | YES | ACTIVE | 2024-07-31 08:14:53.119695 | NULL | NULL | 2024-07-24 10:50:42.478616 | 2024-07-31 08:14:55.115175 | 4.3.1.0_1-bad90e897a7f0f56b8ce5c43e186aa8f4bab03c4(Jun 28 2024 23:14:03) | NULL |
| 11.xxx.xxx.191 | 22882 | 4 | zone1 | 22881 | NO | ACTIVE | 2024-07-31 08:14:07.606076 | NULL | NULL | 2024-07-24 10:51:01.744417 | 2024-07-31 08:14:10.278981 | 4.3.1.0_1-bad90e897a7f0f56b8ce5c43e186aa8f4bab03c4(Jun 28 2024 23:14:03) | NULL |
| 11.xxx.xxx.191 | 32882 | 2 | zone2 | 32881 | NO | ACTIVE | 2024-08-05 10:06:37.915423 | NULL | NULL | 2024-07-24 10:50:42.523764 | 2024-08-05 10:06:39.509455 | 4.3.1.0_1-bad90e897a7f0f56b8ce5c43e186aa8f4bab03c4(Jun 28 2024 23:14:03) | NULL |
| 11.xxx.xxx.192 | 42882 | 5 | zone2 | 42881 | NO | ACTIVE | 2024-07-31 08:14:03.193183 | NULL | NULL | 2024-07-24 10:51:01.792832 | 2024-07-31 08:14:06.075483 | 4.3.1.0_1-bad90e897a7f0f56b8ce5c43e186aa8f4bab03c4(Jun 28 2024 23:14:03) | NULL |
| 11.xxx.xxx.192 | 52882 | 3 | zone3 | 52881 | NO | ACTIVE | 2024-07-31 08:14:57.594518 | NULL | NULL | 2024-07-24 10:50:42.562099 | 2024-07-31 08:14:59.182698 | 4.3.1.0_1-bad90e897a7f0f56b8ce5c43e186aa8f4bab03c4(Jun 28 2024 23:14:03) | NULL |
+----------------+----------+----+-------+----------+-----------------+--------+----------------------------+-----------+-----------------------+----------------------------+----------------------------+--------------------------------------------------------------------------+-------------------+
5 rows in set (0.02 sec)
5、修复机器
删除相关目录结构模拟修复后的机器
# home_path: /home/heshun.lxd/observer7
# data_dir: /obdata/data/data7
# redo_dir: /obdata/log/log7
rm -rf /home/heshun.lxd/observer7
rm -rf /obdata/data/data7
rm -rf /obdata/log/log7
6、重新加入集群
由于我们已经在ob集群里删除了 11.xxx.xxx.192:33332 节点,obd的配置文件里还是存在的,现在需要重新通过obd将 11.162.217.192:33332 节点扩容加入到 ob集群里,需要先手动删除:
cd ~/.obd/cluster/${deploy_name}
# 修改前备份
cp -ap config.yaml bak_config.yaml
cp -ap inner_config.yaml bak_inner_config.yaml
# 分别删除已经从ob集群里删除到节点 11.xxx.xxx.192:33332 和对应的配置
vim config.yaml
vim inner_config.yaml
在故障节点所在的 zone 上添加新节点的配置文件 add_zone3_server7.yaml
oceanbase-ce:
servers:
- name: server7
ip: 11.xxx.xxx.192
server7:
obshell_port: 33333
mysql_port: 33331
rpc_port: 33332
home_path: /home/heshun.lxd/observer7
data_dir: /obdata/data/data7
redo_dir: /obdata/log/log7
zone: zone3
将修复后的机器添扩容加入到ob集群
obd cluster scale_out ob431 -c add_zone3_server7.yaml -v
7、给所有租户添加副本
在增加副本之前,首先需要确认租户在目标 zone 上是否有资源池
MySQL [oceanbase]> select * from dba_ob_resource_pools;
+------------------+----------+-----------+----------------------------+----------------------------+------------+----------------+-------------------+--------------+
| RESOURCE_POOL_ID | NAME | TENANT_ID | CREATE_TIME | MODIFY_TIME | UNIT_COUNT | UNIT_CONFIG_ID | ZONE_LIST | REPLICA_TYPE |
+------------------+----------+-----------+----------------------------+----------------------------+------------+----------------+-------------------+--------------+
| 1 | sys_pool | 1 | 2024-07-24 10:50:37.618503 | 2024-07-24 10:50:37.671789 | 1 | 1 | zone1;zone2;zone3 | FULL |
| 1002 | p2_1 | 1004 | 2024-08-06 20:11:07.067122 | 2024-08-07 10:30:55.220775 | 2 | 1002 | zone1 | FULL |
| 1003 | p2_2 | 1004 | 2024-08-06 20:11:28.524490 | 2024-08-07 10:30:56.059745 | 2 | 1002 | zone2 | FULL |
| 1005 | p1_1 | 1002 | 2024-08-07 18:57:10.471703 | 2024-08-07 18:57:10.471703 | 2 | 1001 | zone1 | FULL |
| 1006 | p1_2 | 1002 | 2024-08-07 18:57:10.481192 | 2024-08-07 18:57:10.481192 | 2 | 1001 | zone2 | FULL |
+------------------+----------+-----------+----------------------------+----------------------------+------------+----------------+-------------------+--------------+
5 rows in set (0.02 sec)
## 确认各 resource pool 使用的规格名称
MySQL [oceanbase]> select * from dba_ob_unit_configs;
+----------------+-----------------+----------------------------+----------------------------+---------+---------+-------------+---------------+---------------------+---------------------+-------------+
| UNIT_CONFIG_ID | NAME | CREATE_TIME | MODIFY_TIME | MAX_CPU | MIN_CPU | MEMORY_SIZE | LOG_DISK_SIZE | MAX_IOPS | MIN_IOPS | IOPS_WEIGHT |
+----------------+-----------------+----------------------------+----------------------------+---------+---------+-------------+---------------+---------------------+---------------------+-------------+
| 1 | sys_unit_config | 2024-07-24 10:50:37.599549 | 2024-07-24 10:50:37.599549 | 2 | 2 | 2147483648 | 3221225472 | 9223372036854775807 | 9223372036854775807 | 2 |
| 1001 | u1 | 2024-07-24 17:03:25.421358 | 2024-08-06 20:08:01.205235 | 3 | 3 | 9663676416 | 25769803776 | 10000 | 10000 | 0 |
| 1002 | u2 | 2024-08-06 20:09:21.183164 | 2024-08-06 20:09:21.183164 | 2 | 2 | 2147483648 | 6442450944 | 9223372036854775807 | 9223372036854775807 | 2 |
+----------------+-----------------+----------------------------+----------------------------+---------+---------+-------------+---------------+---------------------+---------------------+-------------+
3 rows in set (0.02 sec)
由于我们将 test1 租户(1001)和 test2 租户(1002) 分别在 zone3 上删除了对应的 resource pool,因此我们需要给 zone3 上给上述的租户拥有对应的 resource pool,可以有2种方式:
- 调整现有资源池的 zone 列表,该方法适用于租户在所有zone使用同一个 resource pool name, 在缩小副本数->调整 resource_pool_list的步骤中因无法使用同一个resource pool name,当时已经执行了split操作,所以这里不使用该方法。
- 新增一个资源池给租户,要求资源池的 unit_num 与租户已有的资源池一致。
创建 resource pool
# 给 test1 租户在zone上创建 resource pool
create resource pool p1_3 unit='u1' ,unit_num=2,zone_list=('zone3');
# 给 test2 租户在zone上创建 resource pool
create resource pool p2_3 unit='u2' ,unit_num=2,zone_list=('zone3');
# 最后2条记录,tenant_id 等于NULL的说明刚新建的resource pool还没有关联给任何租户使用。
MySQL [oceanbase]> select * from dba_ob_resource_pools;
+------------------+----------+-----------+----------------------------+----------------------------+------------+----------------+-------------------+--------------+
| RESOURCE_POOL_ID | NAME | TENANT_ID | CREATE_TIME | MODIFY_TIME | UNIT_COUNT | UNIT_CONFIG_ID | ZONE_LIST | REPLICA_TYPE |
+------------------+----------+-----------+----------------------------+----------------------------+------------+----------------+-------------------+--------------+
| 1 | sys_pool | 1 | 2024-07-24 10:50:37.618503 | 2024-07-24 10:50:37.671789 | 1 | 1 | zone1;zone2;zone3 | FULL |
| 1002 | p2_1 | 1004 | 2024-08-06 20:11:07.067122 | 2024-08-07 10:30:55.220775 | 2 | 1002 | zone1 | FULL |
| 1003 | p2_2 | 1004 | 2024-08-06 20:11:28.524490 | 2024-08-07 10:30:56.059745 | 2 | 1002 | zone2 | FULL |
| 1005 | p1_1 | 1002 | 2024-08-07 18:57:10.471703 | 2024-08-07 18:57:10.471703 | 2 | 1001 | zone1 | FULL |
| 1006 | p1_2 | 1002 | 2024-08-07 18:57:10.481192 | 2024-08-07 18:57:10.481192 | 2 | 1001 | zone2 | FULL |
| 1008 | p1_3 | NULL | 2024-08-08 11:07:34.387838 | 2024-08-08 11:07:34.387838 | 2 | 1001 | zone3 | FULL |
| 1010 | p2_3 | NULL | 2024-08-08 11:12:49.406296 | 2024-08-08 11:12:49.406296 | 2 | 1002 | zone3 | FULL |
+------------------+----------+-----------+----------------------------+----------------------------+------------+----------------+-------------------+--------------+
7 rows in set (0.01 sec)
修改租户的 resource_pool_list
MySQL [oceanbase]> alter tenant test1 resource_pool_list=('p1_1','p1_2','p1_3');
Query OK, 0 rows affected (2.34 sec)
MySQL [oceanbase]> alter tenant test2 resource_pool_list=('p2_1','p2_2','p2_3');
Query OK, 0 rows affected (1.29 sec)
MySQL [oceanbase]> select * from dba_ob_resource_pools;
+------------------+----------+-----------+----------------------------+----------------------------+------------+----------------+-------------------+--------------+
| RESOURCE_POOL_ID | NAME | TENANT_ID | CREATE_TIME | MODIFY_TIME | UNIT_COUNT | UNIT_CONFIG_ID | ZONE_LIST | REPLICA_TYPE |
+------------------+----------+-----------+----------------------------+----------------------------+------------+----------------+-------------------+--------------+
| 1 | sys_pool | 1 | 2024-07-24 10:50:37.618503 | 2024-07-24 10:50:37.671789 | 1 | 1 | zone1;zone2;zone3 | FULL |
| 1002 | p2_1 | 1004 | 2024-08-06 20:11:07.067122 | 2024-08-07 10:30:55.220775 | 2 | 1002 | zone1 | FULL |
| 1003 | p2_2 | 1004 | 2024-08-06 20:11:28.524490 | 2024-08-07 10:30:56.059745 | 2 | 1002 | zone2 | FULL |
| 1005 | p1_1 | 1002 | 2024-08-07 18:57:10.471703 | 2024-08-07 18:57:10.471703 | 2 | 1001 | zone1 | FULL |
| 1006 | p1_2 | 1002 | 2024-08-07 18:57:10.481192 | 2024-08-07 18:57:10.481192 | 2 | 1001 | zone2 | FULL |
| 1008 | p1_3 | 1002 | 2024-08-08 11:07:34.387838 | 2024-08-08 11:44:49.982360 | 2 | 1001 | zone3 | FULL |
| 1010 | p2_3 | 1004 | 2024-08-08 11:12:49.406296 | 2024-08-08 11:45:14.888533 | 2 | 1002 | zone3 | FULL |
+------------------+----------+-----------+----------------------------+----------------------------+------------+----------------+-------------------+--------------+
7 rows in set (0.00 sec)
修改租户的 locality
MySQL [oceanbase]> alter tenant test1 locality='f@zone1,f@zone2,f@zone3';
Query OK, 0 rows affected (0.59 sec)
MySQL [oceanbase]> alter tenant test2 locality='f@zone1,f@zone2,f@zone3';
Query OK, 0 rows affected (0.45 sec)
# 等 JOB_STATUS 是 SUCCESS 说明修改租户的locality操作执行成功了
select * from dba_ob_tenant_jobs
where job_type='alter_tenant_locality' order by start_time desc limit 2 \G
*************************** 1. row ***************************
JOB_ID: 18
JOB_TYPE: ALTER_TENANT_LOCALITY
JOB_STATUS: SUCCESS
RESULT_CODE: 0
PROGRESS: 100
START_TIME: 2024-08-08 11:48:56.763835
MODIFY_TIME: 2024-08-08 11:53:19.692981
TENANT_ID: 1004
SQL_TEXT: alter tenant test2 locality='f@zone1,f@zone2,f@zone3'
EXTRA_INFO: FROM: 'FULL{1}@zone1, FULL{1}@zone2', TO: 'FULL{1}@zone1, FULL{1}@zone2, FULL{1}@zone3'
RS_SVR_IP: 11.xxx.xxx.191
RS_SVR_PORT: 12882
*************************** 2. row ***************************
JOB_ID: 17
JOB_TYPE: ALTER_TENANT_LOCALITY
JOB_STATUS: SUCCESS
RESULT_CODE: 0
PROGRESS: 100
START_TIME: 2024-08-08 11:48:49.527044
MODIFY_TIME: 2024-08-08 11:53:19.129475
TENANT_ID: 1002
SQL_TEXT: alter tenant test1 locality='f@zone1,f@zone2,f@zone3'
EXTRA_INFO: FROM: 'FULL{1}@zone1, FULL{1}@zone2', TO: 'FULL{1}@zone1, FULL{1}@zone2, FULL{1}@zone3'
RS_SVR_IP: 11.xxx.xxx.191
RS_SVR_PORT: 12882
2 rows in set (0.00 sec)
到这里,没有通过替换节点的方式将故障的节点处理后,重新再加入到ob集群。