本节主要分享 OB 3.2 下的租户副本扩容运维过程,这依然是 OB 弹性伸缩最有趣的功能。
问题现象
租户的扩容既可以通过扩大副本的资源单元规格,也可以通过扩大副本数来实现。
这里目的是将租户从单副本扩容到三副本。
默认的租户资源规格的内存都比较大。 需要自定义一个小内存的资源规格。
但是 OCP 这个页面显然是有问题的,新增过的资源单元规格没有展示出来。
反复重试多次都是这样。
问题解决
直接采取在命令行下手动对租户扩容。
- 查看集群可用资源
select a.zone,concat(a.svr_ip,':',a.svr_port) observer, cpu_total, (cpu_total-cpu_assigned) cpu_free
, round(mem_total/1024/1024/1024) mem_total_gb, round((mem_total-mem_assigned)/1024/1024/1024,2) mem_free_gb
, round(disk_total/1024/1024/1024) disk_total_gb, round((disk_total-disk_assigned)/1024/1024/1024) disk_free_gb
from __all_virtual_server_stat a join __all_server b on (a.svr_ip=b.svr_ip and a.svr_port=b.svr_port)
order by a.zone, a.svr_ip
;
zone | observer | cpu_total | cpu_free | mem_total_gb | mem_free_gb | disk_total_gb | disk_free_gb |
---|---|---|---|---|---|---|---|
zone1 | 10.0.0.61:2882 | 14 | 2 | 8 | 0 | 100 | -1,000 |
ZONE2 | 10.0.0.62:2882 | 14 | 14 | 8 | 8 | 100 | 100 |
ZONE3 | 10.0.0.63:2882 | 14 | 14 | 8 | 8 | 100 | 100 |
- 查看现有租户资源分布。
select t1.name resource_pool_name, t2.`name` unit_config_name, t2.max_cpu, t2.min_cpu
, round(t2.max_memory/1024/1024/1024) max_mem_gb, round(t2.min_memory/1024/1024/1024) min_mem_gb
, round(t2.max_disk_size/1024/1024/1024) max_disk_size , t4.tenant_name
from __all_resource_pool t1 join __all_unit_config t2 on (t1.unit_config_id=t2.unit_config_id)
join __all_unit t3 on (t1.`resource_pool_id` = t3.`resource_pool_id`)
left join __all_tenant t4 on (t1.tenant_id=t4.tenant_id)
order by t1.`resource_pool_id`, t2.`unit_config_id`, t3.unit_id
;
resource_pool_name | unit_config_name | max_cpu | min_cpu | max_mem_gb | min_mem_gb | max_disk_size | tenant_name |
---|---|---|---|---|---|---|---|
sys_pool | sys_unit_config | 5 | 5 | 4 | 4 | 100 | sys |
pool_oboracle_zone1_neq | config_oboracle_zone1_S1_neq | 4 | 4 | 1 | 1 | 500 | oboracle |
pool_obmysql_zone1_uvw | config_obmysql_zone1_S1_uvw | 3 | 3 | 3 | 3 | 500 | obmysql |
- 在新节点上创建资源单元
create resource unit unit_3c3g max_cpu=3,min_cpu=3,max_memory='3G',min_memory='3G',max_iops=10000,min_iops=1000,max_session_num=100000,max_disk_size='200G';
create resource pool pool_obmysql_zone2 unit='unit_3c3g' , unit_num=1, zone_list=('ZONE2');
create resource pool pool_obmysql_zone3 unit='unit_3c3g' , unit_num=1, zone_list=('ZONE3');
- 租户 obmysql 扩大资源池
注意,租户资源池的扩容必须一个个的加。
ALTER tenant obmysql resource_pool_list=('pool_obmysql_zone1_uvw','pool_obmysql_zone2');
ALTER tenant obmysql resource_pool_list=('pool_obmysql_zone1_uvw','pool_obmysql_zone2','pool_obmysql_zone3');
查看集群最近的变更事件。
select gmt_create , module , event, name1 , value1 , name2, value2, name3, value3 , rs_svr_ip
from `__all_rootservice_event_history` areh
WHERE 1=1
order by gmt_create desc limit 10;
gmt_create | module | event | name1 | value1 | name2 | value2 | name3 | value3 | rs_svr_ip |
---|---|---|---|---|---|---|---|---|---|
2023-06-08 13:10:11.162 | root_service | create_resource_pool | ret | 0 | arg | pool_name:pool_obmysql_zone3, unit:unit_3c3g, unit_num:1, zone_list:[ZONE3], replica_type:0, if_not_exist:false, is_tenant_sys_pool:false | 10.0.0.61 | ||
2023-06-08 13:10:11.162 | unit | create_resource_pool | name | pool_obmysql_zone3 | unit | unit_3c3g | zone_list | [ZONE3] | 10.0.0.61 |
2023-06-08 13:10:11.161 | unit | create_unit | unit_id | 1005 | server | 10.0.0.63:2882 | 10.0.0.61 | ||
2023-06-08 13:10:08.455 | root_service | create_resource_pool | ret | 0 | arg | pool_name:pool_obmysql_zone2, unit:unit_3c3g, unit_num:1, zone_list:[ZONE2], replica_type:0, if_not_exist:false, is_tenant_sys_pool:false | 10.0.0.61 | ||
2023-06-08 13:10:08.455 | unit | create_resource_pool | name | pool_obmysql_zone2 | unit | unit_3c3g | zone_list | [ZONE2] | 10.0.0.61 |
2023-06-08 13:10:08.453 | unit | create_unit | unit_id | 1004 | server | 10.0.0.62:2882 | 10.0.0.61 | ||
2023-06-08 13:08:47.096 | root_service | create_resource_unit | ret | 0 | arg | unit_name:unit_3c3g, min_cpu:3.000000000000000000e+00, min_iops:1000, min_memory:3221225472, max_cpu:3.000000000000000000e+00, max_memory:3221225472, max_iops:10000, max_disk_size:214748364800, max_session_num:100000, if_not_exist:false | 10.0.0.61 | ||
2023-06-08 13:08:47.095 | unit | create_resource_unit | name | unit_3c3g | 10.0.0.61 | ||||
2023-06-08 12:53:07.465 | root_service | alter_resource_unit | ret | 0 | arg | unit_name:config_obmysql_zone1_S1_uvw, min_cpu:3.000000000000000000e+00, min_iops:0, min_memory:0, max_cpu:3.000000000000000000e+00, max_memory:0, max_iops:0, max_disk_size:0, max_session_num:0 | 10.0.0.61 | ||
2023-06-08 12:53:07.465 | unit | alter_resource_unit | name | config_obmysql_zone1_S1_uvw | old_config | {unit_config_id:1002, name:config_obmysql_zone1_S1_uvw, max_cpu:3.000000000000000000e+00, min_cpu:3.000000000000000000e+00, max_memory:3221225472, min_memory:3221225472, max_disk_size:536870912000, max_iops:1250, min_iops:1250, max_session_num:375} | new_config | {unit_config_id:1002, name:config_obmysql_zone1_S1_uvw, max_cpu:3.000000000000000000e+00, min_cpu:3.000000000000000000e+00, max_memory:3221225472, min_memory:3221225472, max_disk_size:536870912000, max_iops:1250, min_iops:1250, max_session_num:375} | 10.0.0.61 |
查看 OCP 里租户资源池的变化。
此时还只是租户资源池发生变化,租户的数据分布并没有变,副本数也没有变。
- 扩容租户的副本数
ALTER tenant obmysql locality='FULL{1}@zone1, FULL{1}@ZONE2';
此时,租户开始从单副本扩容到 2 副本,查看集群最近的事件,可以看到有很多的增加分区的事件(start_add_replica
和 finish_add_replica
。
只有扩容完毕后才能继续扩容到 3 副本。否则会报错:SQL 错误 [4179] [HY000]: alter tenant locality when previous operation is in progress not allowed
。
查看扩容结果。
SELECT job_id, gmt_create ,gmt_modified ,job_type ,job_status ,return_code ,progress ,tenant_id ,tenant_name ,rs_svr_ip ,sql_text FROM `__all_rootservice_job` arj ;
job_id | gmt_create | gmt_modified | job_type | job_status | return_code | progress | tenant_id | tenant_name | rs_svr_ip | sql_text |
---|---|---|---|---|---|---|---|---|---|---|
1 | 2023-06-08 13:19:07.838 | 2023-06-08 13:22:02.912 | ALTER_TENANT_LOCALITY | SUCCESS | 0 | 100 | 1,002 | obmysql | 10.0.0.61 | /* ApplicationName=DBeaver 23.1.0 - SQLEditor <Script-3.sql> */ ALTER tenant obmysql locality=‘FULL{1}@zone1, FULL{1}@ZONE2’ |
- 继续从 2副本 扩容到 3副本。
ALTER tenant obmysql locality='FULL{1}@zone1, FULL{1}@ZONE2, FULL{1}@ZONE3';
报错了:SQL 错误 [1235] [0A000]: tenant primary zone span regions when GTS is on not supported
这个原因是默认建的租户 obmysql 的 PRIMARY_ZONE 是 RANDOM 。设置为 ZONE1 即可。
ALTER tenant obmysql PRIMARY_ZONE 'zone1';
继续执行,成功。
job_id | gmt_create | gmt_modified | job_type | job_status | return_code | progress | tenant_id | tenant_name | rs_svr_ip | sql_text |
---|---|---|---|---|---|---|---|---|---|---|
1 | 2023-06-08 13:19:07.838 | 2023-06-08 13:22:02.912 | ALTER_TENANT_LOCALITY | SUCCESS | 0 | 100 | 1,002 | obmysql | 10.0.0.61 | /* ApplicationName=DBeaver 23.1.0 - SQLEditor <Script-3.sql> */ ALTER tenant obmysql locality=‘FULL{1}@zone1, FULL{1}@ZONE2’ |
2 | 2023-06-08 13:32:07.008 | 2023-06-08 13:35:14.477 | ALTER_TENANT_LOCALITY | SUCCESS | 0 | 100 | 1,002 | obmysql | 10.0.0.61 | /* ApplicationName=DBeaver 23.1.0 - SQLEditor <Script-3.sql> */ ALTER tenant obmysql locality=‘FULL{1}@zone1, FULL{1}@ZONE2, FULL{1}@ZONE3’ |
- 查看租户表分区分布
SELECT
t2.zone, ROLE,
concat(t2.svr_ip, ':', t2.svr_port) observer,
count(*) cnt, round(sum(t2.data_size) / 1024 / 1024) data_size_mb
FROM
gv$table t1
JOIN gv$partition t2 ON (t1.tenant_id = t2.tenant_id AND t1.table_id = t2.table_id)
LEFT JOIN __all_virtual_tablegroup t3 ON (t1.tenant_id = t3.tenant_id AND t1.tablegroup_id = t3.tablegroup_id)
JOIN gv$database t4 ON (t1.tenant_Id = t4.tenant_id AND t1.database_id = t4.database_id)
JOIN gv$tenant t5 ON (t1.tenant_id = t5.tenant_id)
WHERE
t5.tenant_id = 1002
AND t4.database_name = 'TPCCDB'
GROUP BY ZONE, observer, role
ORDER BY ZONE, observer;
zone | ROLE | observer | cnt | data_size_mb |
---|---|---|---|---|
zone1 | 1 | 10.0.0.61:2882 | 199 | 4,121 |
ZONE2 | 2 | 10.0.0.62:2882 | 199 | 4,121 |
ZONE3 | 2 | 10.0.0.63:2882 | 199 | 4,121 |
调整一下租户 primary zone 的分布策略,再看看分区的分布。
zone | ROLE | observer | cnt | data_size_mb |
---|---|---|---|---|
zone1 | 1 | 10.0.0.61:2882 | 96 | 305 |
zone1 | 2 | 10.0.0.61:2882 | 103 | 3,815 |
ZONE2 | 1 | 10.0.0.62:2882 | 103 | 3,815 |
ZONE2 | 2 | 10.0.0.62:2882 | 96 | 305 |
ZONE3 | 2 | 10.0.0.63:2882 | 199 | 4,121 |
总结
租户扩容副本(单副本到三副本)主要分为租户资源池的扩容以及租户副本 locality 的扩容。租户内的表自动从单副本扩容到三副本。
如果是从三副本扩容到五副本,还需要对显示的将业务表的副本数从三副本扩容到五副本。详情可以参考: OceanBase 独立部署高级玩法三:副本数调整 (qq.com)