【SOP 系列 08】分配资源报错时如何准确统计可用(剩余)资源

我们在给一个已有的 unit 扩容规格,或者新建一个租户,在创建资源池的时候明明根据 __all_virtual_server_stat 统计出对应资源的 free 数值满足我们当前的需求,但是为什么还是会报错,希望通过这篇文章能帮助大家解决如何准确的统计可用(剩余)的资源。

说明:本文主要是以 memory 为例,同样适用于 cpu 的资源。

1、查看每个 OBServer 上总共能分配给所有租户的 memory 大小

show parameters where name in ('memory_limit','memory_limit_percentage','system_memory');

由于 show parameters 命令查看不能输出指定的列,为了方便查看,这里查询虚拟表 __all_virtual_sys_parameter_stat

Q1:

select zone,svr_ip,svr_port,name,value 
from __all_virtual_sys_parameter_stat 
where name in ('memory_limit','memory_limit_percentage','system_memory')
order by svr_ip,svr_port;

+-------+----------------+----------+-------------------------+-------+
| zone  | svr_ip         | svr_port | name                    | value |
+-------+----------------+----------+-------------------------+-------+
| zone1 | 172.30.199.123 |    12882 | memory_limit_percentage | 80    |
| zone1 | 172.30.199.123 |    12882 | system_memory           | 4G    |
| zone1 | 172.30.199.123 |    12882 | memory_limit            | 12g   |
| zone2 | 172.30.199.124 |    12882 | memory_limit_percentage | 80    |
| zone2 | 172.30.199.124 |    12882 | system_memory           | 4G    |
| zone2 | 172.30.199.124 |    12882 | memory_limit            | 12g   |
| zone3 | 172.30.199.125 |    12882 | memory_limit_percentage | 80    |
| zone3 | 172.30.199.125 |    12882 | system_memory           | 4G    |
| zone3 | 172.30.199.125 |    12882 | memory_limit            | 12g   |
+-------+----------------+----------+-------------------------+-------+
9 rows in set (0.01 sec)

从上面我们可以出每个 OBServer 节点的 memory_limit=12gsystem_memory=4g,所以每个 OBServer 可以分配给租户使用的总内存是 12g-4g=8g,跟 Q3 查询中的 memory_total 可以对应起来。

2、查看租户在每个 OBServer 节点上 unit 的分布

Q2:

select unit_config_name,resource_pool_name,zone,tenant_name,svr_ip,svr_port,max_cpu,min_cpu,
round(max_memory/1024/1024/1024,2) max_memory_gb,round(min_memory/1024/1024/1024,2) min_memory_gb 
from gv$unit order by tenant_name;

+------------------+--------------------+-------+---------------+----------------+----------+---------+---------+---------------+---------------+
| unit_config_name | resource_pool_name | zone  | tenant_name   | svr_ip         | svr_port | max_cpu | min_cpu | max_memory_gb | min_memory_gb |
+------------------+--------------------+-------+---------------+----------------+----------+---------+---------+---------------+---------------+
| sys_unit_config  | sys_pool           | zone1 | sys           | 172.30.199.123 |    12882 |       2 |       2 |          2.00 |          2.00 |
| sys_unit_config  | sys_pool           | zone2 | sys           | 172.30.199.124 |    12882 |       2 |       2 |          2.00 |          2.00 |
| sys_unit_config  | sys_pool           | zone3 | sys           | 172.30.199.125 |    12882 |       2 |       2 |          2.00 |          2.00 |
| u1               | pool_1             | zone1 | test_tenant_1 | 172.30.199.123 |    12882 |       1 |       1 |          4.00 |          1.00 |
| u1               | pool_1             | zone2 | test_tenant_1 | 172.30.199.124 |    12882 |       1 |       1 |          4.00 |          1.00 |
| u1               | pool_1             | zone3 | test_tenant_1 | 172.30.199.125 |    12882 |       1 |       1 |          4.00 |          1.00 |
+------------------+--------------------+-------+---------------+----------------+----------+---------+---------+---------------+---------------+
6 rows in set (0.00 sec)

3、查看每个 OBServer 节点上总共/已分配/剩余资源情况

以下 SQL 中统计每个 OBServer 上剩余可供分配的内存使用的计算逻辑是:memory_total - max_memory,换句话说就是当前节点已分配出去了 max_memory

Q3:

 select u.zone,u.svr_ip,u.svr_port,
        max(s.cpu_total),
        sum(u.max_cpu) as assigned_max_cpu,
        max(s.cpu_total)-sum(u.max_cpu) as free_cpu,
        round(max(s.mem_total/1024/1024/1024),2) as memory_total,
        round(sum(u.max_memory/1024/1024/1024),2) as assigned_max_memory,
        round(max(s.mem_total/1024/1024/1024),2)-round(sum(u.max_memory/1024/1024/1024),2) as free_memory
 from oceanbase.gv$unit u, oceanbase.__all_virtual_server_stat s
 where s.svr_ip=u.svr_ip and s.svr_port=u.svr_port
 group by u.svr_ip,u.svr_port
 order by zone;

+-------+----------------+----------+------------------+------------------+----------+--------------+---------------------+-------------+
| zone  | svr_ip         | svr_port | max(s.cpu_total) | assigned_max_cpu | free_cpu | memory_total | assigned_max_memory | free_memory |
+-------+----------------+----------+------------------+------------------+----------+--------------+---------------------+-------------+
| zone1 | 172.30.199.123 |    12882 |                6 |                3 |        3 |         8.00 |                6.00 |        2.00 |
| zone2 | 172.30.199.124 |    12882 |                6 |                3 |        3 |         8.00 |                6.00 |        2.00 |
| zone3 | 172.30.199.125 |    12882 |                6 |                3 |        3 |         8.00 |                6.00 |        2.00 |
+-------+----------------+----------+------------------+------------------+----------+--------------+---------------------+-------------+
3 rows in set (0.00 sec)

4、 查看每个 OBServer 节点上总共/已分配/剩余资源情况

可能会有疑问,为什么第 4 小节的标题跟第 3 小节的标题是一样的,其主要是查询功能相同,但计算的逻辑不同。

以下 SQL 中统计每个 OBServer上 剩余可供分配的内存使用的计算逻辑是 memory_total - mem_assigned,这里我们看不出是如何计算出来已分配的,但是我们可以结合 Q3 中的每台机器上 sum(gv$unit.min_memory),就不难推测目前 OceanBase 中计算出来每台 OBServer 已分配出去的内存其实就是 sum(gv$unit.min_memory)

Q4:

select a.zone,concat(a.svr_ip,':',a.svr_port) observer, cpu_total,cpu_assigned,(cpu_total-cpu_assigned) cpu_free, 
round(mem_total/1024/1024/1024) mem_total_gb, round(mem_assigned/1024/1024/1024,2) mem_assigned_gb,
round((mem_total-mem_assigned)/1024/1024/1024) mem_free_gb
from __all_virtual_server_stat a join __all_server b on (a.svr_ip=b.svr_ip and a.svr_port=b.svr_port)
order by a.zone, a.svr_ip;

+-------+----------------------+-----------+--------------+----------+--------------+-----------------+-------------+
| zone  | observer             | cpu_total | cpu_assigned | cpu_free | mem_total_gb | mem_assigned_gb | mem_free_gb |
+-------+----------------------+-----------+--------------+----------+--------------+-----------------+-------------+
| zone1 | 172.30.199.123:12882 |         6 |            3 |        3 |            8 |            3.00 |           5 |
| zone2 | 172.30.199.124:12882 |         6 |            3 |        3 |            8 |            3.00 |           5 |
| zone3 | 172.30.199.125:12882 |         6 |            3 |        3 |            8 |            3.00 |           5 |
+-------+----------------------+-----------+--------------+----------+--------------+-----------------+-------------+
3 rows in set (0.00 sec)

5、验证计算逻辑

Q3 和 Q4 都可以查看已分配/剩余资源情况,到底哪种方式是准确的呢?或者说后续分配资源的时候又是使用的哪个值作为已分配出去的内存呢?接下来我们验证一下:

验证方法1:反证法

假设 Q4 的 SQL 统计是准确的,那么我们构造如下的扩容需求:

根据 Q2 SQL 查询中可以看出扩容前 unit u1 的 min_memory=1gmax_memory=4g,再根据 Q3 中 memory_free = 5g

那么我们可以将 unit u1 的规格扩容成 min_memory = 6gmax_memory = 9g(给原来已分配的 min_memory/max_memory 同时加上 memory_free 值)。

此时跟 Q1 中的结论是相悖的(Q4 中的 max_memory = 9g > Q1 中 memory_limit - system_memory = 8g)。

验证方法2:模拟报错

mysql> alter resource unit u1 min_memory='6G',max_memory='9G';
ERROR 4624 (HY000):  machine resource '"172.30.199.123:12882"MAX_MEM' is not enough to hold a new unit

这里我们在报错后,如何确定日志在哪台机器上,在 OceanBase 中资源管理是由 rootservice 来负责的,所以我们需要找到当前 OceanBase 集群中 rootserver 服务的节点。

select svr_ip,svr_port from oceanbase.__all_server where with_rootserver<>0;
+----------------+----------+
| svr_ip         | svr_port |
+----------------+----------+
| 172.30.199.123 |    12882 |
+----------------+----------+
1 row in set (0.00 sec)

扩容 unit 规格或者其他扩容的命令可能重复执行了多次,在日志中会有多条记录,这里取最后一条记录。

[admin@172.30.199.123 log]$grep "ret=\-4624" rootservice.log | tail -n1
[2022-04-10 11:26:10.648281] WARN  [RS] process_ (ob_rs_rpc_processor.h:182) [2133][452][Y3252AC1EC77D-0005DC0F2756221D] [lt=7] [dc=0] process failed(ret=-4624)

接下来根据 traceid Y3252AC1EC77D-0005DC0F2756221D 来过滤。

[admin@172.30.199.123 log]$grep "Y3252AC1EC77D-0005DC0F2756221D" rootservice.log
[2022-04-10 11:26:10.648197] WARN  [RS] check_expand_config (ob_unit_manager.cpp:7192) [2133][452][Y3252AC1EC77D-0005DC0F2756221D] [lt=5] [dc=0] find server can't hold expanded config(server="172.30.199.123:12882", status=id:1, zone:"zone1", build_version:"3.1.3_10000292022032916-3d79cacb37012cf61b7cb8faf00d9a6bb152bcd1(Mar 29 2022 08:20:39)", server:"172.30.199.123:12882", sql_port:12881, register_time:0, last_hb_time:1649561170195540, block_migrate_in_time:0, stop_time:0, start_service_time:1649332409722013, last_offline_time:0, last_server_behind_time:-4, last_round_trip_time:140, admin_status:"NORMAL", hb_status:"alive", with_rootserver:true, with_partition:true, resource_info:{cpu:"6.000000000000000000e+00", mem_in_use:0, mem_total:8589934592, disk_in_use:425721856, disk_total:20031995904, partition_cnt:1324, report_cpu_assigned:"3.000000000000000000e+00", report_cpu_max_assigned:"3.000000000000000000e+00", report_mem_assigned:3221225472, report_mem_max_assigned:6442450944}, leader_cnt:1200, server_report_status:0, lease_expire_time:1649332469530678, ssl_key_expired_time:0, in_recovery_for_takenover_by_rs:false, expand_config={unit_config_id:18446744073709551615, name:"", max_cpu:"0.000000000000000000e+00", min_cpu:"0.000000000000000000e+00", max_memory:5368709120, min_memory:5368709120, max_disk_size:0, max_iops:0, min_iops:0, max_session_num:0})
[2022-04-10 11:26:10.648226] WARN  check_expand_config (ob_unit_manager.cpp:7082) [2133][452][Y3252AC1EC77D-0005DC0F2756221D] [lt=12] [dc=0]  machine resource '"172.30.199.123:12882"MAX_MEM' is not enough to hold a new unit
[2022-04-10 11:26:10.648234] WARN  [RS] check_expand_config (ob_unit_manager.cpp:7087) [2133][452][Y3252AC1EC77D-0005DC0F2756221D] [lt=5] [dc=0] server doesn't have enough resource to hold expanded config(server="172.30.199.123:12882", expand_config={unit_config_id:18446744073709551615, name:"", max_cpu:"0.000000000000000000e+00", min_cpu:"0.000000000000000000e+00", max_memory:5368709120, min_memory:5368709120, max_disk_size:0, max_iops:0, min_iops:0, max_session_num:0}, ret=-4624)
[2022-04-10 11:26:10.648244] WARN  [RS] alter_unit_config (ob_unit_manager.cpp:752) [2133][452][Y3252AC1EC77D-0005DC0F2756221D] [lt=3] [dc=0] check expand config failed(old config={unit_config_id:1001, name:"u1", max_cpu:"1.000000000000000000e+00", min_cpu:"1.000000000000000000e+00", max_memory:4294967296, min_memory:1073741824, max_disk_size:10737418240, max_iops:1000, min_iops:1000, max_session_num:100}, new config={unit_config_id:18446744073709551615, name:"u1", max_cpu:"1.000000000000000000e+00", min_cpu:"1.000000000000000000e+00", max_memory:9663676416, min_memory:6442450944, max_disk_size:10737418240, max_iops:1000, min_iops:1000, max_session_num:100}, ret=-4624)
[2022-04-10 11:26:10.648252] WARN  [RS] alter_resource_unit (ob_root_service.cpp:3405) [2133][452][Y3252AC1EC77D-0005DC0F2756221D] [lt=5] [dc=0] alter_unit_config failed(unit_config={unit_config_id:18446744073709551615, name:"u1", max_cpu:"1.000000000000000000e+00", min_cpu:"1.000000000000000000e+00", max_memory:9663676416, min_memory:6442450944, max_disk_size:10737418240, max_iops:1000, min_iops:1000, max_session_num:100}, ret=-4624)
[2022-04-10 11:26:10.648281] WARN  [RS] process_ (ob_rs_rpc_processor.h:182) [2133][452][Y3252AC1EC77D-0005DC0F2756221D] [lt=7] [dc=0] process failed(ret=-4624)

从日志里我们可以看出 mem_total = 8589934592report_mem_max_assigned = 6442450944,那么剩余可供扩容的资源应该是 2g,跟 Q3 的结论是一致的。

6、结论

综上,我们在遇到分配资源或者扩容资源时候,避免统计出剩余资源跟和实际分配的不一致,应先将每个已使用到的规格,包括 sys 租户的规格中 min_cpu/max_cpu min_memory/max_memory 分别调整为大小一致,这样使用 Q4 统计的结果也就准确了( Q3 的统计方法,本身就是使用的 max 值来计算的,所以不论是否调整min/max 为一致该方法都可以准确统计出当前 OBServer 节点可用(剩余)资源情况,不过这里仍然建议将不一致的调整为一致)。

调整资源规格示例 SQL:

alter resource unit ${unit_name} 
min_cpu=x,max_cpu=x,
min_memory='yG',max_memory='yG';

说明:

1、在 OceanBase 3.1.3 之前这里的内存数值大小不支持小数写法,在 OceanBase 3.1.3 开始支持小数写法。

2、这里的示例 SQL 应根据实际情况选择需要调整 cpu 或 memory 或两者都包含。

3、unit_name 需要替换为我们实际使用的。

---- OB4.0 资源使用情况查看

select zone,svr_ip,svr_port,
cpu_capacity,cpu_assigned_max,cpu_capacity-cpu_assigned_max as cpu_free,
round(memory_limit/1024/1024/1024,2) as memory_total_gb,
round((memory_limit-mem_capacity)/1024/1024/1024,2) as system_memory_gb,
round(mem_assigned/1024/1024/1024,2) as mem_assigned_gb,
round((mem_capacity-mem_assigned)/1024/1024/1024,2) as memory_free_gb,
round(log_disk_capacity/1024/1024/1024,2) as log_disk_capacity_gb,
round(log_disk_assigned/1024/1024/1024,2) as log_disk_assigned_gb,
round((log_disk_capacity-log_disk_assigned)/1024/1024/1024,2) as log_disk_free_gb,
round((data_disk_capacity/1024/1024/1024),2) as data_disk_gb,
round((data_disk_in_use/1024/1024/1024),2) as data_disk_used_gb,
round((data_disk_capacity-data_disk_in_use)/1024/1024/1024,2) as data_disk_free_gb
from gv$ob_servers;
3 个赞

前排

已学习( •̀ ω •́ )✧

调小clog磁盘使用率


[admin@obp1 ~]$ obd cluster edit-config obcluster      #添加参数,将clog磁盘空间使用率默认值80%降低为30%
  global:

   clog_disk_utilization_threshold: 30

4.X 版本

select zone, concat(SVR_IP,':',SVR_PORT) observer, CPU_CAPACITY_MAX cpu_total,CPU_ASSIGNED_MAX cpu_assigned,CPU_CAPACITY_MAX-CPU_ASSIGNED_MAX cpu_free,CAST(MEM_CAPACITY/1024/1024/1024 as DECIMAL(15,2)) mem_total_gb,CAST(mem_assigned/1024/1024/1024 as DECIMAL(15,2)) mem_assign_gb,CAST((MEM_CAPACITY-MEM_ASSIGNED)/1024/1024/1024 as DECIMAL(15,2)) mem_free_gb 
from gv$ob_servers;

4.X 版本

 select t2.`name` unit_config_name, t3.unit_id,t1.name resource_pool_name, t3.zone,t4.tenant_name, t4.tenant_id , concat(t3.svr_ip,':',t3.`svr_port`) observer,t2.max_cpu, t2.min_cpu, CAST(t2.memory_size/1024/1024/1024 as DECIMAL(15,2)) mem_size_gb,  CAST(t2.log_disk_size/1024/1024/1024 as DECIMAL(15,2)) log_disk_size_gb,t2.max_iops, t2.min_iops
 from __all_resource_pool t1 join __all_unit_config t2 on (t1.unit_config_id=t2.unit_config_id)
     join __all_unit t3 on (t1.`resource_pool_id` = t3.`resource_pool_id`)
     left join __all_tenant t4 on (t1.tenant_id=t4.tenant_id)
 order by t1.`resource_pool_id`, t2.`unit_config_id`, t3.unit_id;