4624: machine resource is not enough to hold a new unit

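[Environment] Test environment
[OB or other component] OCP deployment
[Version] 4.3.2
[Problem description] The create-product task fails with the error "machine resource is not enough to hold a new unit".
[Steps to reproduce] Operations before and after the problem appeared:
1. The variable and the global parameter ob_create_table_strict_mode are both OFF.
2. SELECT * FROM __all_server; result: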
2024-08-14 18:14:26.465723 2024-08-14 18:15:47.306766 192.168.31.31 2882 1 META_ZONE_1 2881 1 active 0 2.2.77_116010032023022813-4f4fbb6de5d75b4db00ee05d44d56d8c1500c21c(Feb 28 2023 13:48:43) 0 1723630544320002 0 1 0
3. SELECT tenant_id, tenant_name, locality FROM __all_tenant; result:
1 sys FULL{1}@META_ZONE_1

[Attachments and logs]
############{1}{2024-08-14T19:44:30+08:00}############
[2024-08-14T19:44:30.627+0800] INFO - Dependencies all met for <TaskInstance: init_ocp.create_tenant manual__2024-08-14T11:44:28.523156+00:00 [queued]>
[2024-08-14T19:44:30.637+0800] INFO - Dependencies all met for <TaskInstance: init_ocp.create_tenant manual__2024-08-14T11:44:28.523156+00:00 [queued]>
[2024-08-14T19:44:30.638+0800] INFO -

[2024-08-14T19:44:30.638+0800] INFO - Starting attempt 1 of 1
[2024-08-14T19:44:30.638+0800] INFO -

[2024-08-14T19:44:30.657+0800] INFO - Executing <Task(_PythonDecoratedOperator): create_tenant> on 2024-08-14 11:44:28.523156+00:00
[2024-08-14T19:44:30.660+0800] INFO - Started process 6806 to run task
[2024-08-14T19:44:30.663+0800] INFO - Running: ['airflow', 'tasks', 'run', 'init_ocp', 'create_tenant', 'manual__2024-08-14T11:44:28.523156+00:00', '--job-id', '205', '--raw', '--subdir', 'DAGS_FOLDER/init_ocp.py', '--cfg-path', '/tmp/tmpd_l7nfz5']
[2024-08-14T19:44:30.665+0800] INFO - Job 205: Subtask create_tenant
[2024-08-14T19:44:30.727+0800] INFO - Running <TaskInstance: init_ocp.create_tenant manual__2024-08-14T11:44:28.523156+00:00 [running]> on host master01
[2024-08-14T19:44:30.796+0800] INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_OWNER=airflow
AIRFLOW_CTX_DAG_ID=init_ocp
AIRFLOW_CTX_TASK_ID=create_tenant
AIRFLOW_CTX_EXECUTION_DATE=2024-08-14T11:44:28.523156+00:00
AIRFLOW_CTX_TRY_NUMBER=1
AIRFLOW_CTX_DAG_RUN_ID=manual__2024-08-14T11:44:28.523156+00:00
[2024-08-14T19:44:30.797+0800] INFO - use metadb connection
[2024-08-14T19:44:30.797+0800] INFO - Running statement: select a.id, a.ip, a.hardware, b.name as idc, b.region from oat_server a, oat_idc b where a.idc_id=b.id and a.id in (%s), parameters: [1]
[2024-08-14T19:44:30.798+0800] INFO - Rows affected: 1
[2024-08-14T19:44:30.799+0800] INFO - Execute query: select distinct(zone) from __all_server order by zone, args: None
[2024-08-14T19:44:30.800+0800] INFO - Execute rows: 1
[2024-08-14T19:44:30.800+0800] INFO - Execute query: select info from __all_zone where zone=%s and name=‘region’, args: (‘META_ZONE_1’,)
[2024-08-14T19:44:30.801+0800] INFO - Execute rows: 1
[2024-08-14T19:44:30.801+0800] INFO - Execute query: select tenant_id from __all_resource_pool where name=%s, args: (‘jbfmeta_resource_pool’,)
[2024-08-14T19:44:30.801+0800] INFO - Execute rows: 0
[2024-08-14T19:44:30.801+0800] INFO - Execute query: select unit_config_id from __all_unit_config where name=%s, args: (‘jbfmeta_unit’,)
[2024-08-14T19:44:30.802+0800] INFO - Execute rows: 0
[2024-08-14T19:44:30.802+0800] INFO - Execute query: CREATE RESOURCE UNIT IF NOT EXISTS jbfmeta_unit MAX_CPU 2, MAX_MEMORY ‘3G’, MAX_IOPS 128 ,MAX_DISK_SIZE ‘1G’, MAX_SESSION_NUM 10000, args: None
[2024-08-14T19:44:31.052+0800] INFO - Execute rows: 0
[2024-08-14T19:44:31.052+0800] INFO - Execute query: CREATE RESOURCE POOL IF NOT EXISTS jbfmeta_resource_pool UNIT=‘jbfmeta_unit’, UNIT_NUM=1, ZONE_LIST=(‘META_ZONE_1’), args: None
[2024-08-14T19:44:31.178+0800] ERROR - Task failed with exception
Traceback (most recent call last):
File “/usr/local/lib/python3.9/site-packages/airflow/decorators/base.py”, line 217, in execute
return_value = super().execute(context)
File “/usr/local/lib/python3.9/site-packages/airflow/operators/python.py”, line 175, in execute
return_value = self.execute_callable()
File “/usr/local/lib/python3.9/site-packages/airflow/operators/python.py”, line 192, in execute_callable
return self.python_callable(*self.op_args, **self.op_kwargs)
File “/oat/task_engine/dags/init_ocp.py”, line 54, in create_tenant
common.create_tenant(ctx, logger, product=‘ocp’)
File “/oat/task_engine/plugins/common.py”, line 731, in create_tenant
cur.execute(sql)
File “/oat/task_engine/plugins/utils.py”, line 1612, in execute
res = super().execute(query, args)
File “/usr/local/lib/python3.9/site-packages/pymysql/cursors.py”, line 148, in execute
result = self._query(query)
File “/usr/local/lib/python3.9/site-packages/pymysql/cursors.py”, line 310, in _query
conn.query(q)
File “/usr/local/lib/python3.9/site-packages/pymysql/connections.py”, line 548, in query
self._affected_rows = self._read_query_result(unbuffered=unbuffered)
File “/usr/local/lib/python3.9/site-packages/pymysql/connections.py”, line 775, in _read_query_result
result.read()
File “/usr/local/lib/python3.9/site-packages/pymysql/connections.py”, line 1156, in read
first_packet = self.connection._read_packet()
File “/usr/local/lib/python3.9/site-packages/pymysql/connections.py”, line 725, in _read_packet
packet.raise_for_error()
File “/usr/local/lib/python3.9/site-packages/pymysql/protocol.py”, line 221, in raise_for_error
err.raise_mysql_exception(self._data)
File “/usr/local/lib/python3.9/site-packages/pymysql/err.py”, line 143, in raise_mysql_exception
raise errorclass(errno, errval)
pymysql.err.OperationalError: (4624, ‘machine resource is not enough to hold a new unit’)
[2024-08-14T19:44:31.185+0800] INFO - Marking task as FAILED. dag_id=init_ocp, task_id=create_tenant, execution_date=20240814T114428, start_date=20240814T114430, end_date=20240814T114431
[2024-08-14T19:44:31.186+0800] INFO - Running statement: update oat_audit set status=‘failed’, update_time=utc_timestamp(), failed_reason=%s where id=%s, parameters: [“failed task instance is init_ocp__create_tenant__20240814 and exception information is (4624, ‘machine resource is not enough to hold a new unit’)”, 58]
[2024-08-14T19:44:31.187+0800] INFO - Rows affected: 1
[2024-08-14T19:44:31.222+0800] ERROR - Failed to execute job 205 for task create_tenant ((4624, ‘machine resource is not enough to hold a new unit’); 6806)
[2024-08-14T19:44:31.235+0800] INFO - Task exited with return code 1
[2024-08-14T19:44:31.272+0800] INFO - 0 downstream tasks scheduled from follow-on schedule check

############{2}{2024-08-14T19:54:56+08:00}############
[2024-08-14T19:54:56.522+0800] INFO - Dependencies all met for <TaskInstance: init_ocp.create_tenant manual__2024-08-14T11:44:28.523156+00:00 [queued]>
[2024-08-14T19:54:56.533+0800] INFO - Dependencies all met for <TaskInstance: init_ocp.create_tenant manual__2024-08-14T11:44:28.523156+00:00 [queued]>
[2024-08-14T19:54:56.533+0800] INFO -

[2024-08-14T19:54:56.533+0800] INFO - Starting attempt 2 of 2
[2024-08-14T19:54:56.533+0800] INFO -

[2024-08-14T19:54:56.553+0800] INFO - Executing <Task(_PythonDecoratedOperator): create_tenant> on 2024-08-14 11:44:28.523156+00:00
[2024-08-14T19:54:56.556+0800] INFO - Started process 9807 to run task
[2024-08-14T19:54:56.559+0800] INFO - Running: ['airflow', 'tasks', 'run', 'init_ocp', 'create_tenant', 'manual__2024-08-14T11:44:28.523156+00:00', '--job-id', '206', '--raw', '--subdir', 'DAGS_FOLDER/init_ocp.py', '--cfg-path', '/tmp/tmplbzo9dvo']
[2024-08-14T19:54:56.561+0800] INFO - Job 206: Subtask create_tenant
[2024-08-14T19:54:56.626+0800] INFO - Running <TaskInstance: init_ocp.create_tenant manual__2024-08-14T11:44:28.523156+00:00 [running]> on host master01
[2024-08-14T19:54:56.692+0800] INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_OWNER=airflow
AIRFLOW_CTX_DAG_ID=init_ocp
AIRFLOW_CTX_TASK_ID=create_tenant
AIRFLOW_CTX_EXECUTION_DATE=2024-08-14T11:44:28.523156+00:00
AIRFLOW_CTX_TRY_NUMBER=2
AIRFLOW_CTX_DAG_RUN_ID=manual__2024-08-14T11:44:28.523156+00:00
[2024-08-14T19:54:56.693+0800] INFO - use metadb connection
[2024-08-14T19:54:56.694+0800] INFO - Running statement: select a.id, a.ip, a.hardware, b.name as idc, b.region from oat_server a, oat_idc b where a.idc_id=b.id and a.id in (%s), parameters: [1]
[2024-08-14T19:54:56.695+0800] INFO - Rows affected: 1
[2024-08-14T19:54:56.696+0800] INFO - Execute query: select distinct(zone) from __all_server order by zone, args: None
[2024-08-14T19:54:56.696+0800] INFO - Execute rows: 1
[2024-08-14T19:54:56.697+0800] INFO - Execute query: select info from __all_zone where zone=%s and name=‘region’, args: (‘META_ZONE_1’,)
[2024-08-14T19:54:56.697+0800] INFO - Execute rows: 1
[2024-08-14T19:54:56.697+0800] INFO - Execute query: select tenant_id from __all_resource_pool where name=%s, args: (‘jbfmeta_resource_pool’,)
[2024-08-14T19:54:56.698+0800] INFO - Execute rows: 0
[2024-08-14T19:54:56.698+0800] INFO - Execute query: select unit_config_id from __all_unit_config where name=%s, args: (‘jbfmeta_unit’,)
[2024-08-14T19:54:56.698+0800] INFO - Execute rows: 1
[2024-08-14T19:54:56.698+0800] INFO - Execute query: select name from __all_resource_pool where unit_config_id=%s, args: (1002,)
[2024-08-14T19:54:56.700+0800] INFO - Execute rows: 0
[2024-08-14T19:54:56.700+0800] INFO - Execute query: drop resource unit jbfmeta_unit, args: None
[2024-08-14T19:54:56.737+0800] INFO - Execute rows: 0
[2024-08-14T19:54:56.737+0800] INFO - Execute query: CREATE RESOURCE UNIT IF NOT EXISTS jbfmeta_unit MAX_CPU 2, MAX_MEMORY ‘3G’, MAX_IOPS 128 ,MAX_DISK_SIZE ‘1G’, MAX_SESSION_NUM 10000, args: None
[2024-08-14T19:54:56.871+0800] INFO - Execute rows: 0
[2024-08-14T19:54:56.871+0800] INFO - Execute query: CREATE RESOURCE POOL IF NOT EXISTS jbfmeta_resource_pool UNIT=‘jbfmeta_unit’, UNIT_NUM=1, ZONE_LIST=(‘META_ZONE_1’), args: None
[2024-08-14T19:54:56.938+0800] ERROR - Task failed with exception
Traceback (most recent call last):
File “/usr/local/lib/python3.9/site-packages/airflow/decorators/base.py”, line 217, in execute
return_value = super().execute(context)
File “/usr/local/lib/python3.9/site-packages/airflow/operators/python.py”, line 175, in execute
return_value = self.execute_callable()
File “/usr/local/lib/python3.9/site-packages/airflow/operators/python.py”, line 192, in execute_callable
return self.python_callable(*self.op_args, **self.op_kwargs)
File “/oat/task_engine/dags/init_ocp.py”, line 54, in create_tenant
common.create_tenant(ctx, logger, product=‘ocp’)
File “/oat/task_engine/plugins/common.py”, line 731, in create_tenant
cur.execute(sql)
File “/oat/task_engine/plugins/utils.py”, line 1612, in execute
res = super().execute(query, args)
File “/usr/local/lib/python3.9/site-packages/pymysql/cursors.py”, line 148, in execute
result = self._query(query)
File “/usr/local/lib/python3.9/site-packages/pymysql/cursors.py”, line 310, in _query
conn.query(q)
File “/usr/local/lib/python3.9/site-packages/pymysql/connections.py”, line 548, in query
self._affected_rows = self._read_query_result(unbuffered=unbuffered)
File “/usr/local/lib/python3.9/site-packages/pymysql/connections.py”, line 775, in _read_query_result
result.read()
File “/usr/local/lib/python3.9/site-packages/pymysql/connections.py”, line 1156, in read
first_packet = self.connection._read_packet()
File “/usr/local/lib/python3.9/site-packages/pymysql/connections.py”, line 725, in _read_packet
packet.raise_for_error()
File “/usr/local/lib/python3.9/site-packages/pymysql/protocol.py”, line 221, in raise_for_error
err.raise_mysql_exception(self._data)
File “/usr/local/lib/python3.9/site-packages/pymysql/err.py”, line 143, in raise_mysql_exception
raise errorclass(errno, errval)
pymysql.err.OperationalError: (4624, ‘machine resource is not enough to hold a new unit’)
[2024-08-14T19:54:56.946+0800] INFO - Marking task as FAILED. dag_id=init_ocp, task_id=create_tenant, execution_date=20240814T114428, start_date=20240814T115456, end_date=20240814T115456
[2024-08-14T19:54:56.947+0800] INFO - Running statement: update oat_audit set status=‘failed’, update_time=utc_timestamp(), failed_reason=%s where id=%s, parameters: [“failed task instance is init_ocp__create_tenant__20240814 and exception information is (4624, ‘machine resource is not enough to hold a new unit’)”, 58]
[2024-08-14T19:54:56.947+0800] INFO - Rows affected: 1
[2024-08-14T19:54:56.969+0800] ERROR - Failed to execute job 206 for task create_tenant ((4624, ‘machine resource is not enough to hold a new unit’); 9807)
[2024-08-14T19:54:57.011+0800] INFO - Task exited with return code 1
[2024-08-14T19:54:57.046+0800] INFO - 0 downstream tasks scheduled from follow-on schedule check

############{3}{2024-08-14T20:10:32+08:00}############
[2024-08-14T20:10:32.808+0800] INFO - Dependencies all met for <TaskInstance: init_ocp.create_tenant manual__2024-08-14T11:44:28.523156+00:00 [queued]>
[2024-08-14T20:10:32.820+0800] INFO - Dependencies all met for <TaskInstance: init_ocp.create_tenant manual__2024-08-14T11:44:28.523156+00:00 [queued]>
[2024-08-14T20:10:32.820+0800] INFO -

[2024-08-14T20:10:32.820+0800] INFO - Starting attempt 3 of 3
[2024-08-14T20:10:32.821+0800] INFO -

[2024-08-14T20:10:32.841+0800] INFO - Executing <Task(_PythonDecoratedOperator): create_tenant> on 2024-08-14 11:44:28.523156+00:00
[2024-08-14T20:10:32.844+0800] INFO - Started process 14497 to run task
[2024-08-14T20:10:32.847+0800] INFO - Running: ['airflow', 'tasks', 'run', 'init_ocp', 'create_tenant', 'manual__2024-08-14T11:44:28.523156+00:00', '--job-id', '209', '--raw', '--subdir', 'DAGS_FOLDER/init_ocp.py', '--cfg-path', '/tmp/tmpofcrjsjc']
[2024-08-14T20:10:32.849+0800] INFO - Job 209: Subtask create_tenant
[2024-08-14T20:10:32.912+0800] INFO - Running <TaskInstance: init_ocp.create_tenant manual__2024-08-14T11:44:28.523156+00:00 [running]> on host master01
[2024-08-14T20:10:32.978+0800] INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_OWNER=airflow
AIRFLOW_CTX_DAG_ID=init_ocp
AIRFLOW_CTX_TASK_ID=create_tenant
AIRFLOW_CTX_EXECUTION_DATE=2024-08-14T11:44:28.523156+00:00
AIRFLOW_CTX_TRY_NUMBER=3
AIRFLOW_CTX_DAG_RUN_ID=manual__2024-08-14T11:44:28.523156+00:00
[2024-08-14T20:10:32.979+0800] INFO - use metadb connection
[2024-08-14T20:10:32.980+0800] INFO - Running statement: select a.id, a.ip, a.hardware, b.name as idc, b.region from oat_server a, oat_idc b where a.idc_id=b.id and a.id in (%s), parameters: [1]
[2024-08-14T20:10:32.980+0800] INFO - Rows affected: 1
[2024-08-14T20:10:32.982+0800] INFO - Execute query: select distinct(zone) from __all_server order by zone, args: None
[2024-08-14T20:10:32.982+0800] INFO - Execute rows: 1
[2024-08-14T20:10:32.982+0800] INFO - Execute query: select info from __all_zone where zone=%s and name=‘region’, args: (‘META_ZONE_1’,)
[2024-08-14T20:10:32.983+0800] INFO - Execute rows: 1
[2024-08-14T20:10:32.983+0800] INFO - Execute query: select tenant_id from __all_resource_pool where name=%s, args: (‘jbfmeta_resource_pool’,)
[2024-08-14T20:10:32.983+0800] INFO - Execute rows: 0
[2024-08-14T20:10:32.984+0800] INFO - Execute query: select unit_config_id from __all_unit_config where name=%s, args: (‘jbfmeta_unit’,)
[2024-08-14T20:10:32.984+0800] INFO - Execute rows: 1
[2024-08-14T20:10:32.984+0800] INFO - Execute query: select name from __all_resource_pool where unit_config_id=%s, args: (1003,)
[2024-08-14T20:10:32.985+0800] INFO - Execute rows: 0
[2024-08-14T20:10:32.985+0800] INFO - Execute query: drop resource unit jbfmeta_unit, args: None
[2024-08-14T20:10:33.030+0800] INFO - Execute rows: 0
[2024-08-14T20:10:33.031+0800] INFO - Execute query: CREATE RESOURCE UNIT IF NOT EXISTS jbfmeta_unit MAX_CPU 2, MAX_MEMORY ‘3G’, MAX_IOPS 128 ,MAX_DISK_SIZE ‘1G’, MAX_SESSION_NUM 10000, args: None
[2024-08-14T20:10:33.264+0800] INFO - Execute rows: 0
[2024-08-14T20:10:33.265+0800] INFO - Execute query: CREATE RESOURCE POOL IF NOT EXISTS jbfmeta_resource_pool UNIT=‘jbfmeta_unit’, UNIT_NUM=1, ZONE_LIST=(‘META_ZONE_1’), args: None
[2024-08-14T20:10:33.332+0800] ERROR - Task failed with exception
Traceback (most recent call last):
File “/usr/local/lib/python3.9/site-packages/airflow/decorators/base.py”, line 217, in execute
return_value = super().execute(context)
File “/usr/local/lib/python3.9/site-packages/airflow/operators/python.py”, line 175, in execute
return_value = self.execute_callable()
File “/usr/local/lib/python3.9/site-packages/airflow/operators/python.py”, line 192, in execute_callable
return self.python_callable(*self.op_args, **self.op_kwargs)
File “/oat/task_engine/dags/init_ocp.py”, line 54, in create_tenant
common.create_tenant(ctx, logger, product=‘ocp’)
File “/oat/task_engine/plugins/common.py”, line 731, in create_tenant
cur.execute(sql)
File “/oat/task_engine/plugins/utils.py”, line 1612, in execute
res = super().execute(query, args)
File “/usr/local/lib/python3.9/site-packages/pymysql/cursors.py”, line 148, in execute
result = self._query(query)
File “/usr/local/lib/python3.9/site-packages/pymysql/cursors.py”, line 310, in _query
conn.query(q)
File “/usr/local/lib/python3.9/site-packages/pymysql/connections.py”, line 548, in query
self._affected_rows = self._read_query_result(unbuffered=unbuffered)
File “/usr/local/lib/python3.9/site-packages/pymysql/connections.py”, line 775, in _read_query_result
result.read()
File “/usr/local/lib/python3.9/site-packages/pymysql/connections.py”, line 1156, in read
first_packet = self.connection._read_packet()
File “/usr/local/lib/python3.9/site-packages/pymysql/connections.py”, line 725, in _read_packet
packet.raise_for_error()
File “/usr/local/lib/python3.9/site-packages/pymysql/protocol.py”, line 221, in raise_for_error
err.raise_mysql_exception(self._data)
File “/usr/local/lib/python3.9/site-packages/pymysql/err.py”, line 143, in raise_mysql_exception
raise errorclass(errno, errval)
pymysql.err.OperationalError: (4624, ‘machine resource is not enough to hold a new unit’)
[2024-08-14T20:10:33.340+0800] INFO - Marking task as FAILED. dag_id=init_ocp, task_id=create_tenant, execution_date=20240814T114428, start_date=20240814T121032, end_date=20240814T121033
[2024-08-14T20:10:33.340+0800] INFO - Running statement: update oat_audit set status=‘failed’, update_time=utc_timestamp(), failed_reason=%s where id=%s, parameters: [“failed task instance is init_ocp__create_tenant__20240814 and exception information is (4624, ‘machine resource is not enough to hold a new unit’)”, 58]
[2024-08-14T20:10:33.341+0800] INFO - Rows affected: 1
[2024-08-14T20:10:33.360+0800] ERROR - Failed to execute job 209 for task create_tenant ((4624, ‘machine resource is not enough to hold a new unit’); 14497)
[2024-08-14T20:10:33.379+0800] INFO - Task exited with return code 1
[2024-08-14T20:10:33.416+0800] INFO - 0 downstream tasks scheduled from follow-on schedule check

############{4}{2024-08-14T20:11:38+08:00}############
[2024-08-14T20:11:38.642+0800] INFO - Dependencies all met for <TaskInstance: init_ocp.create_tenant manual__2024-08-14T11:44:28.523156+00:00 [queued]>
[2024-08-14T20:11:38.652+0800] INFO - Dependencies all met for <TaskInstance: init_ocp.create_tenant manual__2024-08-14T11:44:28.523156+00:00 [queued]>
[2024-08-14T20:11:38.652+0800] INFO -

[2024-08-14T20:11:38.653+0800] INFO - Starting attempt 4 of 4
[2024-08-14T20:11:38.653+0800] INFO -

[2024-08-14T20:11:38.672+0800] INFO - Executing <Task(_PythonDecoratedOperator): create_tenant> on 2024-08-14 11:44:28.523156+00:00
[2024-08-14T20:11:38.675+0800] INFO - Started process 14796 to run task
[2024-08-14T20:11:38.678+0800] INFO - Running: ['airflow', 'tasks', 'run', 'init_ocp', 'create_tenant', 'manual__2024-08-14T11:44:28.523156+00:00', '--job-id', '210', '--raw', '--subdir', 'DAGS_FOLDER/init_ocp.py', '--cfg-path', '/tmp/tmp6hf5d2rp']
[2024-08-14T20:11:38.680+0800] INFO - Job 210: Subtask create_tenant
[2024-08-14T20:11:38.745+0800] INFO - Running <TaskInstance: init_ocp.create_tenant manual__2024-08-14T11:44:28.523156+00:00 [running]> on host master01
[2024-08-14T20:11:38.811+0800] INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_OWNER=airflow
AIRFLOW_CTX_DAG_ID=init_ocp
AIRFLOW_CTX_TASK_ID=create_tenant
AIRFLOW_CTX_EXECUTION_DATE=2024-08-14T11:44:28.523156+00:00
AIRFLOW_CTX_TRY_NUMBER=4
AIRFLOW_CTX_DAG_RUN_ID=manual__2024-08-14T11:44:28.523156+00:00
[2024-08-14T20:11:38.812+0800] INFO - use metadb connection
[2024-08-14T20:11:38.813+0800] INFO - Running statement: select a.id, a.ip, a.hardware, b.name as idc, b.region from oat_server a, oat_idc b where a.idc_id=b.id and a.id in (%s), parameters: [1]
[2024-08-14T20:11:38.813+0800] INFO - Rows affected: 1
[2024-08-14T20:11:38.815+0800] INFO - Execute query: select distinct(zone) from __all_server order by zone, args: None
[2024-08-14T20:11:38.815+0800] INFO - Execute rows: 1
[2024-08-14T20:11:38.816+0800] INFO - Execute query: select info from __all_zone where zone=%s and name=‘region’, args: (‘META_ZONE_1’,)
[2024-08-14T20:11:38.816+0800] INFO - Execute rows: 1
[2024-08-14T20:11:38.816+0800] INFO - Execute query: select tenant_id from __all_resource_pool where name=%s, args: (‘jbfmeta_resource_pool’,)
[2024-08-14T20:11:38.817+0800] INFO - Execute rows: 0
[2024-08-14T20:11:38.817+0800] INFO - Execute query: select unit_config_id from __all_unit_config where name=%s, args: (‘jbfmeta_unit’,)
[2024-08-14T20:11:38.817+0800] INFO - Execute rows: 1
[2024-08-14T20:11:38.817+0800] INFO - Execute query: select name from __all_resource_pool where unit_config_id=%s, args: (1004,)
[2024-08-14T20:11:38.818+0800] INFO - Execute rows: 0
[2024-08-14T20:11:38.818+0800] INFO - Execute query: drop resource unit jbfmeta_unit, args: None
[2024-08-14T20:11:38.884+0800] INFO - Execute rows: 0
[2024-08-14T20:11:38.884+0800] INFO - Execute query: CREATE RESOURCE UNIT IF NOT EXISTS jbfmeta_unit MAX_CPU 2, MAX_MEMORY ‘3G’, MAX_IOPS 128 ,MAX_DISK_SIZE ‘1G’, MAX_SESSION_NUM 10000, args: None
[2024-08-14T20:11:39.017+0800] INFO - Execute rows: 0
[2024-08-14T20:11:39.018+0800] INFO - Execute query: CREATE RESOURCE POOL IF NOT EXISTS jbfmeta_resource_pool UNIT=‘jbfmeta_unit’, UNIT_NUM=1, ZONE_LIST=(‘META_ZONE_1’), args: None
[2024-08-14T20:11:39.087+0800] ERROR - Task failed with exception
Traceback (most recent call last):
File “/usr/local/lib/python3.9/site-packages/airflow/decorators/base.py”, line 217, in execute
return_value = super().execute(context)
File “/usr/local/lib/python3.9/site-packages/airflow/operators/python.py”, line 175, in execute
return_value = self.execute_callable()
File “/usr/local/lib/python3.9/site-packages/airflow/operators/python.py”, line 192, in execute_callable
return self.python_callable(*self.op_args, **self.op_kwargs)
File “/oat/task_engine/dags/init_ocp.py”, line 54, in create_tenant
common.create_tenant(ctx, logger, product=‘ocp’)
File “/oat/task_engine/plugins/common.py”, line 731, in create_tenant
cur.execute(sql)
File “/oat/task_engine/plugins/utils.py”, line 1612, in execute
res = super().execute(query, args)
File “/usr/local/lib/python3.9/site-packages/pymysql/cursors.py”, line 148, in execute
result = self._query(query)
File “/usr/local/lib/python3.9/site-packages/pymysql/cursors.py”, line 310, in _query
conn.query(q)
File “/usr/local/lib/python3.9/site-packages/pymysql/connections.py”, line 548, in query
self._affected_rows = self._read_query_result(unbuffered=unbuffered)
File “/usr/local/lib/python3.9/site-packages/pymysql/connections.py”, line 775, in _read_query_result
result.read()
File “/usr/local/lib/python3.9/site-packages/pymysql/connections.py”, line 1156, in read
first_packet = self.connection._read_packet()
File “/usr/local/lib/python3.9/site-packages/pymysql/connections.py”, line 725, in _read_packet
packet.raise_for_error()
File “/usr/local/lib/python3.9/site-packages/pymysql/protocol.py”, line 221, in raise_for_error
err.raise_mysql_exception(self._data)
File “/usr/local/lib/python3.9/site-packages/pymysql/err.py”, line 143, in raise_mysql_exception
raise errorclass(errno, errval)
pymysql.err.OperationalError: (4624, ‘machine resource is not enough to hold a new unit’)
[2024-08-14T20:11:39.094+0800] INFO - Marking task as FAILED. dag_id=init_ocp, task_id=create_tenant, execution_date=20240814T114428, start_date=20240814T121138, end_date=20240814T121139
[2024-08-14T20:11:39.095+0800] INFO - Running statement: update oat_audit set status=‘failed’, update_time=utc_timestamp(), failed_reason=%s where id=%s, parameters: [“failed task instance is init_ocp__create_tenant__20240814 and exception information is (4624, ‘machine resource is not enough to hold a new unit’)”, 58]
[2024-08-14T20:11:39.096+0800] INFO - Rows affected: 1
[2024-08-14T20:11:39.122+0800] ERROR - Failed to execute job 210 for task create_tenant ((4624, ‘machine resource is not enough to hold a new unit’); 14796)
[2024-08-14T20:11:39.129+0800] INFO - Task exited with return code 1
[2024-08-14T20:11:39.165+0800] INFO - 0 downstream tasks scheduled from follow-on schedule check

############{5}{2024-08-14T20:11:59+08:00}############
[2024-08-14T20:11:59.800+0800] INFO - Dependencies all met for <TaskInstance: init_ocp.create_tenant manual__2024-08-14T11:44:28.523156+00:00 [queued]>
[2024-08-14T20:11:59.810+0800] INFO - Dependencies all met for <TaskInstance: init_ocp.create_tenant manual__2024-08-14T11:44:28.523156+00:00 [queued]>
[2024-08-14T20:11:59.811+0800] INFO -

[2024-08-14T20:11:59.811+0800] INFO - Starting attempt 5 of 5
[2024-08-14T20:11:59.811+0800] INFO -

[2024-08-14T20:11:59.830+0800] INFO - Executing <Task(_PythonDecoratedOperator): create_tenant> on 2024-08-14 11:44:28.523156+00:00
[2024-08-14T20:11:59.833+0800] INFO - Started process 14803 to run task
[2024-08-14T20:11:59.836+0800] INFO - Running: ['airflow', 'tasks', 'run', 'init_ocp', 'create_tenant', 'manual__2024-08-14T11:44:28.523156+00:00', '--job-id', '211', '--raw', '--subdir', 'DAGS_FOLDER/init_ocp.py', '--cfg-path', '/tmp/tmpz4ujrbzx']
[2024-08-14T20:11:59.838+0800] INFO - Job 211: Subtask create_tenant
[2024-08-14T20:11:59.904+0800] INFO - Running <TaskInstance: init_ocp.create_tenant manual__2024-08-14T11:44:28.523156+00:00 [running]> on host master01
[2024-08-14T20:11:59.971+0800] INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_OWNER=airflow
AIRFLOW_CTX_DAG_ID=init_ocp
AIRFLOW_CTX_TASK_ID=create_tenant
AIRFLOW_CTX_EXECUTION_DATE=2024-08-14T11:44:28.523156+00:00
AIRFLOW_CTX_TRY_NUMBER=5
AIRFLOW_CTX_DAG_RUN_ID=manual__2024-08-14T11:44:28.523156+00:00
[2024-08-14T20:11:59.972+0800] INFO - use metadb connection
[2024-08-14T20:11:59.973+0800] INFO - Running statement: select a.id, a.ip, a.hardware, b.name as idc, b.region from oat_server a, oat_idc b where a.idc_id=b.id and a.id in (%s), parameters: [1]
[2024-08-14T20:11:59.973+0800] INFO - Rows affected: 1
[2024-08-14T20:11:59.975+0800] INFO - Execute query: select distinct(zone) from __all_server order by zone, args: None
[2024-08-14T20:11:59.975+0800] INFO - Execute rows: 1
[2024-08-14T20:11:59.976+0800] INFO - Execute query: select info from __all_zone where zone=%s and name=‘region’, args: (‘META_ZONE_1’,)
[2024-08-14T20:11:59.976+0800] INFO - Execute rows: 1
[2024-08-14T20:11:59.976+0800] INFO - Execute query: select tenant_id from __all_resource_pool where name=%s, args: (‘jbfmeta_resource_pool’,)
[2024-08-14T20:11:59.977+0800] INFO - Execute rows: 0
[2024-08-14T20:11:59.977+0800] INFO - Execute query: select unit_config_id from __all_unit_config where name=%s, args: (‘jbfmeta_unit’,)
[2024-08-14T20:11:59.977+0800] INFO - Execute rows: 1
[2024-08-14T20:11:59.978+0800] INFO - Execute query: select name from __all_resource_pool where unit_config_id=%s, args: (1005,)
[2024-08-14T20:11:59.978+0800] INFO - Execute rows: 0
[2024-08-14T20:11:59.978+0800] INFO - Execute query: drop resource unit jbfmeta_unit, args: None
[2024-08-14T20:12:00.070+0800] INFO - Execute rows: 0
[2024-08-14T20:12:00.070+0800] INFO - Execute query: CREATE RESOURCE UNIT IF NOT EXISTS jbfmeta_unit MAX_CPU 2, MAX_MEMORY ‘3G’, MAX_IOPS 128 ,MAX_DISK_SIZE ‘1G’, MAX_SESSION_NUM 10000, args: None
[2024-08-14T20:12:00.220+0800] INFO - Execute rows: 0
[2024-08-14T20:12:00.220+0800] INFO - Execute query: CREATE RESOURCE POOL IF NOT EXISTS jbfmeta_resource_pool UNIT=‘jbfmeta_unit’, UNIT_NUM=1, ZONE_LIST=(‘META_ZONE_1’), args: None
[2024-08-14T20:12:00.287+0800] ERROR - Task failed with exception
Traceback (most recent call last):
File “/usr/local/lib/python3.9/site-packages/airflow/decorators/base.py”, line 217, in execute
return_value = super().execute(context)
File “/usr/local/lib/python3.9/site-packages/airflow/operators/python.py”, line 175, in execute
return_value = self.execute_callable()
File “/usr/local/lib/python3.9/site-packages/airflow/operators/python.py”, line 192, in execute_callable
return self.python_callable(*self.op_args, **self.op_kwargs)
File “/oat/task_engine/dags/init_ocp.py”, line 54, in create_tenant
common.create_tenant(ctx, logger, product=‘ocp’)
File “/oat/task_engine/plugins/common.py”, line 731, in create_tenant
cur.execute(sql)
File “/oat/task_engine/plugins/utils.py”, line 1612, in execute
res = super().execute(query, args)
File “/usr/local/lib/python3.9/site-packages/pymysql/cursors.py”, line 148, in execute
result = self._query(query)
File “/usr/local/lib/python3.9/site-packages/pymysql/cursors.py”, line 310, in _query
conn.query(q)
File “/usr/local/lib/python3.9/site-packages/pymysql/connections.py”, line 548, in query
self._affected_rows = self._read_query_result(unbuffered=unbuffered)
File “/usr/local/lib/python3.9/site-packages/pymysql/connections.py”, line 775, in _read_query_result
result.read()
File “/usr/local/lib/python3.9/site-packages/pymysql/connections.py”, line 1156, in read
first_packet = self.connection._read_packet()
File “/usr/local/lib/python3.9/site-packages/pymysql/connections.py”, line 725, in _read_packet
packet.raise_for_error()
File “/usr/local/lib/python3.9/site-packages/pymysql/protocol.py”, line 221, in raise_for_error
err.raise_mysql_exception(self._data)
File “/usr/local/lib/python3.9/site-packages/pymysql/err.py”, line 143, in raise_mysql_exception
raise errorclass(errno, errval)
pymysql.err.OperationalError: (4624, ‘machine resource is not enough to hold a new unit’)
[2024-08-14T20:12:00.295+0800] INFO - Marking task as FAILED. dag_id=init_ocp, task_id=create_tenant, execution_date=20240814T114428, start_date=20240814T121159, end_date=20240814T121200
[2024-08-14T20:12:00.296+0800] INFO - Running statement: update oat_audit set status=‘failed’, update_time=utc_timestamp(), failed_reason=%s where id=%s, parameters: [“failed task instance is init_ocp__create_tenant__20240814 and exception information is (4624, ‘machine resource is not enough to hold a new unit’)”, 58]
[2024-08-14T20:12:00.296+0800] INFO - Rows affected: 1
[2024-08-14T20:12:00.325+0800] ERROR - Failed to execute job 211 for task create_tenant ((4624, ‘machine resource is not enough to hold a new unit’); 14803)
[2024-08-14T20:12:00.368+0800] INFO - Task exited with return code 1
[2024-08-14T20:12:00.405+0800] INFO - 0 downstream tasks scheduled from follow-on schedule check


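Based on the information you provided, the create-product task failed while you were deploying OceanBase 4.3.2 with OCP, with error code 4624 and the message "machine resource is not enough to hold a new unit". This usually means that the resources on the current machine are not enough to host an additional unit.

Root cause analysis scenario:

obdiag currently has no root cause analysis scenario that directly targets this kind of resource shortage.

Log collection scenario:

To better understand the background and environment configuration of the problem, we suggest using the obdiag tool to collect the cluster's basic information, which will help us analyze the problem further.

Log collection command: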

obdiag gather scene run --scene=observer.base

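Instructions:

  1. Run the collection command: run the command above on the OceanBase cluster where the problem occurred.
  2. Provide the log files: after the command finishes, send us the generated log files or archive so that we can analyze the problem in more depth.

More information needed:

To locate the problem more accurately, could you provide the following information?

  1. What is the current resource allocation of the cluster?
  2. What is the specification of the unit you are trying to create?
  3. Are any resource adjustments in progress, such as creating or changing a resource pool, or migrating units?
  4. What is the current load on the cluster? Is a large volume of reads and writes in progress?

This information will help us locate the problem faster.

For details, see the following documentation:

Help link for the agile diagnostic tool obdiag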


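The error message is already quite clear: the resources remaining on the machine are not enough to create another resource pool. Try specifying a smaller resource unit and then creating the resource pool again.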


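It is already at the minimum settings. I don't know which resource the message refers to. Judging from the error documentation, it looks a bit like there are not enough instances for an election. Memory and CPU are sufficient, and the disk in use is 1.8T.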


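A few more questions to confirm with you:

  1. You said the settings are already at the minimum. What exactly are those minimum values, for example for CPU, memory, and disk space?
  2. By "not enough instances for an election", do you mean the number of servers in the cluster? Can you provide the exact server count and the cluster configuration?
  3. You said memory and CPU are sufficient. How did you assess that? Can you provide specific metrics or monitoring data?
  4. You mentioned a 1.8T disk. How much of it is currently used, and how much space is still available?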

Run the following statement:

select
  zone,
  svr_ip,
  (cpu_capacity - cpu_assigned) as cpu_free,
  (mem_capacity - mem_assigned) / 1024 / 1024 / 1024 as mem_free_gb,
  (log_disk_capacity - log_disk_in_use) / 1024 / 1024 / 1024 as log_disk_free_gb,
  (data_disk_capacity - data_disk_in_use) / 1024 / 1024 / 1024 as data_disk_free_gb
from
  gv$ob_servers
order by
  zone,
  svr_ip;


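Here is a formula you can use to check whether resources are sufficient.

memory_limit_percentage: the percentage of total memory that the system may use. The default is 80%.

memory_limit: the total usable memory size. It takes precedence over memory_limit_percentage.

system_memory: memory reserved for the built-in tenant 500, usually 30~50G by default.

So the memory left for creating business tenants is:
[(physical memory * memory_limit_percentage) | memory_limit] - tenant 500 (system_memory) - sys tenant memory = business tenant memory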
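The formula above can be worked through with concrete numbers. The following is a minimal Python sketch, not an OceanBase API: the function name and the host figures (10 GB of RAM, 3 GB system_memory, 2 GB for the sys tenant) are hypothetical and chosen only for illustration. It also assumes that a memory_limit of 0 means "not set", so the percentage applies.

```python
def business_tenant_memory_gb(physical_gb, memory_limit_percentage=80,
                              memory_limit_gb=0, system_memory_gb=0.0,
                              sys_tenant_gb=0.0):
    """Memory left for business tenants, following the formula in the post."""
    # Assumption: a non-zero memory_limit takes precedence over the percentage.
    if memory_limit_gb > 0:
        usable = memory_limit_gb
    else:
        usable = physical_gb * memory_limit_percentage / 100
    # Subtract tenant 500 (system_memory) and the sys tenant.
    return usable - system_memory_gb - sys_tenant_gb


# Hypothetical host: 10 GB RAM, default 80%, 3 GB system_memory, 2 GB sys tenant.
left = business_tenant_memory_gb(10, system_memory_gb=3, sys_tenant_gb=2)
print(left)       # 3.0
print(left >= 3)  # True: a 3G unit would only just fit on memory alone
```

With these made-up numbers a 3G unit fits exactly. If system_memory is closer to the 30~50G mentioned above, the result on a small test machine would be negative, which is one way this error can appear even when the host seems to have free memory.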


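The resource unit specification is CPU=2, memory=3G, DISK_SIZE=1G:

CREATE RESOURCE UNIT IF NOT EXISTS jbfmeta_unit MAX_CPU 2, MAX_MEMORY '3G', MAX_IOPS 128, MAX_DISK_SIZE '1G', MAX_SESSION_NUM 10000

The resource pool is instantiated on zone META_ZONE_1, and that step fails with "machine resource is not enough to hold a new unit" (as shown in the logs). So check whether the machine in zone META_ZONE_1 has enough free resources for CPU=2, memory=3G, DISK_SIZE=1G.
Use the following statement to view the resource allocation of each node:

 SELECT * FROM GV$OB_SERVERS;

If the machine has enough physical resources, check whether upper-limit parameters for CPU, memory, or disk have been configured and are causing the failure.
To query them: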

show parameters where name in('cpu_count','memory_limit','memory_limit_percentage','system_memory','log_disk_utilization_limit_threshold','datafile_disk_percentage');


Check whether resources are sufficient.

View the cluster's CPU, memory, and disk parameter settings:

show parameters where name in ('memory_limit','memory_limit_percentage','system_memory','log_disk_size','log_disk_percentage','datafile_size','datafile_disk_percentage');

Server-level resource allocation in the cluster:

select zone,concat(SVR_IP,':',SVR_PORT) observer,
cpu_capacity_max cpu_total,cpu_assigned_max cpu_assigned,
cpu_capacity-cpu_assigned_max as cpu_free,
round(memory_limit/1024/1024/1024,2) as memory_total,
round((memory_limit-mem_capacity)/1024/1024/1024,2) as system_memory,
round(mem_assigned/1024/1024/1024,2) as mem_assigned,
round((mem_capacity-mem_assigned)/1024/1024/1024,2) as memory_free,
round(log_disk_capacity/1024/1024/1024,2) as log_disk_capacity,
round(log_disk_assigned/1024/1024/1024,2) as log_disk_assigned,
round((log_disk_capacity-log_disk_assigned)/1024/1024/1024,2) as log_disk_free,
round((data_disk_capacity/1024/1024/1024),2) as data_disk,
round((data_disk_in_use/1024/1024/1024),2) as data_disk_used,
round((data_disk_capacity-data_disk_in_use)/1024/1024/1024,2) as data_disk_free
from oceanbase.gv$ob_servers;
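Once the query above reports free capacity per server, comparing it with the requested unit is a per-dimension check. The following is a minimal Python sketch with hypothetical numbers. The key names cpu and memory_gb are illustrative labels, not column names from the view, and the free values are made up rather than taken from this cluster.

```python
def short_dimensions(free, unit):
    """Return the names of resources where the unit asks for more than is free."""
    # A dimension missing from `free` is treated as having nothing available.
    return [name for name, need in unit.items() if free.get(name, 0) < need]


# Hypothetical free resources on one server, e.g. derived from cpu_free / memory_free.
free = {"cpu": 1.0, "memory_gb": 4.0}
# Unit spec from the thread: MAX_CPU 2, MAX_MEMORY '3G'.
unit = {"cpu": 2.0, "memory_gb": 3.0}

print(short_dimensions(free, unit))  # ['cpu']
```

An empty list would mean every listed dimension has room. Any name returned points at the resource to look at first.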

Is there any progress on this issue?