【 Environment 】 Production
【 OB or other component 】 OCP
【 Version 】 3.3.4
【 Problem description 】
【 Steps to reproduce 】 The create_tenant task reports an error
【 Attachments and logs 】
[2023-12-12T10:12:21.700+0800] INFO - Dependencies all met for <TaskInstance: init_ocp.create_tenant manual__2023-12-12T02:12:19.577416+00:00 [queued]>
[2023-12-12T10:12:21.711+0800] INFO - Dependencies all met for <TaskInstance: init_ocp.create_tenant manual__2023-12-12T02:12:19.577416+00:00 [queued]>
[2023-12-12T10:12:21.711+0800] INFO -
[2023-12-12T10:12:21.711+0800] INFO - Starting attempt 1 of 1
[2023-12-12T10:12:21.711+0800] INFO -
[2023-12-12T10:12:21.725+0800] INFO - Executing <Task(_PythonDecoratedOperator): create_tenant> on 2023-12-12 02:12:19.577416+00:00
[2023-12-12T10:12:21.729+0800] INFO - Started process 18229 to run task
[2023-12-12T10:12:21.736+0800] INFO - Running: ['airflow', 'tasks', 'run', 'init_ocp', 'create_tenant', 'manual__2023-12-12T02:12:19.577416+00:00', '--job-id', '44', '--raw', '--subdir', 'DAGS_FOLDER/init_ocp.py', '--cfg-path', '/tmp/tmp28_zz5z_']
[2023-12-12T10:12:21.740+0800] INFO - Job 44: Subtask create_tenant
[2023-12-12T10:12:21.812+0800] INFO - Running <TaskInstance: init_ocp.create_tenant manual__2023-12-12T02:12:19.577416+00:00 [running]> on host localhost.localdomain
[2023-12-12T10:12:21.878+0800] INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_OWNER=airflow
AIRFLOW_CTX_DAG_ID=init_ocp
AIRFLOW_CTX_TASK_ID=create_tenant
AIRFLOW_CTX_EXECUTION_DATE=2023-12-12T02:12:19.577416+00:00
AIRFLOW_CTX_TRY_NUMBER=1
AIRFLOW_CTX_DAG_RUN_ID=manual__2023-12-12T02:12:19.577416+00:00
[2023-12-12T10:12:21.879+0800] INFO - use metadb connection
[2023-12-12T10:12:21.880+0800] INFO - Running statement: select a.id, a.ip, a.hardware, b.name as idc, b.region from oat_server a, oat_idc b where a.idc_id=b.id and a.id in (%s), parameters: [1]
[2023-12-12T10:12:21.881+0800] INFO - Rows affected: 1
[2023-12-12T10:12:21.883+0800] INFO - Execute query: select distinct(zone) from __all_server order by zone, args: None
[2023-12-12T10:12:21.884+0800] INFO - Execute rows: 1
[2023-12-12T10:12:21.884+0800] INFO - Execute query: select info from __all_zone where zone=%s and name='region', args: ('META_ZONE_1',)
[2023-12-12T10:12:21.885+0800] INFO - Execute rows: 1
[2023-12-12T10:12:21.885+0800] INFO - Execute query: select tenant_id from __all_resource_pool where name=%s, args: ('xw_ocp_resource_pool',)
[2023-12-12T10:12:21.886+0800] INFO - Execute rows: 0
[2023-12-12T10:12:21.886+0800] INFO - Execute query: select unit_config_id from __all_unit_config where name=%s, args: ('xw_ocp_unit',)
[2023-12-12T10:12:21.887+0800] INFO - Execute rows: 1
[2023-12-12T10:12:21.887+0800] INFO - Execute query: select name from __all_resource_pool where unit_config_id=%s, args: (1003,)
[2023-12-12T10:12:21.887+0800] INFO - Execute rows: 0
[2023-12-12T10:12:21.888+0800] INFO - Execute query: drop resource unit xw_ocp_unit, args: None
[2023-12-12T10:12:21.890+0800] INFO - Execute rows: 0
[2023-12-12T10:12:21.890+0800] INFO - Execute query: CREATE RESOURCE UNIT IF NOT EXISTS xw_ocp_unit MAX_CPU 2, MAX_MEMORY '3G', MAX_IOPS 128 ,MAX_DISK_SIZE '1G', MAX_SESSION_NUM 10000, args: None
[2023-12-12T10:12:21.894+0800] INFO - Execute rows: 0
[2023-12-12T10:12:21.894+0800] INFO - Execute query: CREATE RESOURCE POOL IF NOT EXISTS xw_ocp_resource_pool UNIT='xw_ocp_unit', UNIT_NUM=1, ZONE_LIST=('META_ZONE_1'), args: None
[2023-12-12T10:12:21.898+0800] ERROR - Task failed with exception
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/airflow/decorators/base.py", line 217, in execute
    return_value = super().execute(context)
  File "/usr/local/lib/python3.9/site-packages/airflow/operators/python.py", line 175, in execute
    return_value = self.execute_callable()
  File "/usr/local/lib/python3.9/site-packages/airflow/operators/python.py", line 192, in execute_callable
    return self.python_callable(*self.op_args, **self.op_kwargs)
  File "/oat/task_engine/dags/init_ocp.py", line 54, in create_tenant
    common.create_tenant(ctx, logger, product='ocp')
  File "/oat/task_engine/plugins/common.py", line 731, in create_tenant
    cur.execute(sql)
  File "/oat/task_engine/plugins/utils.py", line 1612, in execute
    res = super().execute(query, args)
  File "/usr/local/lib/python3.9/site-packages/pymysql/cursors.py", line 148, in execute
    result = self._query(query)
  File "/usr/local/lib/python3.9/site-packages/pymysql/cursors.py", line 310, in _query
    conn.query(q)
  File "/usr/local/lib/python3.9/site-packages/pymysql/connections.py", line 548, in query
    self._affected_rows = self._read_query_result(unbuffered=unbuffered)
  File "/usr/local/lib/python3.9/site-packages/pymysql/connections.py", line 775, in _read_query_result
    result.read()
  File "/usr/local/lib/python3.9/site-packages/pymysql/connections.py", line 1156, in read
    first_packet = self.connection._read_packet()
  File "/usr/local/lib/python3.9/site-packages/pymysql/connections.py", line 725, in _read_packet
    packet.raise_for_error()
  File "/usr/local/lib/python3.9/site-packages/pymysql/protocol.py", line 221, in raise_for_error
    err.raise_mysql_exception(self._data)
  File "/usr/local/lib/python3.9/site-packages/pymysql/err.py", line 143, in raise_mysql_exception
    raise errorclass(errno, errval)
pymysql.err.OperationalError: (4624, 'machine resource is not enough to hold a new unit')
[2023-12-12T10:12:21.908+0800] INFO - Marking task as FAILED. dag_id=init_ocp, task_id=create_tenant, execution_date=20231212T021219, start_date=20231212T021221, end_date=20231212T021221
[2023-12-12T10:12:21.909+0800] INFO - Running statement: update oat_audit set status='failed', update_time=utc_timestamp(), failed_reason=%s where id=%s, parameters: ["failed task instance is init_ocp__create_tenant__20231212 and exception information is (4624, 'machine resource is not enough to hold a new unit')", 14]
[2023-12-12T10:12:21.909+0800] INFO - Rows affected: 1
[2023-12-12T10:12:21.922+0800] ERROR - Failed to execute job 44 for task create_tenant ((4624, 'machine resource is not enough to hold a new unit'); 18229)
[2023-12-12T10:12:21.945+0800] INFO - Task exited with return code 1
[2023-12-12T10:12:21.994+0800] INFO - 0 downstream tasks scheduled from follow-on schedule chec
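
For reference, the failing statement is the CREATE RESOURCE POOL, and error 4624 says the server in META_ZONE_1 cannot hold a unit of the requested size (MAX_CPU 2, MAX_MEMORY '3G', MAX_DISK_SIZE '1G'). Below is a minimal Python sketch of that comparison, not what OAT or OceanBase actually runs. The names `parse_size`, `FreeResources` and `shortfalls` are hypothetical, and the free-capacity numbers are made up; real values would have to be read from the cluster (I assume something like the total-minus-assigned figures in the per-server resource statistics, e.g. `__all_virtual_server_stat` on 3.x, but have not verified that for this metadb version).

```python
from dataclasses import dataclass

_UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text: str) -> int:
    """Convert a size string such as '3G' or '512M' to bytes."""
    text = text.strip().upper()
    if text and text[-1] in _UNITS:
        return int(float(text[:-1]) * _UNITS[text[-1]])
    return int(text)


@dataclass
class FreeResources:
    """Unassigned capacity of one server (hypothetical container)."""
    cpu: float
    memory: int  # bytes
    disk: int    # bytes


def shortfalls(free: FreeResources, max_cpu: float,
               max_memory: str, max_disk: str) -> list:
    """Return the names of the resources that cannot hold the unit."""
    missing = []
    if free.cpu < max_cpu:
        missing.append("cpu")
    if free.memory < parse_size(max_memory):
        missing.append("memory")
    if free.disk < parse_size(max_disk):
        missing.append("disk")
    return missing


# Unit values come from the CREATE RESOURCE UNIT statement in the log;
# the free capacity below is invented purely for illustration.
free = FreeResources(cpu=1.0, memory=parse_size("2G"), disk=parse_size("50G"))
print(shortfalls(free, max_cpu=2, max_memory="3G", max_disk="1G"))
```

With these made-up numbers the sketch prints `['cpu', 'memory']`. An empty list would mean the unit fits on that server under this simplified model, which ignores any reservation rules the cluster applies on top of the raw totals.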