Error while setting up OCP

【Environment】 Test environment
【OB or other component】 OCP
【Version】 3.3.4
【Problem description】
【Steps to reproduce】 The create_tenant task fails

【Attachments and logs】
[2023-12-12T10:12:21.700+0800] INFO - Dependencies all met for <TaskInstance: init_ocp.create_tenant manual__2023-12-12T02:12:19.577416+00:00 [queued]>
[2023-12-12T10:12:21.711+0800] INFO - Dependencies all met for <TaskInstance: init_ocp.create_tenant manual__2023-12-12T02:12:19.577416+00:00 [queued]>
[2023-12-12T10:12:21.711+0800] INFO -
[2023-12-12T10:12:21.711+0800] INFO - Starting attempt 1 of 1
[2023-12-12T10:12:21.711+0800] INFO -
[2023-12-12T10:12:21.725+0800] INFO - Executing <Task(_PythonDecoratedOperator): create_tenant> on 2023-12-12 02:12:19.577416+00:00
[2023-12-12T10:12:21.729+0800] INFO - Started process 18229 to run task
[2023-12-12T10:12:21.736+0800] INFO - Running: ['airflow', 'tasks', 'run', 'init_ocp', 'create_tenant', 'manual__2023-12-12T02:12:19.577416+00:00', '--job-id', '44', '--raw', '--subdir', 'DAGS_FOLDER/init_ocp.py', '--cfg-path', '/tmp/tmp28_zz5z_']
[2023-12-12T10:12:21.740+0800] INFO - Job 44: Subtask create_tenant
[2023-12-12T10:12:21.812+0800] INFO - Running <TaskInstance: init_ocp.create_tenant manual__2023-12-12T02:12:19.577416+00:00 [running]> on host localhost.localdomain
[2023-12-12T10:12:21.878+0800] INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_OWNER=airflow
AIRFLOW_CTX_DAG_ID=init_ocp
AIRFLOW_CTX_TASK_ID=create_tenant
AIRFLOW_CTX_EXECUTION_DATE=2023-12-12T02:12:19.577416+00:00
AIRFLOW_CTX_TRY_NUMBER=1
AIRFLOW_CTX_DAG_RUN_ID=manual__2023-12-12T02:12:19.577416+00:00
[2023-12-12T10:12:21.879+0800] INFO - use metadb connection
[2023-12-12T10:12:21.880+0800] INFO - Running statement: select a.id, a.ip, a.hardware, b.name as idc, b.region from oat_server a, oat_idc b where a.idc_id=b.id and a.id in (%s), parameters: [1]
[2023-12-12T10:12:21.881+0800] INFO - Rows affected: 1
[2023-12-12T10:12:21.883+0800] INFO - Execute query: select distinct(zone) from __all_server order by zone, args: None
[2023-12-12T10:12:21.884+0800] INFO - Execute rows: 1
[2023-12-12T10:12:21.884+0800] INFO - Execute query: select info from __all_zone where zone=%s and name='region', args: ('META_ZONE_1',)
[2023-12-12T10:12:21.885+0800] INFO - Execute rows: 1
[2023-12-12T10:12:21.885+0800] INFO - Execute query: select tenant_id from __all_resource_pool where name=%s, args: ('xw_ocp_resource_pool',)
[2023-12-12T10:12:21.886+0800] INFO - Execute rows: 0
[2023-12-12T10:12:21.886+0800] INFO - Execute query: select unit_config_id from __all_unit_config where name=%s, args: ('xw_ocp_unit',)
[2023-12-12T10:12:21.887+0800] INFO - Execute rows: 1
[2023-12-12T10:12:21.887+0800] INFO - Execute query: select name from __all_resource_pool where unit_config_id=%s, args: (1003,)
[2023-12-12T10:12:21.887+0800] INFO - Execute rows: 0
[2023-12-12T10:12:21.888+0800] INFO - Execute query: drop resource unit xw_ocp_unit, args: None
[2023-12-12T10:12:21.890+0800] INFO - Execute rows: 0
[2023-12-12T10:12:21.890+0800] INFO - Execute query: CREATE RESOURCE UNIT IF NOT EXISTS xw_ocp_unit MAX_CPU 2, MAX_MEMORY '3G', MAX_IOPS 128 ,MAX_DISK_SIZE '1G', MAX_SESSION_NUM 10000, args: None
[2023-12-12T10:12:21.894+0800] INFO - Execute rows: 0
[2023-12-12T10:12:21.894+0800] INFO - Execute query: CREATE RESOURCE POOL IF NOT EXISTS xw_ocp_resource_pool UNIT='xw_ocp_unit', UNIT_NUM=1, ZONE_LIST=('META_ZONE_1'), args: None
[2023-12-12T10:12:21.898+0800] ERROR - Task failed with exception
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/airflow/decorators/base.py", line 217, in execute
    return_value = super().execute(context)
  File "/usr/local/lib/python3.9/site-packages/airflow/operators/python.py", line 175, in execute
    return_value = self.execute_callable()
  File "/usr/local/lib/python3.9/site-packages/airflow/operators/python.py", line 192, in execute_callable
    return self.python_callable(*self.op_args, **self.op_kwargs)
  File "/oat/task_engine/dags/init_ocp.py", line 54, in create_tenant
    common.create_tenant(ctx, logger, product='ocp')
  File "/oat/task_engine/plugins/common.py", line 731, in create_tenant
    cur.execute(sql)
  File "/oat/task_engine/plugins/utils.py", line 1612, in execute
    res = super().execute(query, args)
  File "/usr/local/lib/python3.9/site-packages/pymysql/cursors.py", line 148, in execute
    result = self._query(query)
  File "/usr/local/lib/python3.9/site-packages/pymysql/cursors.py", line 310, in _query
    conn.query(q)
  File "/usr/local/lib/python3.9/site-packages/pymysql/connections.py", line 548, in query
    self._affected_rows = self._read_query_result(unbuffered=unbuffered)
  File "/usr/local/lib/python3.9/site-packages/pymysql/connections.py", line 775, in _read_query_result
    result.read()
  File "/usr/local/lib/python3.9/site-packages/pymysql/connections.py", line 1156, in read
    first_packet = self.connection._read_packet()
  File "/usr/local/lib/python3.9/site-packages/pymysql/connections.py", line 725, in _read_packet
    packet.raise_for_error()
  File "/usr/local/lib/python3.9/site-packages/pymysql/protocol.py", line 221, in raise_for_error
    err.raise_mysql_exception(self._data)
  File "/usr/local/lib/python3.9/site-packages/pymysql/err.py", line 143, in raise_mysql_exception
    raise errorclass(errno, errval)
pymysql.err.OperationalError: (4624, 'machine resource is not enough to hold a new unit')
[2023-12-12T10:12:21.908+0800] INFO - Marking task as FAILED. dag_id=init_ocp, task_id=create_tenant, execution_date=20231212T021219, start_date=20231212T021221, end_date=20231212T021221
[2023-12-12T10:12:21.909+0800] INFO - Running statement: update oat_audit set status='failed', update_time=utc_timestamp(), failed_reason=%s where id=%s, parameters: ["failed task instance is init_ocp__create_tenant__20231212 and exception information is (4624, 'machine resource is not enough to hold a new unit')", 14]
[2023-12-12T10:12:21.909+0800] INFO - Rows affected: 1
[2023-12-12T10:12:21.922+0800] ERROR - Failed to execute job 44 for task create_tenant ((4624, 'machine resource is not enough to hold a new unit'); 18229)
[2023-12-12T10:12:21.945+0800] INFO - Task exited with return code 1
[2023-12-12T10:12:21.994+0800] INFO - 0 downstream tasks scheduled from follow-on schedule chec

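Creating the resource pool failed with "machine resource is not enough to hold a new unit": the requested resources exceed what the server can still allocate. The error message alone does not say which resource is short, so compare the server's remaining resources against the settings requested for xw_ocp_unit.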
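
The comparison can also be done mechanically. The following is a minimal sketch, not part of OAT or OCP: `parse_size`, `find_shortfalls`, and the dictionary keys are hypothetical names, and the free-resource numbers are made up for illustration. The requested values are the ones from the CREATE RESOURCE UNIT statement in the log (MAX_CPU 2, MAX_MEMORY '3G', MAX_DISK_SIZE '1G'). Which resources the server actually checks when placing a unit may depend on the OceanBase version; the sketch simply compares whatever keys it is given.

```python
import re

# Binary multipliers for the size suffixes used in unit definitions such as '3G'.
_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text):
    """Convert a size such as '3G' or '512M' to bytes (hypothetical helper)."""
    m = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMGT]?)B?\s*", str(text).upper())
    if m is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(float(m.group(1)) * _UNITS[m.group(2)])


def find_shortfalls(requested, free):
    """Return {resource: (requested, free)} for each resource that does not fit."""
    return {
        name: (need, free[name])
        for name, need in requested.items()
        if name in free and need > free[name]
    }


# Requested values taken from the xw_ocp_unit definition in the log.
requested = {"cpu": 2, "memory": parse_size("3G"), "disk": parse_size("1G")}

# Made-up free values; replace them with the *_free columns from the server query.
free = {"cpu": 4, "memory": parse_size("2G"), "disk": parse_size("50G")}

for name, (need, have) in find_shortfalls(requested, free).items():
    print(f"{name}: requested {need}, only {have} free")
```

With these made-up numbers only memory is reported as short, because 3G is requested and 2G is free.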

So is it memory (RAM) that is insufficient, or disk space?

select zone,concat(SVR_IP,':',SVR_PORT) observer,
cpu_capacity_max cpu_total,cpu_assigned_max cpu_assigned,
cpu_capacity-cpu_assigned_max as cpu_free,
round(memory_limit/1024/1024/1024,2) as memory_total,
round((memory_limit-mem_capacity)/1024/1024/1024,2) as system_memory,
round(mem_assigned/1024/1024/1024,2) as mem_assigned,
round((mem_capacity-mem_assigned)/1024/1024/1024,2) as memory_free,
round(log_disk_capacity/1024/1024/1024,2) as log_disk_capacity,
round(log_disk_assigned/1024/1024/1024,2) as log_disk_assigned,
round((log_disk_capacity-log_disk_assigned)/1024/1024/1024,2) as log_disk_free,
round((data_disk_capacity/1024/1024/1024),2) as data_disk,
round((data_disk_in_use/1024/1024/1024),2) as data_disk_used,
round((data_disk_capacity-data_disk_in_use)/1024/1024/1024,2) as data_disk_free
from gv$ob_servers;
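Run this SQL in the sys tenant and look at the free columns, then compare them with the tenant you are creating. That will show which resource is short.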

There is no gv$ob_servers view here at all.

SELECT a.svr_ip, a.svr_port, a.zone, b.status, cpu_total, cpu_assigned,
round(mem_total / 1024 / 1024 / 1024, 2) AS mem_total_gb,
round(mem_assigned / 1024 / 1024 / 1024, 2) AS mem_assign_gb,
round((mem_total - mem_assigned) / 1024 / 1024 / 1024, 2) AS mem_free_gb,
round(c.total_size / 1024 / 1024 / 1024, 2) AS disk_total_gb,
round(c.used_size / 1024 / 1024 / 1024, 2) AS disk_assign_gb,
round((c.total_size - c.used_size) / 1024 / 1024 / 1024, 2) AS disk_free_gb
FROM oceanbase.__all_virtual_server_stat a
JOIN oceanbase.__all_server b ON a.svr_ip = b.svr_ip AND a.svr_port = b.svr_port
JOIN __all_virtual_disk_stat c ON c.svr_ip = b.svr_ip AND c.svr_port = b.svr_port
ORDER BY a.zone, a.svr_ip\G
Try this one.

SELECT
a.svr_ip
,a.svr_port
,a.zone
,b.status
,cpu_total
,cpu_assigned
,round(mem_total / 1024 / 1024 / 1024, 2) AS mem_total_gb
,round(mem_assigned / 1024 / 1024 / 1024, 2) AS mem_assign_gb
,round((mem_total - mem_assigned) / 1024 / 1024 / 1024, 2) AS mem_free_gb
,round(c.total_size / 1024 / 1024 / 1024, 2) AS disk_total_gb
,round((c.total_size - c.free_size) / 1024 / 1024 / 1024, 2) AS disk_assign_gb
,round((c.free_size) / 1024 / 1024 / 1024, 2) AS disk_free_gb
FROM oceanbase.__all_virtual_server_stat a
JOIN oceanbase.__all_server b ON a.svr_ip = b.svr_ip AND a.svr_port = b.svr_port
JOIN __all_virtual_disk_stat c ON c.svr_ip = b.svr_ip AND c.svr_port = b.svr_port
ORDER BY a.zone, a.svr_ip

This statement returns the following result: