Server initialization fails when adding a server with OAT 4.1.0

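【 Environment 】Production or test
Test environment
【 OB or other component 】
Deploying the Enterprise Edition with OAT 4.1.0
【 Version 】
OAT 4.1.0
【 Problem description 】Describe the problem clearly
Adding a server with OAT 4.1.0 fails at the initialization step. The error output is as follows: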
[2023-06-03T09:59:30.124+0800] INFO - check RPM: iproute-4.11.0-25.el7_7.2.x86_64 is installed … PASS
[2023-06-03T09:59:30.134+0800] INFO - check mysql client, working … PASS
[2023-06-03T09:59:30.147+0800] INFO - checking irq affinity …
[2023-06-03T09:59:30.165+0800] INFO - checking eth0 …
[2023-06-03T09:59:30.171+0800] INFO - check irq channels, NIC: eth0, Channel Combined: 8 … PASS
[2023-06-03T09:59:30.213+0800] INFO - check irq affinity, NIC: eth0, smp_affinity count: 4 … PASS
[2023-06-03T09:59:30.214+0800] INFO - checking eth1 …
[2023-06-03T09:59:30.225+0800] INFO - check irq channels, NIC: eth1, Channel Combined: 8 … PASS
[2023-06-03T09:59:30.271+0800] INFO - check irq affinity, NIC: eth1, smp_affinity count: 4 … PASS
[2023-06-03T09:59:30.279+0800] INFO - check irqbalance status: unknown … PASS
[2023-06-03T09:59:30.280+0800] INFO - check irqbalance service: disabled … PASS
[2023-06-03T09:59:30.281+0800] INFO - df: ‘/data/1’: No such file or directory
[2023-06-03T09:59:30.319+0800] INFO -
[2023-06-03T09:59:30.319+0800] INFO -
[2023-06-03T09:59:30.319+0800] INFO - ### SUMMARY OF ISSUES IN PRE-CHECK ###
[2023-06-03T09:59:30.320+0800] INFO - check CPU count: 8 < 32 … EXPECT >= 32 … FAIL
[2023-06-03T09:59:30.320+0800] INFO - TIPS: replace another machine with more CPU
[2023-06-03T09:59:30.320+0800] INFO - check total MEM: 15 GB < 128 GB … EXPECT >= 128 GB … FAIL
[2023-06-03T09:59:30.320+0800] INFO - TIPS: replace another machine with more MEM
[2023-06-03T09:59:30.323+0800] INFO - execute command on 10.169.1.73:
rm -f /tmp/precheck.sh7tXuMHF2
[2023-06-03T09:59:30.424+0800] ERROR - Task failed with exception
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/airflow/decorators/base.py", line 217, in execute
    return_value = super().execute(context)
  File "/usr/local/lib/python3.9/site-packages/airflow/operators/python.py", line 175, in execute
    return_value = self.execute_callable()
  File "/usr/local/lib/python3.9/site-packages/airflow/operators/python.py", line 192, in execute_callable
    return self.python_callable(*self.op_args, **self.op_kwargs)
  File "/oat/task_engine/dags/init_server_with_tag.py", line 79, in precheck
    common.server_precheck(ctx, logger=logger)
  File "/oat/task_engine/plugins/common.py", line 1542, in server_precheck
    raise RuntimeError('server precheck failed, please see the summary info above for details')
RuntimeError: server precheck failed, please see the summary info above for details
[2023-06-03T09:59:30.433+0800] INFO - Marking task as FAILED. dag_id=init_server_with_tag, task_id=precheck, execution_date=20230603T013011, start_date=20230603T015920, end_date=20230603T015930
[2023-06-03T09:59:30.435+0800] INFO - Running statement: update oat_audit set status='failed', update_time=utc_timestamp(), failed_reason=%s where id=%s, parameters: ['failed task instance is init_server_with_tag__precheck__20230603 and exception information is server precheck failed, please see the summary info above for details', 30]
[2023-06-03T09:59:30.436+0800] INFO - Rows affected: 1
[2023-06-03T09:59:30.458+0800] ERROR - Failed to execute job 207 for task precheck (server precheck failed, please see the summary info above for details; 31995)
[2023-06-03T09:59:30.501+0800] INFO - Task exited with return code 1
[2023-06-03T09:59:30.525+0800] INFO - 0 downstream tasks scheduled from follow-on schedule check

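【 Steps to reproduce 】Operations before and after the problem appeared
I suspected a Python version problem and upgraded Python to 3.9.16, but that did not help: the same error is reported.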

【 Symptoms and impact 】

【 Attachments 】


You could ask the commercial technical support contact assigned to you.

Has the problem been solved?

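It looks like the CPU and memory resources are simply insufficient: the pre-check summary reports 8 CPUs against an expected 32, and 15 GB of memory against an expected 128 GB.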
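To pull the failing items out of a long task log quickly, a small script along these lines may help. It is only a sketch: the function name `failed_checks` and the regular expression are my own, written against the line format pasted in this thread, and they are not part of OAT. The separator is matched both as the single character "…" (as it appears here) and as three dots.

```python
import re

# Matches summary lines such as:
#   "... INFO - check CPU count: 8 < 32 … EXPECT >= 32 … FAIL"
# Assumption: the format is the one shown in this thread's log.
FAIL_LINE = re.compile(
    r"INFO - (?P<check>check .+?)\s*(?:…|\.\.\.)\s*"
    r"EXPECT\s*(?P<expect>.+?)\s*(?:…|\.\.\.)\s*FAIL\s*$"
)


def failed_checks(log_text):
    """Return (check, expectation) pairs for every FAIL line in the log."""
    results = []
    for line in log_text.splitlines():
        match = FAIL_LINE.search(line)
        if match:
            results.append((match.group("check"), match.group("expect")))
    return results


sample = """\
[2023-06-03T09:59:30.281+0800] INFO - check irqbalance service: disabled … PASS
[2023-06-03T09:59:30.320+0800] INFO - check CPU count: 8 < 32 … EXPECT >= 32 … FAIL
[2023-06-03T09:59:30.320+0800] INFO - TIPS: replace another machine with more CPU
[2023-06-03T09:59:30.320+0800] INFO - check total MEM: 15 GB < 128 GB … EXPECT >= 128 GB … FAIL
"""

for check, expect in failed_checks(sample):
    print(f"{check} (expected {expect})")
```

PASS and TIPS lines are ignored because they carry no EXPECT clause, so only the two resource checks are printed for the sample.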

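I ran into the same problem; it was caused by insufficient CPU and memory. Note that creating MetaDB later also requires at least 8C16G. If this is a test environment and the machine reaches 8C16G, you can manually choose "Set as successful" after the precheck task fails during initialization (the option is under the three dots at the end of the task item).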
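Before adding a host, it can be worth checking its resources against whichever threshold applies. The following is a rough sketch, not OAT's own pre-check: `resource_shortfalls` and `local_resources` are hypothetical helper names, the thresholds are passed in by the caller (32 CPUs and 128 GB come from the log above, 8 and 16 from the 8C16G figure in this reply), and the memory reading assumes Linux.

```python
import os


def resource_shortfalls(cpu_count, mem_gb, min_cpu, min_mem_gb):
    """List the resources that fall below the given minimums."""
    shortfalls = []
    if cpu_count < min_cpu:
        shortfalls.append(f"CPU count: {cpu_count} < {min_cpu}")
    if mem_gb < min_mem_gb:
        shortfalls.append(f"total MEM: {mem_gb} GB < {min_mem_gb} GB")
    return shortfalls


def local_resources():
    """Read CPU count and physical memory in whole GB (Linux only)."""
    cpus = os.cpu_count() or 0
    mem_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    return cpus, mem_bytes // (1024 ** 3)


if __name__ == "__main__":
    cpus, mem_gb = local_resources()
    # Thresholds taken from the pre-check summary in the log above.
    for item in resource_shortfalls(cpus, mem_gb, min_cpu=32, min_mem_gb=128):
        print("below pre-check expectation:", item)
```

The host in the log reports 15 GB of total memory, so whether it counts as "16G" depends on how the figure is rounded, which this thread does not state.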