Could you also post the status of the ocp_monitor tenant, the same way as in post #12?
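log_disk_utilization_threshold sets the utilization threshold for a tenant's log disk. Once the tenant's log disk usage exceeds the tenant's total log disk space multiplied by this value, log files are reused.
The parameter defaults to 80%.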
The peak usage always stays at a little over 79%.
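To make the threshold arithmetic concrete, here is a minimal sketch of the rule described above: reuse starts once usage exceeds total log disk space multiplied by log_disk_utilization_threshold. The function names and the 10 GiB log disk size are assumptions for illustration only and are not taken from OceanBase code. With the default of 80%, usage that peaks at a little over 79% is consistent with log files being reused as the threshold is approached.

```python
GIB = 1024 ** 3


def log_reuse_threshold_bytes(log_disk_size_bytes: int, threshold_percent: int = 80) -> int:
    """Return the usage level (in bytes) above which log files are reused.

    threshold_percent mirrors log_disk_utilization_threshold (default 80).
    """
    if not 0 < threshold_percent <= 100:
        raise ValueError("threshold_percent must be in (0, 100]")
    return log_disk_size_bytes * threshold_percent // 100


def should_reuse_log_files(used_bytes: int, log_disk_size_bytes: int,
                           threshold_percent: int = 80) -> bool:
    """True when usage exceeds total log disk space * threshold."""
    return used_bytes > log_reuse_threshold_bytes(log_disk_size_bytes, threshold_percent)


if __name__ == "__main__":
    total = 10 * GIB  # hypothetical tenant log disk size
    print(log_reuse_threshold_bytes(total) / GIB)             # threshold in GiB
    print(should_reuse_log_files(int(7.9 * GIB), total))      # just under the threshold
    print(should_reuse_log_files(int(8.1 * GIB), total))      # just over the threshold
```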
What is the usage percentage of the data disk? I can't make it out clearly here.
Around 0.7%.
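Looking at ocp-server.log, there is an inode alarm at around 11-29 10:10:00, but monagent.log has no entries for that period. Could you provide the monagent.log for that time?
Also, the OCP version in this thread is 4.3.0. Is that the same as your OCP version?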
2024-11-29 10:10:00.139 INFO 17860 --- [pool-ocp-async-8,b1c98cee2b4f49ac,563d1caeb8eb] c.o.ocp.alarm.core.task.AlarmDetectTask : alarm detect start
2024-11-29 10:10:00.141 INFO 17860 --- [alarm-detect-24,9af00fba678d4e1a,82ebf2c1830c] c.o.o.alarm.core.detect.MetricDetector : alarmType node_file_inode_usage detect time range 1732846170 ~ 1732846190 matched, alarm-level: 2, evaluate result MultiExpressionEvaluateResult(metricEvaluateResults={file_inode_usage > 80=[ExpressionEvaluateResult(matchValue=80.0, metric=file_inode_usage, match=true, value=97.0, values={1732846185=97.0}, valueCount=1, labels={app=HOST, host=1.1.1.69.56, host_name=EM-1XQW7R3, ob_cluster_id=1713323324, ob_cluster_name=core_cluster, obzone=zone2, svr_ip=1.1.1.69.56})]}, matched=true, message=null, level=2)
2024-11-29 10:10:00.141 INFO 17860 --- [alarm-detect-24,9af00fba678d4e1a,82ebf2c1830c] c.o.o.a.c.detect.AlarmRuleDetectState : firstDetect, alarmType=node_file_inode_usage, target=alarm_template_id=0:host=1.1.1.69.56-2, currentState=Firing
2024-11-29 10:10:00.141 INFO 17860 --- [alarm-detect-24,9af00fba678d4e1a,82ebf2c1830c] c.o.o.a.core.detect.AlarmRuleDetector : detectByMetric done, alarmType=node_file_inode_usage, needSendingCount=1, detectAlarmsCount=1, matchedTargets=[alarm_template_id=0:host=1.1.1.69.56-2]
2024-11-29 10:10:00.146 INFO 17860 --- [pool-ocp-schedules-9,99dd002b0f044f5e,e881c73dd0f6] c.o.ocp.service.audit.AuditEventService : [refreshMethodToAuditEventMetaIdMap] refresh method to audit event meta map success
2024-11-29 10:10:00.164 INFO 17860 --- [alarm-detect-24,9af00fba678d4e1a,82ebf2c1830c] c.o.o.alarm.service.OcpAlarmServiceImpl : alarm silenced, alarmType=node_file_inode_usage, target=alarm_template_id=0:host=1.1.1.69.56
2024-11-29 10:10:00.168 INFO 17860 --- [alarm-detect-24,9af00fba678d4e1a,82ebf2c1830c] c.o.o.alarm.service.OcpAlarmServiceImpl : create alarm event Done, id=4000782
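When going through many such entries, it can help to pull the key fields out of the MetricDetector line (alarm type, threshold, measured value, host name). The sketch below is only an illustration: the regular expression is written against the single log line shown above, so it is an assumption that other OCP versions use the same layout, and parse_alarm_line is a hypothetical helper, not part of OCP.

```python
import re
from typing import Optional

# Pattern derived from the one MetricDetector line quoted above (assumed layout).
ALARM_RE = re.compile(
    r"alarmType (?P<alarm_type>\S+) detect time range (?P<start>\d+) ~ (?P<end>\d+) matched"
    r".*?matchValue=(?P<threshold>[\d.]+), metric=(?P<metric>\w+), "
    r"match=(?P<match>true|false), value=(?P<value>[\d.]+)"
    r".*?host_name=(?P<host_name>[^,}]+)"
)


def parse_alarm_line(line: str) -> Optional[dict]:
    """Extract alarm fields from a MetricDetector log line, or None if it does not match."""
    m = ALARM_RE.search(line)
    if m is None:
        return None
    return {
        "alarm_type": m.group("alarm_type"),
        "start": int(m.group("start")),          # detect window start (epoch seconds)
        "end": int(m.group("end")),              # detect window end (epoch seconds)
        "metric": m.group("metric"),
        "threshold": float(m.group("threshold")),
        "value": float(m.group("value")),
        "matched": m.group("match") == "true",
        "host_name": m.group("host_name"),
    }


if __name__ == "__main__":
    import sys
    # Usage: python parse_alarm.py < ocp-server.log
    for raw in sys.stdin:
        parsed = parse_alarm_line(raw)
        if parsed is not None:
            print(parsed)
```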
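Confirmed so far: the false alarm here is expected product behavior, and the product side does not have a good fix for it at the moment. When the alarm fires, check the monitoring charts to confirm whether the usage has actually become a problem.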
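Besides the monitoring charts, inode usage can be cross-checked directly on the host. The sketch below uses Python's standard os.statvfs (Unix only) and computes usage as (total inodes - free inodes) / total inodes, which is roughly what `df -i` reports. How OCP itself computes file_inode_usage is not confirmed in this thread, so treat the formula and the function names as assumptions.

```python
import os
from typing import Optional


def inode_usage_from_counts(total_inodes: int, free_inodes: int) -> Optional[float]:
    """Inode usage in percent, or None when the filesystem reports no inode count."""
    if total_inodes <= 0:
        return None  # some filesystems report 0 total inodes
    used = total_inodes - free_inodes
    return 100.0 * used / total_inodes


def inode_usage_percent(path: str = "/") -> Optional[float]:
    """Read inode counters for the filesystem that contains `path`."""
    st = os.statvfs(path)
    return inode_usage_from_counts(st.f_files, st.f_ffree)


if __name__ == "__main__":
    usage = inode_usage_percent("/")
    if usage is None:
        print("inode count not reported for this filesystem")
    else:
        # 80 is the alarm threshold shown in the log above (file_inode_usage > 80).
        print(f"inode usage: {usage:.1f}% (alarm threshold in the log: 80%)")
```

Running it a few times around the alarm timestamp and comparing the output with the chart gives a quick indication of whether the 97% reading reflects real usage on that host.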