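【Environment】 Production
【OB or other component】 OCP
【Version】 4.4.0
【Problem description】 ocp_monagent raised a restart alert. The logs show that it exited with an error and was then restarted by the guardian (daemon) process. The logs are as follows:
agentd.log shows monagent exiting: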
2026-05-17T12:35:59.42749+08:00 WARN [41630,] caller=agentd/service.go:182:guard: service exited with code 2. service state: running fields: service=ocp_monagent
Around the time monagent exited, there are two WARN entries:
2026-05-17T12:35:59.30212+08:00 WARN [27432,c64b8094a78b8006] caller=host/custom.go:712:doCollectCoredumpTime: get observer coredump time failed, err: open : no such file or directory fields: coredump-path=
2026-05-17T12:35:59.44712+08:00 WARN [14285,] caller=config/yaml.go:99:validateNode: configs may not be replaced: [ocp.agent.http.ip ocp.agent.monitor.http.port]
ocp_monagent.error.log contains the log below. These lines carry no timestamps; I am inferring that they relate to this restart from the file's Linux modification time: May 17 12:35 ocp_monagent.error.log
goroutine 1196 [running]:
internal/sync.(*HashTrieMap[…]).Load(0x1b43de0, {0x15ab280, 0xc009e82aa8})
/home/admin/go/src/internal/sync/hashtriemap.go:73 +0x8a
sync.(*Map).Load(…)
/home/admin/go/src/sync/hashtriemap.go:50
github.com/oceanbase/obagent/monitor/plugins/inputs/oceanbase.(*SqlAuditInput).getSampleSqlType(0xc000414500, 0xc009e836f8, 0x2963b40)
/workspace/code-repo/rpm/.rpm_create/SOURCES/ocp-agent-ce/monitor/plugins/inputs/oceanbase/sql_audit.go:1550 +0x14d
github.com/oceanbase/obagent/monitor/plugins/inputs/oceanbase.(*SqlAuditInput).parseSampleSqlData(0xc000414500, {0x1b1b300, 0xc021c26f50}, {0x187e7c0?, 0xc0220a3188?, 0x2965020?}, 0xc009e836f8, 0x3f2?, 0xc0025ace20, 0xc036767e00)
/workspace/code-repo/rpm/.rpm_create/SOURCES/ocp-agent-ce/monitor/plugins/inputs/oceanbase/sql_audit.go:1436 +0x5c
github.com/oceanbase/obagent/monitor/plugins/inputs/oceanbase.(*ObSqlAudit).parseRawSqlResults(0xc000d3b600, {0x1b1b300, 0xc021c26f50}, 0x2d8cb22ff, 0xc000789180, {0xc009e87ba0?, 0x4d8733?, 0x2965020?}, 0xc0025acdf0, 0xc0025ace20, …)
/workspace/code-repo/rpm/.rpm_create/SOURCES/ocp-agent-ce/monitor/plugins/inputs/oceanbase/sql_audit_merge.go:837 +0x1516
github.com/oceanbase/obagent/monitor/plugins/inputs/oceanbase.(*ObSqlAudit).collectRawMsgsByTenant(0xc000d3b600, {0x1b1b300, 0xc021c26f50}, 0x3f2, 0x2d8cb22ff, 0xc019e9b2c0, 0x0, 0xc0025acdf0, 0xc0025ace20, 0xc0025ace50, …)
/workspace/code-repo/rpm/.rpm_create/SOURCES/ocp-agent-ce/monitor/plugins/inputs/oceanbase/sql_audit_merge.go:493 +0x63a
github.com/oceanbase/obagent/monitor/plugins/inputs/oceanbase.(*ObSqlAudit).runTaskRound(0xc000d3b600, 0x2d8cb22ff, 0xc0025acdd0, 0xc0009632d0)
/workspace/code-repo/rpm/.rpm_create/SOURCES/ocp-agent-ce/monitor/plugins/inputs/oceanbase/sql_audit_merge.go:447 +0x137
github.com/oceanbase/obagent/monitor/plugins/inputs/oceanbase.(*ObSqlAudit).runTask(0xc000d3b600, 0xc0025acdd0, 0xc0009632d0)
/workspace/code-repo/rpm/.rpm_create/SOURCES/ocp-agent-ce/monitor/plugins/inputs/oceanbase/sql_audit_merge.go:433 +0x4ca
created by github.com/oceanbase/obagent/monitor/plugins/inputs/oceanbase.(*ObSqlAudit).addTask in goroutine 1185
/workspace/code-repo/rpm/.rpm_create/SOURCES/ocp-agent-ce/monitor/plugins/inputs/oceanbase/sql_audit_merge.go:374 +0x24b
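I asked an AI to analyze it, and it said this last stack trace means monagent hit a nil pointer dereference. Is that correct? And is the restart caused by a bug?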