TopSQL, SlowSQL, and disk usage statistics are missing in the new OCP 4.3.3

Problem

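On January 10 we upgraded OB in our test environment from 4.2.1.0 to 4.2.5.1. Since then, a separately created new tenant shows no data in OCP for TopSQL, SlowSQL, suspicious SQL, or disk usage statistics, even though performance monitoring shows more than a thousand read/write QPS. With the old OCP 4.2, all of these had data.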

# OCP was upgraded through obd, following the doc "Upgrade OCP using the graphical interface" (OCP V4.3.3, OceanBase Cloud Platform documentation)

obd web upgrade  # upgrade through the web page

# observer was upgraded through obd in two steps, following the doc "Upgrade OceanBase Database with OBD" (V4.2.1, OceanBase Database documentation)

obd cluster upgrade obtest42 -c oceanbase-ce -V 4.2.1.10 --usable=f7c2609f1591be5ce420e2333bc26ad701a2396616309fb67b289f23b1d1fdb0

obd cluster upgrade obtest42 -c oceanbase-ce -V 4.2.5.1 --usable=dd8807d53c7c2561a232a61090f44366432ab10929a74d57f18191b286453ab6

OCP shows no TopSQL data

image-20250114142120874

Versions

OCP
Version: 4.3.3-20241203200857

Release date: December 3, 2024

curl --user ocp_user:ocp_password 'http://xxxx:8080/api/v2/info'
{"artifact":"ocp-server-ce","buildTime":"2024-12-03T20:25:10.163+08:00","buildVersion":"4.3.3-20241203200857","commitBranch":"3c7d3d573fb36d8c2d2cf76ccb48ef3b51dd9b0e","commitId":"3c7d3d5","commitTime":"2024-12-03T02:14:08Z","edition":"COMMUNITY","group":"com.oceanbase"}

OBServer

observer --version
observer (OceanBase_CE 4.2.5.1)

REVISION: 101000092024120918-6388bc0561faecba5f75a662c6e11c3dd0598de9

OBProxy

obproxy --version
obproxy (OceanBase 4.2.1.0 11)
REVISION: 1-local-6599462fc897a4a46734d64585906ea80975a656

Logs

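I used the browser developer tools (F12) to capture the TopSQL request URL in OCP, then searched the OCP log by traceId and found no errors. Has the way OCP 4.3.3 itself queries this data changed? According to the official documentation, OCP 4.3.3 is compatible with OB 4.2.x.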

image-20250114142820957

xxxxxxx.dmall.com/api/v2/ob/clusters/1/tenants/1000002/topSql?startTime=2025-01-14T10%3A56%3A00%2B08%3A00&endTime=2025-01-14T11%3A26%3A00%2B08%3A00&inner=false

{
  "data": {
    "contents": []
  },
  "duration": 11,
  "server": "0b1a3d3227",
  "status": 200,
  "successful": true,
  "timestamp": "2025-01-14T13:46:26.97+08:00",
  "traceId": "d460602460035373"
}
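
The response above is a successful call (status 200) that simply returns an empty list, which suggests the API is working and the data store behind it has no rows. A minimal Python sketch of that distinction follows; the helper name `classify_topsql_response` is hypothetical and not part of OCP, and only the field names visible in the response are used.

```python
import json

# Response body copied from the thread (quotes normalized to plain ASCII).
RESPONSE = """{
  "data": {"contents": []},
  "duration": 11,
  "server": "0b1a3d3227",
  "status": 200,
  "successful": true,
  "timestamp": "2025-01-14T13:46:26.97+08:00",
  "traceId": "d460602460035373"
}"""


def classify_topsql_response(body: str) -> str:
    """Hypothetical helper: tell an API failure apart from an empty result."""
    doc = json.loads(body)
    if not doc.get("successful") or doc.get("status") != 200:
        return "request failed"
    contents = (doc.get("data") or {}).get("contents") or []
    if not contents:
        return "ok, no rows"
    return f"ok, {len(contents)} rows"


print(classify_topsql_response(RESPONSE))  # prints "ok, no rows"
```

An "ok, no rows" result points the investigation away from the OCP server API and toward whatever collects the SQL audit data.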

Searching the OCP log by traceId

less /home/admin/logs/ocp-server.log

2025-01-14 13:46:26.966  INFO 23974 --- [http-nio-0.0.0.0-8080-exec-7,d460602460035373,31ff6b716dcc15b3] c.o.o.s.c.trace.RequestTracingAspect     : API: [GET /api/v2/ob/cl
usters/1/tenants/1000002/topSql?startTime=2025-01-14T10%3A56%3A00%2B08%3A00&endTime=2025-01-14T11%3A26%3A00%2B08%3A00&inner=false, client=10.27.183.201, traceId=d460602460
035373, method=IterableResponse com.oceanbase.ocp.server.common.controller.perf.ObSqlStatController.topSql(Long,Long,OffsetDateTime,OffsetDateTime,Long,Boolean,String,Stri
ng,String,String,Long,RequestingSqlText,Integer,List,String,boolean,boolean), args=1,1000002,2025-01-14T10:56+08:00,2025-01-14T11:26+08:00,false,false,false,]
2025-01-14 13:46:26.967  INFO 23974 --- [http-nio-0.0.0.0-8080-exec-7,d460602460035373,31ff6b716dcc15b3] c.o.o.p.s.i.SqlAuditRawStatServiceImpl   : Query sql audit with pa
ram:QueryTopSqlParam(interval=Interval{start=2025-01-14T10:56:00, end=2025-01-14T11:26:00}, clusterId=1, tenantId=1000002, serverId=null, inner=false, sqlText=null, search
=null, limit=null, requestingSqlText=ASSUME_HITS, sqlTextLength=null, customColumns=null, filterExpression=null, parseSqlType=false, parseStatement=false, parseTable=false
, groupByServer=true, serverIdList=null, sqlIds=null, mergeDynamicSql=false, dynamicSql=false, returnSystemSql=false)
2025-01-14 13:46:26.967  INFO 23974 --- [http-nio-0.0.0.0-8080-exec-7,d460602460035373,31ff6b716dcc15b3] c.o.o.p.s.internal.util.QueryRangeUtil   : split by roll up progre
ss
2025-01-14 13:46:26.968  INFO 23974 --- [http-nio-0.0.0.0-8080-exec-7,d460602460035373,31ff6b716dcc15b3] c.o.o.p.s.internal.util.QueryRangeUtil   : IntervalMap:{0=Interval
{start=2025-01-14T10:56:00, end=2025-01-14T11:26:00}}, Server roll up progress:{}, Specs:[StatSpec(type=SQL, level=0, enabled=true, granularity=PT30S, queryInterval=PT2H, 
retention=PT48H), StatSpec(type=SQL, level=1, enabled=true, granularity=PT2M, queryInterval=PT12H, retention=PT192H), StatSpec(type=SQL, level=2, enabled=true, granularity
=PT10M, queryInterval=PT48H, retention=PT360H), StatSpec(type=SQL, level=3, enabled=false, granularity=PT0S, queryInterval=PT0S, retention=PT0S)]
2025-01-14 13:46:26.968  INFO 23974 --- [http-nio-0.0.0.0-8080-exec-7,d460602460035373,31ff6b716dcc15b3] c.o.o.p.s.internal.util.QueryRangeUtil   : level:0, StatSpec(type=
SQL, level=0, enabled=true, granularity=PT30S, queryInterval=PT2H, retention=PT48H), table of next level:ob_hist_sql_audit_stat_1, level1 minQueryRange:PT1H, level2 minQue
ryRange:PT6H
2025-01-14 13:46:26.968  INFO 23974 --- [http-nio-0.0.0.0-8080-exec-7,d460602460035373,31ff6b716dcc15b3] c.o.o.p.s.internal.util.QueryRangeUtil   : After adjust level 0, I
ntervalMap:{0=Interval{start=2025-01-14T10:56:00, end=2025-01-14T11:26:00}}
2025-01-14 13:46:26.968  INFO 23974 --- [http-nio-0.0.0.0-8080-exec-7,d460602460035373,31ff6b716dcc15b3] c.o.o.p.s.internal.util.QueryRangeUtil   : level:1, StatSpec(type=
SQL, level=1, enabled=true, granularity=PT2M, queryInterval=PT12H, retention=PT192H), table of next level:ob_hist_sql_audit_stat_2, level1 minQueryRange:PT1H, level2 minQu
eryRange:PT6H
2025-01-14 13:46:26.968  INFO 23974 --- [http-nio-0.0.0.0-8080-exec-7,d460602460035373,31ff6b716dcc15b3] c.o.o.p.s.internal.util.QueryRangeUtil   : After adjust level 1, I
ntervalMap:{0=Interval{start=2025-01-14T10:56:00, end=2025-01-14T11:26:00}}
2025-01-14 13:46:26.968  INFO 23974 --- [http-nio-0.0.0.0-8080-exec-7,d460602460035373,31ff6b716dcc15b3] c.o.o.p.s.internal.util.QueryRangeUtil   : Split query range to:{0
=Interval{start=2025-01-14T10:56:00, end=2025-01-14T11:26:00}}
2025-01-14 13:46:26.968  INFO 23974 --- [http-nio-0.0.0.0-8080-exec-7,d460602460035373,31ff6b716dcc15b3] c.o.o.p.s.i.SqlAuditRawStatServiceImpl   : split query range to:{0
=Interval{start=2025-01-14T10:56:00, end=2025-01-14T11:26:00}}
2025-01-14 13:46:26.968  INFO 23974 --- [http-nio-0.0.0.0-8080-exec-7,d460602460035373,31ff6b716dcc15b3] c.o.o.p.s.i.SqlAuditRawStatServiceImpl   : TopSql database request
:QuerySqlAuditRawStatGroupBy(timeout=30000000, level=0, obClusterId=1701865424, clusterName=obtest42, startTimeUs=1736823360000000, endTimeUs=1736825160000000, obTenantId=
1008, obServerId=null, obDbId=null, dbName=null, obServerIdList=null, sqlId=null, includeInner=false, limit=2000, sqlText=null, parallel=null, dynamicSql=false)
2025-01-14 13:46:26.970  INFO 23974 --- [http-nio-0.0.0.0-8080-exec-7,d460602460035373,31ff6b716dcc15b3] c.o.o.p.s.i.SqlAuditRawStatServiceImpl   : Get sql info from cache
 and monitor db, result size:0
2025-01-14 13:46:26.970  INFO 23974 --- [http-nio-0.0.0.0-8080-exec-7,d460602460035373,31ff6b716dcc15b3] c.o.o.p.s.i.SqlAuditRawStatServiceImpl   : set extra columns and c
pu percentage
2025-01-14 13:46:26.970  INFO 23974 --- [http-nio-0.0.0.0-8080-exec-7,d460602460035373,31ff6b716dcc15b3] c.o.o.s.c.trace.RequestTracingAspect     : API OK: [GET /api/v2/ob
/clusters/1/tenants/1000002/topSql client=10.27.183.201, traceId=d460602460035373, duration=11 ms]
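
One thing worth ruling out from this log is a time-window mismatch: the request URL carries percent-encoded ISO 8601 timestamps, while the database request logs `startTimeUs` and `endTimeUs` in microseconds since the Unix epoch. A small Python sketch of the comparison, assuming the UTC+8 offset given in the URL; the helper names are made up for illustration.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import unquote

UTC8 = timezone(timedelta(hours=8))  # offset used in the request URL


def us_to_local(epoch_us: int) -> datetime:
    """Convert microseconds since the Unix epoch to a UTC+8 datetime."""
    return datetime.fromtimestamp(epoch_us // 1_000_000, tz=UTC8)


def decode_param(encoded: str) -> datetime:
    """Decode a percent-encoded ISO 8601 timestamp from the request URL."""
    return datetime.fromisoformat(unquote(encoded))


request_start = decode_param("2025-01-14T10%3A56%3A00%2B08%3A00")
request_end = decode_param("2025-01-14T11%3A26%3A00%2B08%3A00")
query_start = us_to_local(1736823360000000)  # startTimeUs from the log
query_end = us_to_local(1736825160000000)    # endTimeUs from the log

print(query_start.isoformat(), query_end.isoformat())
# prints "2025-01-14T10:56:00+08:00 2025-01-14T11:26:00+08:00"
print(query_start == request_start and query_end == request_end)
# prints "True"
```

The two windows match, so the empty result (`result size:0`) is not caused by OCP querying the wrong time range.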

Could you confirm whether the OCP metadata was upgraded as well? Please run:

select * from __all_rootservice_event_history;

Search mgragent.log for monagent.pipeline.sql.audit.status

cat monagent_pipeline.yaml | grep  -A 3  audit.status
    - key: monagent.pipeline.cloud.raw.sql.audit.status
      value: inactive
      valueType: string
      encrypted: false
--
    - key: monagent.pipeline.sql.audit.status
      value: inactive
      valueType: string
      encrypted: false
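
Both SQL audit pipeline switches are reported as inactive here. When the config contains many entries, the same check can be scripted. Below is a rough Python sketch that scans text in the `- key:` / `value:` layout shown above and lists the keys whose value is `inactive`; it is a plain line scanner rather than a YAML parser, and the function name `inactive_keys` is invented for this example.

```python
# Output of the grep above, pasted verbatim.
GREP_OUTPUT = """\
    - key: monagent.pipeline.cloud.raw.sql.audit.status
      value: inactive
      valueType: string
      encrypted: false
--
    - key: monagent.pipeline.sql.audit.status
      value: inactive
      valueType: string
      encrypted: false
"""


def inactive_keys(text: str) -> list:
    """Return the keys whose following 'value:' line is 'inactive'."""
    keys = []
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("- key:"):
            current = stripped.split(":", 1)[1].strip()
        elif stripped.startswith("value:") and current is not None:
            # 'valueType:' lines do not match the 'value:' prefix.
            if stripped.split(":", 1)[1].strip() == "inactive":
                keys.append(current)
            current = None
    return keys


for key in inactive_keys(GREP_OUTPUT):
    print(key)
```

To run it against the real file, read `monagent_pipeline.yaml` into a string and pass that in; the file's location depends on the agent installation.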

Yes, it was upgraded.

Thanks for the support. Someone is already helping with this issue; there is quite a lot of information here.
select count(*) from __all_rootservice_event_history;
+----------+
| count(*) |
+----------+
|    67961 |
+----------+
1 row in set (0.079 sec)

OK, I see that others are already looking into this.

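Check whether the time these values were changed to inactive matches the time you upgraded the cluster. I ran into the same problem after upgrading an OB cluster under OCP 4.3.3. It turned out that these parameters had been switched to inactive, and they have to be set back to active manually. The issue was fixed in 4.3.3 BP3 or BP4, I don't remember which.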

Search mgragent.log for the date of the change.