偶发性插入数据操作延迟高

改是可以改,但不建议用sys,ocp租户做测试,建一个也不麻烦,可以参照一下文档新建租户
OceanBase分布式数据库-海量数据 笔笔算数

建了新的租户,其规格如下:

oceanbase.DBA_OB_UNIT_CONFIGS
+----------------+-----------------+----------------------------+----------------------------+---------+---------+-------------+---------------+---------------------+---------------------+-------------+
| UNIT_CONFIG_ID | NAME            | CREATE_TIME                | MODIFY_TIME                | MAX_CPU | MIN_CPU | MEMORY_SIZE | LOG_DISK_SIZE | MAX_IOPS            | MIN_IOPS            | IOPS_WEIGHT |
+----------------+-----------------+----------------------------+----------------------------+---------+---------+-------------+---------------+---------------------+---------------------+-------------+
|              1 | sys_unit_config | 2024-07-28 20:03:06.165110 | 2024-07-28 20:03:06.165110 |       3 |       3 |  2147483648 |    4294967296 | 9223372036854775807 | 9223372036854775807 |           3 |
|           1001 | ocp_meta_unit   | 2024-07-28 20:03:34.169029 | 2024-07-28 20:03:34.169029 |       1 |       1 |  2147483648 |    6442450944 | 9223372036854775807 | 9223372036854775807 |           1 |
|           1002 | gm_unit_config  | 2024-07-29 16:19:25.823224 | 2024-07-29 16:19:25.823224 |       8 |       8 | 17179869184 |   51539607552 | 9223372036854775807 | 9223372036854775807 |           8 |
+----------------+-----------------+----------------------------+----------------------------+---------+---------+-------------+---------------+---------------------+---------------------+-------------+
+------------------+---------------+-----------+----------------------------+----------------------------+------------+----------------+-----------+--------------+
| RESOURCE_POOL_ID | NAME          | TENANT_ID | CREATE_TIME                | MODIFY_TIME                | UNIT_COUNT | UNIT_CONFIG_ID | ZONE_LIST | REPLICA_TYPE |
+------------------+---------------+-----------+----------------------------+----------------------------+------------+----------------+-----------+--------------+
|                1 | sys_pool      |         1 | 2024-07-28 20:03:06.175218 | 2024-07-28 20:03:06.204417 |          1 |              1 | zone1     | FULL         |
|             1001 | ocp_meta_pool |      1002 | 2024-07-28 20:03:34.182023 | 2024-07-28 20:03:34.247615 |          1 |           1001 | zone1     | FULL         |
|             1002 | gm_pool_01    |      1004 | 2024-07-29 16:22:58.200163 | 2024-07-29 16:26:07.999457 |          1 |           1002 | zone1     | FULL         |
+------------------+---------------+-----------+----------------------------+----------------------------+------------+----------------+-----------+--------------+
+-----------+-------------+-------------+----------------------------+----------------------------+--------------+---------------+-------------------+--------------------+--------+---------------+--------+-------------+-------------------+------------------+---------------------+---------------------+---------------------+---------------------+--------------+----------------------------+----------+------------+-----------+
|         1 | sys         | SYS         | 2024-07-28 20:03:06.261809 | 2024-07-28 20:03:06.261809 | RANDOM       | FULL{1}@zone1 | NULL              | MYSQL              | NORMAL | NO            | NO     | PRIMARY     | NORMAL            |                0 |                NULL |                NULL |                NULL |                NULL | NOARCHIVELOG | DISABLED                   |        1 | 4.3.1.0    |         1 |
|      1001 | META$1002   | META        | 2024-07-28 20:03:34.230599 | 2024-07-28 20:03:53.065203 | RANDOM       | FULL{1}@zone1 | NULL              | MYSQL              | NORMAL | NO            | NO     | PRIMARY     | NORMAL            |                0 |                NULL |                NULL |                NULL |                NULL | NOARCHIVELOG | DISABLED                   |        1 | 4.3.1.0    |         1 |
|      1002 | ocp_meta    | USER        | 2024-07-28 20:03:34.235135 | 2024-07-28 20:03:53.141405 | RANDOM       | FULL{1}@zone1 | NULL              | MYSQL              | NORMAL | NO            | NO     | PRIMARY     | NORMAL            |                0 | 1722243123130388001 | 1722243123130388001 | 1722243121878384001 | 4611686018427387903 | NOARCHIVELOG | DISABLED                   |        1 | 4.3.1.0    |      1001 |
|      1003 | META$1004   | META        | 2024-07-29 16:26:07.017102 | 2024-07-29 16:27:04.749718 | zone1        | FULL{1}@zone1 | NULL              | MYSQL              | NORMAL | NO            | NO     | PRIMARY     | NORMAL            |                0 |                NULL |                NULL |                NULL |                NULL | NOARCHIVELOG | DISABLED                   |        1 | 4.3.1.0    |         1 |
|      1004 | gm          | USER        | 2024-07-29 16:26:07.095126 | 2024-07-29 16:27:04.838384 | zone1        | FULL{1}@zone1 | NULL              | MYSQL              | NORMAL | NO            | NO     | PRIMARY     | NORMAL            |                0 | 1722243123053675000 | 1722243123053675000 | 1722243121727215000 | 4611686018427387903 | NOARCHIVELOG | DISABLED                   |        1 | 4.3.1.0    |      1001 |
+-----------+-------------+-------------+----------------------------+----------------------------+--------------+---------------+-------------------+--------------------+--------+---------------+--------+-------------+-------------------+------------------+---------------------+---------------------+---------------------+---------------------+--------------+----------------------------+----------+------------+-----------+

使用了该租户后,还是会有延迟高的现象。请问,下一步,该怎么排查与解决呢。

测试看到tps下降的时候,看下observer.log日志
grep ‘report write throttle info’ observer.log 有下降时间点有过滤到日志不,

全局搜了下,没有找到observer.log,只有clog与slog

肯定有,在软件安装目录下,

你好,已经找到了日志文件。
重新开启了一次测试,还是有调用语句延迟的情况发生,但是使用下述语句没有过滤出结果。

tail -f observer.log |grep 'report write throttle info'

测试程序能发下不,还有表结构,我测试下

现在没有独立出测试程序。项目软件比较大,而且依赖硬件环境,你那边运行不了。可以钉钉远程吗,我的钉钉号码6rf_xjnw0e4ix

:flushed:工作时间不大方便,你可以使用obdiag巡检一下

在线分析最近一小时的日志,诊断出出现过的错误

obdiag analyze log --since 1h

集群基础信息采集

obdiag gather scene run --scene=observer.base

推荐可以使用obdiag进行排查

sysbench 压测ob时, write only 场景 qps 波动大 - 社区问答- OceanBase社区-分布式数据库
也可以参考下这个帖子

通过以下语句都查出了多条记录,会有问题吗

select * from gv$session_event where EVENT='memstore memory page alloc wait';

select * from gv$session_event where EVENT='latch: bandwidth throttle lock wait';

select * from gv$session_event where EVENT='inner connection pool condition wait';

可以使用obdiag 生成ash报告 更容易去查看目前数据库性能及sql性能

obdiag gather ash命令

使用obdiag gather all收集了全部日志。
ob_log_127.0.0.1_20240730162000_20240730165000.rar (8.4 MB)
obproxy_log_127.0.0.1_20240730162000_20240730165000.zip (161.5 KB)
perf_127.0.0.1_20240730165236.zip (10.6 KB)
sysstat_127.0.0.1_20240730165224.zip (24.6 KB)

在以下时间点都出现延迟较高的情况:

2024-07-30 16:47:19左右
2024-07-30 16:46:47左右
2024-07-30 16:46:11左右
2024-07-30 16:45:38左右

在产生memstore memory page alloc wait事件时,observer.log日志中,没有过滤到report write throttle info。

另外发现在调用产生延迟时,磁盘的写入有突然升高。
通过iostat监控效果如下

我用sysbench压测了一下,有MEDIUM_MERGE合并的时候,对TPS影响比较大些,

MEDIUM_MERGE会很频繁吗?我这边差不多半分钟就会出现一次sql操作延时高的情况。
我的租户是8C 16G

我部署的是单zone,但有两个observer,所以租户资源是,2C/4G * 2,
用sysbench, 16线程 ,10张1000000的表,只是INSERT, 压测10分钟,会出现几次


你根据这个表查一下 MEDIUM_MERGE信息
SELECT * FROM oceanbase.GV$OB_TABLET_COMPACTION_HISTORY
WHERE tenant_id= &tenant_id and table_id= &table_id order by START_TIME desc limit 50;

能查到结果,但是看starttime都是凌晨2点左右。

这个正常的 这个和设置自动转储有关系 你到时候可以根据插入慢的start_time的时间范围 查询一下 看看有没有MEDIUM_MERGE信息

SELECT SVR_IP,SVR_PORT,TYPE,COMPACTION_SCN,START_TIME,FINISH_TIME FROM oceanbase.GV$OB_TABLET_COMPACTION_HISTORY where start_time > ‘时间点’ order by START_TIME desc;
也可以根据这个sql判断下,当qps陡降的时候,是不是触发了MINI_MINOR_MERGE等操作。
然后看下show parameters like “%memstore_limit_percentage%”; 这个参数大小