ocp监控中租户qps 2万+ 但TPS一直不超过3

这个算出来两个点的tps分别是688和543,如果监控显示个位数可能有问题,我联系OCP这块的老师看下

是的监控上的TPS,一直没有超过5

1 个赞

麻烦登陆这个租户,再这样查下,将如下SQL同时粘贴到执行窗口,回车,发下结果

select now();
select /* MONITOR_AGENT */ con_id tenant_id, stat_id, value from oceanbase.v$sysstat where stat_id IN (30007,30009,30011) and (con_id > 1000 or con_id = 1) and class < 1000;
select sleep(5);
select now();
select /* MONITOR_AGENT */ con_id tenant_id, stat_id, value from oceanbase.v$sysstat where stat_id IN (30007,30009,30011) and (con_id > 1000 or con_id = 1) and class < 1000;
select sleep(5);
select now();
select /* MONITOR_AGENT */ con_id tenant_id, stat_id, value from oceanbase.v$sysstat where stat_id IN (30007,30009,30011) and (con_id > 1000 or con_id = 1) and class < 1000;
select now();
1 个赞

30009 事物回滚这个算出来,倒是和监控对的上的

MySQL [oceanbase]> select now();
ant_id, stat_id, value from oceanbase.v$sysstat where stat_id IN (30007,30009,30011) and (con_id > 1000 or con_id = 1) and class < 1000;
select sleep(5);
select now();
select /* MONITOR_AGENT */ con_id tenant_id, stat_id, value from oceanbase.v$sysstat whe+---------------------+
| now()               |
+---------------------+
| 2025-03-13 15:06:06 |
+---------------------+
1 row in set (0.00 sec)

MySQL [oceanbase]> select /* MONITOR_AGENT */ con_id tenant_id, stat_id, value from oceanbase.v$sysstat where stat_id IN (30007,30009,30011) and (con_id > 1000 or con_id = 1) and class < 1000;
re stat_id IN (30007,30009,30011) and (con_id > 1000 or con_id = 1) and class < 1000;
select now();+-----------+---------+---------+
| tenant_id | stat_id | value   |
+-----------+---------+---------+
|      1012 |   30007 | 1478789 |
|      1012 |   30009 |  282236 |
|      1012 |   30011 |       0 |
+-----------+---------+---------+
3 rows in set (0.02 sec)

MySQL [oceanbase]> select sleep(5);
+----------+
| sleep(5) |
+----------+
|        0 |
+----------+
1 row in set (5.00 sec)

MySQL [oceanbase]> select now();
+---------------------+
| now()               |
+---------------------+
| 2025-03-13 15:06:11 |
+---------------------+
1 row in set (0.00 sec)

MySQL [oceanbase]> select /* MONITOR_AGENT */ con_id tenant_id, stat_id, value from oceanbase.v$sysstat where stat_id IN (30007,30009,30011) and (con_id > 1000 or con_id = 1) and class < 1000;
+-----------+---------+---------+
| tenant_id | stat_id | value   |
+-----------+---------+---------+
|      1012 |   30007 | 1478804 |
|      1012 |   30009 |  282260 |
|      1012 |   30011 |       0 |
+-----------+---------+---------+
3 rows in set (0.02 sec)

MySQL [oceanbase]> select sleep(5);
+----------+
| sleep(5) |
+----------+
|        0 |
+----------+
1 row in set (5.00 sec)

MySQL [oceanbase]> select now();
+---------------------+
| now()               |
+---------------------+
| 2025-03-13 15:06:16 |
+---------------------+
1 row in set (0.00 sec)

MySQL [oceanbase]> select /* MONITOR_AGENT */ con_id tenant_id, stat_id, value from oceanbase.v$sysstat where stat_id IN (30007,30009,30011) and (con_id > 1000 or con_id = 1) and class < 1000;
+-----------+---------+---------+
| tenant_id | stat_id | value   |
+-----------+---------+---------+
|      1012 |   30007 | 1478814 |
|      1012 |   30009 |  282280 |
|      1012 |   30011 |       0 |
+-----------+---------+---------+
3 rows in set (0.02 sec)

MySQL [oceanbase]> select now();
+---------------------+
| now()               |
+---------------------+
| 2025-03-13 15:06:18 |
+---------------------+
1 row in set (0.00 sec)

MySQL [oceanbase]> 

1 个赞

是的,对得上的,OCP这里TPS是按照30007,30009,30011计算的,没有问题的

1 个赞

QPS都到2万了,那这个TPS 肯定不止这么点的,还是有问题呀,一样的业务,之前我们压测的时候QPS 2000+ TPS 一直稳定在400左右
详见这个贴子:
OCP监控中发现租户里 事物回滚 比提交多几倍 - 社区问答- OceanBase社区-分布式数据库

在ocp SQL诊断中 BEGIN 和 COMMIT 的执行也有每秒300以上的量

另外,如果这个TPS 是对的,那又是一个回滚事物比提交事物多的现象,需要如何进一步排查?

1 个赞

更新
obdiag analyze log --since 1h
结果正常:


Analyze OceanBase Online Log Summary:

+--------------+----------+------------+--------------------+-------------+-----------+---------+
| Node         | Status   | FileName   | First Found Time   | ErrorCode   | Message   | Count   |
+==============+==========+============+====================+=============+===========+=========+
| 10.10.17.204 | PASS     |            |                    |             |           |         |
+--------------+----------+------------+--------------------+-------------+-----------+---------+
| 10.10.17.205 | PASS     |            |                    |             |           |         |
+--------------+----------+------------+--------------------+-------------+-----------+---------+
| 10.10.17.206 | PASS     |            |                    |             |           |         |
+--------------+----------+------------+--------------------+-------------+-----------+---------+
For more details, please run cmd ' cat /root/obdiag_analyze_pack_20250313152820/result_details.txt '

Trace ID: bc8693ba-ffdc-11ef-9201-f8f21e597991
If you want to view detailed obdiag logs, please run: obdiag display-trace bc8693ba-ffdc-11ef-9201-f8f21e597991
[root@OB01 ~]# cat /root/obdiag_analyze_pack_20250313152820/result_details.txt 

Analyze OceanBase Online Log Summary:
+--------------+----------+------------+--------------------+-------------+-----------+---------+
| Node         | Status   | FileName   | First Found Time   | ErrorCode   | Message   | Count   |
+==============+==========+============+====================+=============+===========+=========+
| 10.10.17.204 | PASS     |            |                    |             |           |         |
+--------------+----------+------------+--------------------+-------------+-----------+---------+
| 10.10.17.205 | PASS     |            |                    |             |           |         |
+--------------+----------+------------+--------------------+-------------+-----------+---------+
| 10.10.17.206 | PASS     |            |                    |             |           |         |
+--------------+----------+------------+--------------------+-------------+-----------+---------+

Details:

Node: 10.10.17.204
Status: PASS
FileName: None
First Found Time: None
ErrorCode: None
Message: None
Count: None
Last Found Time: None
Cause: None
Solution: None
Trace_IDS: None

Node: 10.10.17.205
Status: PASS
FileName: None
First Found Time: None
ErrorCode: None
Message: None
Count: None
Last Found Time: None
Cause: None
Solution: None
Trace_IDS: None

Node: 10.10.17.206
Status: PASS
FileName: None
First Found Time: None
ErrorCode: None
Message: None
Count: None
Last Found Time: None
Cause: None
Solution: None
Trace_IDS: None

[root@OB01 ~]# 

另外下面是 obdiag check run 巡检结果:

[root@OB01 ~]# obdiag check run
check start ...
[WARN] step_base ResultFalseException:mod max memory over 10G,Please check on oceanbase.__all_virtual_memory_info to find some large mod 
[WARN] step_base ResultFalseException:mod max memory over 10G,Please check on oceanbase.__all_virtual_memory_info to find some large mod 
[WARN] TaskBase execute StepResultFalseException: mod max memory over 10G,Please check on oceanbase.__all_virtual_memory_info to find some large mod   .
[WARN] TaskBase execute StepResultFalseException: mod max memory over 10G,Please check on oceanbase.__all_virtual_memory_info to find some large mod   .
[WARN] step_base ResultFalseException:mod max memory over 10G,Please check on oceanbase.__all_virtual_memory_info to find some large mod 
[WARN] TaskBase execute StepResultFalseException: mod max memory over 10G,Please check on oceanbase.__all_virtual_memory_info to find some large mod   .
[WARN] step_base ResultFalseException:number of sql_error_4012 is 154 
[WARN] step_base ResultFalseException:number of sql_error_4012 is 154 
[WARN] TaskBase execute StepResultFalseException: number of sql_error_4012 is 154   .
[WARN] TaskBase execute StepResultFalseException: number of sql_error_4012 is 154   .
[WARN] step_base ResultFalseException:number of sql_error_4012 is 154 
[WARN] TaskBase execute StepResultFalseException: number of sql_error_4012 is 154   .
[WARN] step_base ResultFalseException:tsar is not installed. we can not check tcp retransmission. 
[WARN] TaskBase execute StepResultFailException: tsar is not installed. we can not check tcp retransmission.  
[WARN] step_base ResultFalseException:tsar is not installed. we can not check tcp retransmission. 
[WARN] step_base ResultFalseException:tsar is not installed. we can not check tcp retransmission. 
[WARN] TaskBase execute StepResultFailException: tsar is not installed. we can not check tcp retransmission.  
[WARN] TaskBase execute StepResultFailException: tsar is not installed. we can not check tcp retransmission.  
[WARN] network_speed is  and the type is <class 'str'>, not int or float or decimal ! set it to 0.
[WARN] step_base ResultFalseException:network_speed is  , less than  
[WARN] network_speed is  and the type is <class 'str'>, not int or float or decimal ! set it to 0.
[WARN] step_base ResultFalseException:network_speed is  , less than  
[WARN] TaskBase execute StepResultFailException: network_speed is  , less than   
[WARN] TaskBase execute StepResultFailException: network_speed is  , less than   
[WARN] step_base ResultFalseException:net.ipv4.tcp_tw_recycle : 0. recommended: 1. 
[WARN] TaskBase execute StepResultFalseException: net.ipv4.tcp_tw_recycle : 0. recommended: 1.   .
[WARN] step_base ResultFalseException:net.ipv4.tcp_tw_recycle : 0. recommended: 1. 
[WARN] step_base ResultFalseException:net.ipv4.tcp_tw_recycle : 0. recommended: 1. 
[WARN] TaskBase execute StepResultFalseException: net.ipv4.tcp_tw_recycle : 0. recommended: 1.   .
[WARN] TaskBase execute StepResultFalseException: net.ipv4.tcp_tw_recycle : 0. recommended: 1.   .
Check observer finished. For more details, please run cmd' cat ./check_report/obdiag_check_report_observer_2025-03-13-15-48-53.table '
Trace ID: 9b29f77c-ffdf-11ef-b508-f8f21e597991
If you want to view detailed obdiag logs, please run: obdiag display-trace 9b29f77c-ffdf-11ef-b508-f8f21e597991

上面的 TaskBase execute StepResultFalseException: number of sql_error_4012 is 154 这个需要关注吗?

1 个赞

在ocp SQL诊断中 BEGIN 和 COMMIT 的执行也有每秒300以上的量 --这个麻烦截图看下,

那又是一个回滚事物比提交事物多的现象 --这个算出来 回滚事物是比提交事物略高,这个问题我再看下

1 个赞

然后再麻烦发下 3月12日 21:00:00~ 21:00:05的 TPS监控截图

性能监控选全部zone和全部observer,SQL诊断选全部observer

3月12日 21:00:00~ 21:00:05的截图看下呢

是一样的,我们所有SQL都只在这一个zone 上执行的


好的,我再看下

麻烦执行下这个sql查下看看
select ACTION,count(*) from from GV$OB_TRANSACTION_PARTICIPANTS where svr_ip=’’ group by action;

看不透

MySQL [oceanbase]> select ACTION,count(*)  from GV$OB_TRANSACTION_PARTICIPANTS where svr_ip='10.10.17.204' group by action;
+--------+----------+
| ACTION | count(*) |
+--------+----------+
| START  |        7 |
| COMMIT |        1 |
+--------+----------+
2 rows in set (0.01 sec)

MySQL [oceanbase]> select ACTION,count(*)  from GV$OB_TRANSACTION_PARTICIPANTS where svr_ip='10.10.17.204' group by action;
+--------+----------+
| ACTION | count(*) |
+--------+----------+
| START  |        9 |
+--------+----------+
1 row in set (0.01 sec)

MySQL [oceanbase]> select ACTION,count(*)  from GV$OB_TRANSACTION_PARTICIPANTS where svr_ip='10.10.17.204' group by action;
+--------+----------+
| ACTION | count(*) |
+--------+----------+
| START  |        4 |
+--------+----------+
1 row in set (0.01 sec)

MySQL [oceanbase]> select ACTION,count(*)  from GV$OB_TRANSACTION_PARTICIPANTS where svr_ip='10.10.17.204' group by action;
+--------+----------+
| ACTION | count(*) |
+--------+----------+
| START  |        3 |
+--------+----------+
1 row in set (0.01 sec)

MySQL [oceanbase]> select ACTION,count(*)  from GV$OB_TRANSACTION_PARTICIPANTS where svr_ip='10.10.17.204' group by action;
+--------+----------+
| ACTION | count(*) |
+--------+----------+
| START  |        4 |
+--------+----------+
1 row in set (0.01 sec)

MySQL [oceanbase]> select ACTION,count(*)  from GV$OB_TRANSACTION_PARTICIPANTS where svr_ip='10.10.17.204' group by action;
+--------+----------+
| ACTION | count(*) |
+--------+----------+
| START  |        7 |
+--------+----------+
1 row in set (0.01 sec)

MySQL [oceanbase]> select ACTION,count(*)  from GV$OB_TRANSACTION_PARTICIPANTS where svr_ip='10.10.17.204' group by action;
+--------+----------+
| ACTION | count(*) |
+--------+----------+
| START  |       11 |
+--------+----------+
1 row in set (0.00 sec)

MySQL [oceanbase]> select ACTION,count(*)  from GV$OB_TRANSACTION_PARTICIPANTS where svr_ip='10.10.17.204' group by action;
+--------+----------+
| ACTION | count(*) |
+--------+----------+
| START  |        5 |
+--------+----------+
1 row in set (0.01 sec)

MySQL [oceanbase]> select ACTION,count(*)  from GV$OB_TRANSACTION_PARTICIPANTS where svr_ip='10.10.17.204' group by action;
+--------+----------+
| ACTION | count(*) |
+--------+----------+
| START  |        2 |
+--------+----------+
1 row in set (0.00 sec)

MySQL [oceanbase]> select ACTION,count(*)  from GV$OB_TRANSACTION_PARTICIPANTS where svr_ip='10.10.17.204' group by action;
+--------+----------+
| ACTION | count(*) |
+--------+----------+
| START  |        4 |
| COMMIT |        1 |
+--------+----------+
2 rows in set (0.01 sec)

MySQL [oceanbase]> select ACTION,count(*)  from GV$OB_TRANSACTION_PARTICIPANTS where svr_ip='10.10.17.204' group by action;
+--------+----------+
| ACTION | count(*) |
+--------+----------+
| START  |        7 |
+--------+----------+
1 row in set (0.00 sec)

MySQL [oceanbase]> select ACTION,count(*)  from GV$OB_TRANSACTION_PARTICIPANTS where svr_ip='10.10.17.204' group by action;
+--------+----------+
| ACTION | count(*) |
+--------+----------+
| COMMIT |        1 |
| START  |        7 |
+--------+----------+
2 rows in set (0.00 sec)

MySQL [oceanbase]> select ACTION,count(*)  from GV$OB_TRANSACTION_PARTICIPANTS where svr_ip='10.10.17.204' group by action;
+--------+----------+
| ACTION | count(*) |
+--------+----------+
| START  |        3 |
+--------+----------+
1 row in set (0.00 sec)

MySQL [oceanbase]> select ACTION,count(*)  from GV$OB_TRANSACTION_PARTICIPANTS where svr_ip='10.10.17.204' group by action;
+--------+----------+
| ACTION | count(*) |
+--------+----------+
| START  |        3 |
+--------+----------+
1 row in set (0.01 sec)

MySQL [oceanbase]> select ACTION,count(*)  from GV$OB_TRANSACTION_PARTICIPANTS where svr_ip='10.10.17.204' group by action;
+--------+----------+
| ACTION | count(*) |
+--------+----------+
| START  |        3 |
+--------+----------+
1 row in set (0.00 sec)

MySQL [oceanbase]> select ACTION,count(*)  from GV$OB_TRANSACTION_PARTICIPANTS where svr_ip='10.10.17.204' group by action;
+--------+----------+
| ACTION | count(*) |
+--------+----------+
| START  |        2 |
+--------+----------+
1 row in set (0.00 sec)

MySQL [oceanbase]> select ACTION,count(*)  from GV$OB_TRANSACTION_PARTICIPANTS where svr_ip='10.10.17.204' group by action;
+--------+----------+
| ACTION | count(*) |
+--------+----------+
| START  |        1 |
+--------+----------+
1 row in set (0.00 sec)

MySQL [oceanbase]> select ACTION,count(*)  from GV$OB_TRANSACTION_PARTICIPANTS where svr_ip='10.10.17.204' group by action;
+--------+----------+
| ACTION | count(*) |
+--------+----------+
| START  |        4 |
+--------+----------+
1 row in set (0.00 sec)

MySQL [oceanbase]> select ACTION,count(*)  from GV$OB_TRANSACTION_PARTICIPANTS where svr_ip='10.10.17.204' group by action;
+--------+----------+
| ACTION | count(*) |
+--------+----------+
| START  |        1 |
+--------+----------+
1 row in set (0.01 sec)

MySQL [oceanbase]> select ACTION,count(*)  from GV$OB_TRANSACTION_PARTICIPANTS where svr_ip='10.10.17.204' group by action;
+--------+----------+
| ACTION | count(*) |
+--------+----------+
| START  |        1 |
+--------+----------+
1 row in set (0.00 sec)

MySQL [oceanbase]> select ACTION,count(*)  from GV$OB_TRANSACTION_PARTICIPANTS where svr_ip='10.10.17.204' group by action;
+--------+----------+
| ACTION | count(*) |
+--------+----------+
| START  |        6 |
| COMMIT |        1 |
+--------+----------+
2 rows in set (0.01 sec)

MySQL [oceanbase]> select ACTION,count(*)  from GV$OB_TRANSACTION_PARTICIPANTS where svr_ip='10.10.17.204' group by action;
+--------+----------+
| ACTION | count(*) |
+--------+----------+
| START  |        7 |
+--------+----------+
1 row in set (0.00 sec)

MySQL [oceanbase]> select ACTION,count(*)  from GV$OB_TRANSACTION_PARTICIPANTS where svr_ip='10.10.17.204' group by action;
+--------+----------+
| ACTION | count(*) |
+--------+----------+
| START  |        6 |
| COMMIT |        1 |
+--------+----------+
2 rows in set (0.01 sec)

MySQL [oceanbase]> select ACTION,count(*)  from GV$OB_TRANSACTION_PARTICIPANTS where svr_ip='10.10.17.204' group by action;
+--------+----------+
| ACTION | count(*) |
+--------+----------+
| START  |        8 |
+--------+----------+
1 row in set (0.01 sec)

MySQL [oceanbase]> select ACTION,count(*)  from GV$OB_TRANSACTION_PARTICIPANTS where svr_ip='10.10.17.204' group by action;
+--------+----------+
| ACTION | count(*) |
+--------+----------+
| START  |        6 |
+--------+----------+
1 row in set (0.00 sec)

MySQL [oceanbase]> select ACTION,count(*)  from GV$OB_TRANSACTION_PARTICIPANTS where svr_ip='10.10.17.204' group by action;
+--------+----------+
| ACTION | count(*) |
+--------+----------+
| START  |        7 |
+--------+----------+
1 row in set (0.00 sec)

MySQL [oceanbase]> select ACTION,count(*)  from GV$OB_TRANSACTION_PARTICIPANTS where svr_ip='10.10.17.204' group by action;
+--------+----------+
| ACTION | count(*) |
+--------+----------+
| COMMIT |        1 |
| START  |        3 |
+--------+----------+
2 rows in set (0.01 sec)

MySQL [oceanbase]> select ACTION,count(*)  from GV$OB_TRANSACTION_PARTICIPANTS where svr_ip='10.10.17.204' group by action;
+--------+----------+
| ACTION | count(*) |
+--------+----------+
| START  |        4 |
+--------+----------+
1 row in set (0.01 sec)

学到了!

你好麻烦通过ocp的sql诊断提供一下回滚事务的trace id。再根据时间提供一下当时的observer日志

请教一下 ocp sql 诊断如何看回滚事物的trace id ?

倒是有个 事物诊断 但里面也只有【其它事务】 里会有信息,但这些事务也不是一直在的,过一会就不见了。