OCP monitoring shows tenant QPS above 20,000, but TPS never exceeds 3

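【Environment】Production or test environment
【OB or other component】
【Version】OCP 4.3.5, ob-ce 4.3.5
【Problem description】
OCP monitoring shows the tenant's QPS holding at around 20,000, but TPS never exceeds 3, and the number of rollbacks shown is often larger than the number of commits.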


In OCP SQL diagnostics, BEGIN and COMMIT are also each executed about 300 times per second.
Is the TPS figure shown in monitoring normal? How should I investigate this?

Also, searching the logs for the keyword rollback turns up only the following entries, which repeat:

[2025-03-12 14:32:49.835615] WDIAG [STORAGE.TRANS] sync_rollback_savepoint__ (ob_tx_api.cpp:1878) [153089][T1012_L0_G0][T1012][YB420A0A11CC-00062E869D3DFDF6-0-0] [lt=151][errcode=-4012] tx rpc condition wakeup(ret=-4012, tx.tx_id_={txid:61850779}, waittime=10000, rpc_ret=0, expire_ts=1741761467746549, remain=[{ls_id:{id:1001}, exec_epoch:556963872692272, transfer_epoch:-1}], remain_cnt=1, retries=0, tx.state=4)
[2025-03-12 14:32:49.870020] WDIAG [STORAGE.TRANS] rollback_to_savepoint (ob_trans_part_ctx.cpp:8726) [153090][T1012_L0_G0][T1012][YB420A0A11CC-00062E869D3DFE50-0-0] [lt=0][errcode=-4036] rollback_to need retry because of logging(ret=-4036, trans_id_={txid:61850824}, ls_id_={id:1001}, busy_cbs_.get_size()=1)
[2025-03-12 14:32:49.870043] WDIAG [STORAGE.TRANS] ls_sync_rollback_savepoint__ (ob_tx_api.cpp:1354) [153090][T1012_L0_G0][T1012][YB420A0A11CC-00062E869D3DFE50-0-0] [lt=53][errcode=-4036] rollback to savepoint sync fail(ret=-4036, part_ctx->get_trans_id()={txid:61850824}, part_ctx->get_ls_id()={id:1001}, retry_cnt=0, op_sn=57, savepoint={branch:0, seq:19969}, expire_ts=-1)
[2025-03-12 14:32:49.870061] WDIAG [STORAGE.TRANS] ls_rollback_to_savepoint_ (ob_tx_api.cpp:1702) [153090][T1012_L0_G0][T1012][YB420A0A11CC-00062E869D3DFE50-0-0] [lt=44][errcode=-4036] LS rollback to savepoint fail(ret=-4036, tx_id={txid:61850824}, ls={id:1001}, op_sn=57, savepoint={branch:0, seq:19969}, ctx={this:0x7f76460db2d0, ref:3, trans_id:{txid:61850824}, tenant_id:1012, is_exiting:false, trans_expired_time:1741847569848747, cluster_version:17180067072, trans_need_wait_wrap:{receive_gts_ts_:[mts=0], need_wait_interval_us:0}, stc:[mts=0], ctx_create_time:1741761169848747})
[2025-03-12 14:32:49.870220] WDIAG [STORAGE.TRANS] rollback_to_savepoint (ob_trans_part_ctx.cpp:8726) [153097][T1012_L0_G0][T1012][YB420A0A11CC-00062E869D3DFE50-0-0] [lt=2][errcode=-4036] rollback_to need retry because of logging(ret=-4036, trans_id_={txid:61850824}, ls_id_={id:1001}, busy_cbs_.get_size()=1)
[2025-03-12 14:32:49.870236] WDIAG [STORAGE.TRANS] ls_sync_rollback_savepoint__ (ob_tx_api.cpp:1354) [153097][T1012_L0_G0][T1012][YB420A0A11CC-00062E869D3DFE50-0-0] [lt=37][errcode=-4036] rollback to savepoint sync fail(ret=-4036, part_ctx->get_trans_id()={txid:61850824}, part_ctx->get_ls_id()={id:1001}, retry_cnt=0, op_sn=57, savepoint={branch:0, seq:19969}, expire_ts=-1)
[2025-03-12 14:32:49.870247] WDIAG [STORAGE.TRANS] ls_rollback_to_savepoint_ (ob_tx_api.cpp:1702) [153097][T1012_L0_G0][T1012][YB420A0A11CC-00062E869D3DFE50-0-0] [lt=28][errcode=-4036] LS rollback to savepoint fail(ret=-4036, tx_id={txid:61850824}, ls={id:1001}, op_sn=57, savepoint={branch:0, seq:19969}, ctx={this:0x7f76460db2d0, ref:3, trans_id:{txid:61850824}, tenant_id:1012, is_exiting:false, trans_expired_time:1741847569848747, cluster_version:17180067072, trans_need_wait_wrap:{receive_gts_ts_:[mts=0], need_wait_interval_us:0}, stc:[mts=0], ctx_create_time:1741761169848747})
[2025-03-12 14:32:49.871802] WDIAG [STORAGE.TRANS] rollback_to_savepoint (ob_trans_part_ctx.cpp:8726) [153088][T1012_L0_G0][T1012][YB420A0A11CC-00062E869D1E2244-0-0] [lt=0][errcode=-4036] rollback_to need retry because of logging(ret=-4036, trans_id_={txid:61850821}, ls_id_={id:1001}, busy_cbs_.get_size()=1)
[2025-03-12 14:32:49.871824] WDIAG [STORAGE.TRANS] ls_sync_rollback_savepoint__ (ob_tx_api.cpp:1354) [153088][T1012_L0_G0][T1012][YB420A0A11CC-00062E869D1E2244-0-0] [lt=51][errcode=-4036] rollback to savepoint sync fail(ret=-4036, part_ctx->get_trans_id()={txid:61850821}, part_ctx->get_ls_id()={id:1001}, retry_cnt=0, op_sn=57, savepoint={branch:0, seq:23519}, expire_ts=-1)
[2025-03-12 14:32:49.871838] WDIAG [STORAGE.TRANS] ls_rollback_to_savepoint_ (ob_tx_api.cpp:1702) [153088][T1012_L0_G0][T1012][YB420A0A11CC-00062E869D1E2244-0-0] [lt=34][errcode=-4036] LS rollback to savepoint fail(ret=-4036, tx_id={txid:61850821}, ls={id:1001}, op_sn=57, savepoint={branch:0, seq:23519}, ctx={this:0x7f7648a2b950, ref:3, trans_id:{txid:61850821}, tenant_id:1012, is_exiting:false, trans_expired_time:1741847569847690, cluster_version:17180067072, trans_need_wait_wrap:{receive_gts_ts_:[mts=0], need_wait_interval_us:0}, stc:[mts=0], ctx_create_time:1741761169847690})
[2025-03-12 14:32:49.871973] WDIAG [STORAGE.TRANS] rollback_to_savepoint (ob_trans_part_ctx.cpp:8726) [153089][T1012_L0_G0][T1012][YB420A0A11CC-00062E869D1E2244-0-0] [lt=38][errcode=-4036] rollback_to need retry because of logging(ret=-4036, trans_id_={txid:61850821}, ls_id_={id:1001}, busy_cbs_.get_size()=1)
[2025-03-12 14:32:49.871993] WDIAG [STORAGE.TRANS] ls_sync_rollback_savepoint__ (ob_tx_api.cpp:1354) [153089][T1012_L0_G0][T1012][YB420A0A11CC-00062E869D1E2244-0-0] [lt=47][errcode=-4036] rollback to savepoint sync fail(ret=-4036, part_ctx->get_trans_id()={txid:61850821}, part_ctx->get_ls_id()={id:1001}, retry_cnt=0, op_sn=57, savepoint={branch:0, seq:23519}, expire_ts=-1)
[2025-03-12 14:32:49.872007] WDIAG [STORAGE.TRANS] ls_rollback_to_savepoint_ (ob_tx_api.cpp:1702) [153089][T1012_L0_G0][T1012][YB420A0A11CC-00062E869D1E2244-0-0] [lt=33][errcode=-4036] LS rollback to savepoint fail(ret=-4036, tx_id={txid:61850821}, ls={id:1001}, op_sn=57, savepoint={branch:0, seq:23519}, ctx={this:0x7f7648a2b950, ref:3, trans_id:{txid:61850821}, tenant_id:1012, is_exiting:false, trans_expired_time:1741847569847690, cluster_version:17180067072, trans_need_wait_wrap:{receive_gts_ts_:[mts=0], need_wait_interval_us:0}, stc:[mts=0], ctx_create_time:1741761169847690})
[2025-03-12 14:32:49.880796] WDIAG [STORAGE.TRANS] sync_rollback_savepoint__ (ob_tx_api.cpp:1878) [153090][T1012_L0_G0][T1012][YB420A0A11CC-00062E869D3DFE50-0-0] [lt=114][errcode=-4012] tx rpc condition wakeup(ret=-4012, tx.tx_id_={txid:61850824}, waittime=10000, rpc_ret=0, expire_ts=1741761467792870, remain=[{ls_id:{id:1001}, exec_epoch:556963917339846, transfer_epoch:-1}], remain_cnt=1, retries=0, tx.state=4)
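
To see which functions and error codes dominate a grep like the one above, the matching lines can be tallied. The following is a minimal sketch: the line layout in the regular expression is inferred only from the entries quoted here, the helper name `tally` is hypothetical, and the sample messages are abbreviated copies of the quoted lines.

```python
import re
from collections import Counter

# Layout assumed from the quoted lines only:
# [timestamp] LEVEL [MODULE] function (file:line) [...][...] [lt=N][errcode=N] message
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>\w+)\s+\[(?P<module>[^\]]+)\]\s+"
    r"(?P<func>\w+)\s+\((?P<src>[^)]+)\).*?\[errcode=(?P<errcode>-?\d+)\]"
)

def tally(lines):
    """Count log lines per (function name, errcode); lines that do not match are skipped."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            counts[(match.group("func"), int(match.group("errcode")))] += 1
    return counts

# Abbreviated copies of three of the quoted entries (messages shortened).
sample = [
    "[2025-03-12 14:32:49.835615] WDIAG [STORAGE.TRANS] sync_rollback_savepoint__ (ob_tx_api.cpp:1878) [153089][T1012_L0_G0][T1012][YB420A0A11CC-00062E869D3DFDF6-0-0] [lt=151][errcode=-4012] tx rpc condition wakeup(ret=-4012)",
    "[2025-03-12 14:32:49.870020] WDIAG [STORAGE.TRANS] rollback_to_savepoint (ob_trans_part_ctx.cpp:8726) [153090][T1012_L0_G0][T1012][YB420A0A11CC-00062E869D3DFE50-0-0] [lt=0][errcode=-4036] rollback_to need retry because of logging(ret=-4036)",
    "[2025-03-12 14:32:49.870220] WDIAG [STORAGE.TRANS] rollback_to_savepoint (ob_trans_part_ctx.cpp:8726) [153097][T1012_L0_G0][T1012][YB420A0A11CC-00062E869D3DFE50-0-0] [lt=2][errcode=-4036] rollback_to need retry because of logging(ret=-4036)",
]

counts = tally(sample)
for (func, errcode), n in counts.most_common():
    print(func, errcode, n)
```

Every function name in the quoted entries refers to rolling back to a savepoint, so a tally like this mainly shows how often each of the two error codes (-4012 and -4036) occurs.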

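Please first use obdiag to analyze the logs and collect the current cluster information.
Analyze the last hour of logs online to find any errors that have occurred: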
obdiag analyze log --since 1h
obdiag gather scene run --scene=observer.base

Check whether there are any abnormal uncommitted transactions:
SELECT * FROM oceanbase.gv$ob_trans_stat WHERE state != 'COMMITTED';
Also check the tenant's global values for the timeout variables:
ob_query_timeout
ob_trx_timeout

The table oceanbase.gv$ob_trans_stat does not exist. I tried both the sys tenant and the business tenant.

MySQL [oceanbase]> SELECT * FROM oceanbase.gv$ob_trans_stat WHERE state != 'COMMITTED';
ERROR 1146 (42S02): Table 'oceanbase.gv$ob_trans_stat' doesn't exist

ob_query_timeout 298000000 # about 5 minutes (298 s)
ob_trx_timeout 86400000000 # 1 day
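
As a quick check of the two values above, they can be converted into readable durations. This is a small sketch that assumes both variables are expressed in microseconds; the helper name `as_timedelta` is made up for illustration.

```python
from datetime import timedelta

# Values reported in this thread; the unit is assumed to be microseconds.
timeouts_us = {
    "ob_query_timeout": 298_000_000,
    "ob_trx_timeout": 86_400_000_000,
}

def as_timedelta(microseconds):
    """Convert a microsecond count into a timedelta for display."""
    return timedelta(microseconds=microseconds)

for name, value in timeouts_us.items():
    print(f"{name}: {as_timedelta(value)}")
```

Under that assumption the query timeout is 298 seconds, slightly under five minutes, and the transaction timeout is exactly one day.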

Use this table instead: __all_virtual_trans_stat;
Please also attach the obdiag analysis result, together with a complete observer log.

It reports that there are too many files:
[root@OB01 ~]# obdiag analyze log --since 1h
analyze_log start …
analyze log from_time: 2025-03-13 12:25:54, to_time: 2025-03-13 13:26:54
analyze nodes’s log start. Please wait a moment…
analyze start ok
[WARN] 10.10.17.204 The number of log files is 51, out of range (0,50]

Please log in to this tenant, paste all of the following SQL into the execution window at once, press Enter, and send the result.

select now();
select /* MONITOR_AGENT */ con_id tenant_id, stat_id, value from oceanbase.v$sysstat where stat_id IN (30005) and (con_id > 1000 or con_id = 1) and class < 1000;
select sleep(5),now();
select /* MONITOR_AGENT */ con_id tenant_id, stat_id, value from oceanbase.v$sysstat where stat_id IN (30005) and (con_id > 1000 or con_id = 1) and class < 1000;
select sleep(5),now();
select /* MONITOR_AGENT */ con_id tenant_id, stat_id, value from oceanbase.v$sysstat where stat_id IN (30005) and (con_id > 1000 or con_id = 1) and class < 1000;
MySQL [(none)]> select now();
+---------------------+
| now()               |
+---------------------+
| 2025-03-13 13:51:02 |
+---------------------+
1 row in set (0.00 sec)

MySQL [(none)]> select /* MONITOR_AGENT */ con_id tenant_id, stat_id, value from oceanbase.v$sysstat where stat_id IN (30005) and (con_id > 1000 or con_id = 1) and class < 1000;
+-----------+---------+----------+
| tenant_id | stat_id | value    |
+-----------+---------+----------+
|      1012 |   30005 | 89546636 |
+-----------+---------+----------+
1 row in set (0.02 sec)

MySQL [(none)]> select sleep(5),now();
+----------+---------------------+
| sleep(5) | now()               |
+----------+---------------------+
|        0 | 2025-03-13 13:51:02 |
+----------+---------------------+
1 row in set (5.00 sec)

MySQL [(none)]> select /* MONITOR_AGENT */ con_id tenant_id, stat_id, value from oceanbase.v$sysstat where stat_id IN (30005) and (con_id > 1000 or con_id = 1) and class < 1000;
+-----------+---------+----------+
| tenant_id | stat_id | value    |
+-----------+---------+----------+
|      1012 |   30005 | 89550030 |
+-----------+---------+----------+
1 row in set (0.02 sec)

MySQL [(none)]> select sleep(5),now();
+----------+---------------------+
| sleep(5) | now()               |
+----------+---------------------+
|        0 | 2025-03-13 13:51:07 |
+----------+---------------------+
1 row in set (5.00 sec)

MySQL [(none)]> select /* MONITOR_AGENT */ con_id tenant_id, stat_id, value from oceanbase.v$sysstat where stat_id IN (30005) and (con_id > 1000 or con_id = 1) and class < 1000;
+-----------+---------+----------+
| tenant_id | stat_id | value    |
+-----------+---------+----------+
|      1012 |   30005 | 89561423 |
+-----------+---------+----------+
1 row in set (0.02 sec)

Sorry, please run it this way instead: paste everything in and press Enter.

select now();
select /* MONITOR_AGENT */ con_id tenant_id, stat_id, value from oceanbase.v$sysstat where stat_id IN (30005) and (con_id > 1000 or con_id = 1) and class < 1000;
select sleep(5);
select now();
select /* MONITOR_AGENT */ con_id tenant_id, stat_id, value from oceanbase.v$sysstat where stat_id IN (30005) and (con_id > 1000 or con_id = 1) and class < 1000;
select sleep(5);
select now();
select /* MONITOR_AGENT */ con_id tenant_id, stat_id, value from oceanbase.v$sysstat where stat_id IN (30005) and (con_id > 1000 or con_id = 1) and class < 1000;

select now();
MySQL [oceanbase]> select now();
+---------------------+
| now()               |
+---------------------+
| 2025-03-13 14:12:59 |
+---------------------+
1 row in set (0.00 sec)

MySQL [oceanbase]> select /* MONITOR_AGENT */ con_id tenant_id, stat_id, value from oceanbase.v$sysstat where stat_id IN (30005) and (con_id > 1000 or con_id = 1) and class < 1000;
+-----------+---------+----------+
| tenant_id | stat_id | value    |
+-----------+---------+----------+
|      1012 |   30005 | 90122106 |
+-----------+---------+----------+
1 row in set (0.02 sec)

MySQL [oceanbase]> select sleep(5);
+----------+
| sleep(5) |
+----------+
|        0 |
+----------+
1 row in set (5.00 sec)

MySQL [oceanbase]> select now();
+---------------------+
| now()               |
+---------------------+
| 2025-03-13 14:13:04 |
+---------------------+
1 row in set (0.00 sec)

MySQL [oceanbase]> select /* MONITOR_AGENT */ con_id tenant_id, stat_id, value from oceanbase.v$sysstat where stat_id IN (30005) and (con_id > 1000 or con_id = 1) and class < 1000;
+-----------+---------+----------+
| tenant_id | stat_id | value    |
+-----------+---------+----------+
|      1012 |   30005 | 90125546 |
+-----------+---------+----------+
1 row in set (0.01 sec)

MySQL [oceanbase]> select sleep(5);
+----------+
| sleep(5) |
+----------+
|        0 |
+----------+
1 row in set (5.00 sec)

MySQL [oceanbase]> select now();
+---------------------+
| now()               |
+---------------------+
| 2025-03-13 14:13:09 |
+---------------------+
1 row in set (0.00 sec)

MySQL [oceanbase]> select /* MONITOR_AGENT */ con_id tenant_id, stat_id, value from oceanbase.v$sysstat where stat_id IN (30005) and (con_id > 1000 or con_id = 1) and class < 1000;
+-----------+---------+----------+
| tenant_id | stat_id | value    |
+-----------+---------+----------+
|      1012 |   30005 | 90128262 |
+-----------+---------+----------+
1 row in set (0.02 sec)

MySQL [oceanbase]> 
MySQL [oceanbase]> select now();
+---------------------+
| now()               |
+---------------------+
| 2025-03-13 14:13:11 |
+---------------------+
1 row in set (0.00 sec)

MySQL [oceanbase]> 

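Supplementary material:
1. obdiag analyze log --since 1h currently reports too many log files and returns no result.
2. obdiag gather scene run --scene=observer.base: please see the attachment.
3. SELECT * FROM oceanbase.gv$ob_trans_stat WHERE state != 'COMMITTED'; (run against __all_virtual_trans_stat, since that view does not exist): the result is sometimes empty; the largest result I saw is shown below.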
sql_result.rar (35.2 KB)

MySQL [oceanbase]> SELECT * FROM __all_virtual_trans_stat WHERE state != 'COMMITTED' ;
+-----------+--------------+----------+------------+----------+------------+---------------------+------------+-------+--------------+----------------------------+----------------------------+---------+------------+---------------+-------+-------------------+----------------+------------+------------------+------------------+------+------------+-------------+----------------------------+-------+-------+-----------+----------------------+----------------------+---------------------+-------------------+----------+-----------------+----------------------+------------------------------------------------------------------------------------------------+
| tenant_id | svr_ip       | svr_port | trans_type | trans_id | session_id | scheduler_addr      | is_decided | ls_id | participants | ctx_create_time            | expired_time               | ref_cnt | last_op_sn | pending_write | state | part_trans_action | trans_ctx_addr | mem_ctx_id | pending_log_size | flushed_log_size | role | is_exiting | coordinator | last_request_time          | gtrid | bqual | format_id | start_scn            | end_scn              | rec_scn             | transfer_blocking | busy_cbs | replay_complete | serial_log_final_scn | callback_list_stats                                                                            |
+-----------+--------------+----------+------------+----------+------------+---------------------+------------+-------+--------------+----------------------------+----------------------------+---------+------------+---------------+-------+-------------------+----------------+------------+------------------+------------------+------+------------+-------------+----------------------------+-------+-------+-----------+----------------------+----------------------+---------------------+-------------------+----------+-----------------+----------------------+------------------------------------------------------------------------------------------------+
|      1012 | 10.10.17.204 |     2882 |          0 | 90856067 | 3221684929 | "10.10.17.204:2882" |          0 |  1001 | NULL         | 2025-03-13 14:10:58.554653 | 2025-03-14 14:10:58.554653 |       3 |          6 |             0 |    10 |                 2 | 0x7f764686fcd0 |         -1 |              245 |                0 |    0 |          0 |          -1 | 2025-03-13 14:10:58.556763 | NULL  | NULL  |        -1 | 18446744073709551615 | 18446744073709551615 | 4611686018427387903 |                 0 |        0 |               1 |                   -1 | ["id, length, logged, removed, branch_removed, sync_scn", [0,5,0,1,0,0]]                       |
|      1012 | 10.10.17.204 |     2882 |          0 | 90846150 | 3221551189 | "10.10.17.204:2882" |          0 |  1001 | NULL         | 2025-03-13 14:10:23.807890 | 2025-03-14 14:10:23.807890 |       2 |          3 |             0 |    10 |                 2 | 0x7f7646325950 |         -1 |                0 |               45 |    0 |          0 |          -1 | 2025-03-13 14:10:23.807890 | NULL  | NULL  |        -1 |  1741846252416052002 | 18446744073709551615 | 1741846252416052002 |                 0 |        0 |               1 |                   -1 | ["id, length, logged, removed, branch_removed, sync_scn", [0,2,2,0,0,1741846252416052002]]     |
|      1012 | 10.10.17.204 |     2882 |          0 | 90856058 | 3221743548 | "10.10.17.204:2882" |          0 |  1001 | [{id:-1}]    | 2025-03-13 14:10:58.532506 | 2025-03-14 14:10:58.532506 |       4 |         57 |             1 |    10 |                 2 | 0x7f76463080d0 |         -1 |            18104 |            16428 |    0 |          0 |          -1 | 2025-03-13 14:10:58.556763 | NULL  | NULL  |        -1 |  1741846258548345001 | 18446744073709551615 | 1741846258548345001 |                 0 |       -1 |               1 |                   -1 | ["id, length, logged, removed, branch_removed, sync_scn", [0,356,237,0,0,1741846258548345001]] |
|      1012 | 10.10.17.204 |     2882 |          0 | 90856068 | 3221549937 | "10.10.17.204:2882" |          0 |  1001 | NULL         | 2025-03-13 14:10:58.554653 | 2025-03-14 14:10:58.554653 |       2 |          5 |             0 |    10 |                 2 | 0x7f764880f950 |         -1 |               39 |                0 |    0 |          0 |          -1 | 2025-03-13 14:10:58.556763 | NULL  | NULL  |        -1 | 18446744073709551615 | 18446744073709551615 | 4611686018427387903 |                 0 |        0 |               1 |                   -1 | ["id, length, logged, removed, branch_removed, sync_scn", [0,2,0,3,0,0]]                       |
|      1012 | 10.10.17.204 |     2882 |          0 | 90856064 | 3221552112 | "10.10.17.204:2882" |          0 |  1001 | NULL         | 2025-03-13 14:10:58.549370 | 2025-03-14 14:10:58.549370 |       2 |          4 |             0 |    10 |                 2 | 0x7f76462f72d0 |         -1 |             2789 |                0 |    0 |          0 |          -1 | 2025-03-13 14:10:58.550428 | NULL  | NULL  |        -1 | 18446744073709551615 | 18446744073709551615 | 4611686018427387903 |                 0 |        0 |               1 |                   -1 | ["id, length, logged, removed, branch_removed, sync_scn", [0,53,0,0,0,0]]                      |
|      1012 | 10.10.17.204 |     2882 |          0 | 90856066 | 3221617648 | "10.10.17.204:2882" |          0 |  1001 | NULL         | 2025-03-13 14:10:58.552542 | 2025-03-14 14:10:58.552542 |       3 |          8 |             1 |    10 |                 2 | 0x7f76460df650 |         -1 |             2884 |                0 |    0 |          0 |          -1 | 2025-03-13 14:10:58.556763 | NULL  | NULL  |        -1 | 18446744073709551615 | 18446744073709551615 | 4611686018427387903 |                 0 |        0 |               1 |                   -1 | ["id, length, logged, removed, branch_removed, sync_scn", [0,47,0,1,0,0]]                      |
|      1012 | 10.10.17.204 |     2882 |          0 | 90856038 | 3221490388 | "10.10.17.204:2882" |          0 |  1001 | NULL         | 2025-03-13 14:10:58.506240 | 2025-03-14 14:10:58.506240 |       2 |         13 |             0 |    10 |                 2 | 0x7f76463107d0 |         -1 |             2789 |                0 |    0 |          0 |          -1 | 2025-03-13 14:10:58.549370 | NULL  | NULL  |        -1 | 18446744073709551615 | 18446744073709551615 | 4611686018427387903 |                 0 |        0 |               1 |                   -1 | ["id, length, logged, removed, branch_removed, sync_scn", [0,53,0,3,0,0]]                      |
|      1012 | 10.10.17.204 |     2882 |          0 | 90856061 | 3221580474 | "10.10.17.204:2882" |          0 |  1001 | NULL         | 2025-03-13 14:10:58.542008 | 2025-03-14 14:10:58.542008 |       4 |         30 |             1 |    10 |                 2 | 0x7f76487fa7d0 |         -1 |             8321 |                0 |    0 |          0 |          -1 | 2025-03-13 14:10:58.556763 | NULL  | NULL  |        -1 | 18446744073709551615 | 18446744073709551615 | 4611686018427387903 |                 0 |        0 |               1 |                   -1 | ["id, length, logged, removed, branch_removed, sync_scn", [0,126,0,0,0,0]]                     |
|      1012 | 10.10.17.204 |     2882 |          0 | 90856049 | 3221746276 | "10.10.17.204:2882" |          0 |  1001 | NULL         | 2025-03-13 14:10:58.524088 | 2025-03-14 14:10:58.524088 |       2 |         57 |             0 |    10 |                 2 | 0x7f7648a2b950 |         -1 |             5850 |            34651 |    0 |          0 |          -1 | 2025-03-13 14:10:58.554653 | NULL  | NULL  |        -1 |  1741846258538842005 | 18446744073709551615 | 1741846258538842005 |                 0 |        0 |               1 |                   -1 | ["id, length, logged, removed, branch_removed, sync_scn", [0,456,356,0,0,1741846258554653001]] |
|      1012 | 10.10.17.206 |     2882 |          0 | 90846150 |          0 | "10.10.17.204:2882" |          0 |  1001 | NULL         | 2025-03-13 14:10:52.417534 | NULL                       |       2 |          0 |             0 |    10 |                 1 | 0x7f37a91d0450 |         -1 |                0 |               45 |    1 |          0 |          -1 | 2025-03-13 14:10:52.417534 | NULL  | NULL  |        -1 |  1741846252416052002 | 18446744073709551615 | 1741846252416052002 |                 0 |        0 |               1 |                   -1 | ["id, length, logged, removed, branch_removed, sync_scn", [0,2,2,0,0,0]]                       |
|      1012 | 10.10.17.206 |     2882 |          0 | 90856058 |          0 | "10.10.17.204:2882" |          0 |  1001 | NULL         | 2025-03-13 14:10:58.550385 | NULL                       |       2 |          0 |             0 |    10 |                 1 | 0x7f37aa3c6b50 |         -1 |                0 |            16428 |    1 |          0 |          -1 | 2025-03-13 14:10:58.550385 | NULL  | NULL  |        -1 |  1741846258548345001 | 18446744073709551615 | 1741846258548345001 |                 0 |        0 |               1 |                   -1 | ["id, length, logged, removed, branch_removed, sync_scn", [0,237,237,0,0,0]]                   |
|      1012 | 10.10.17.206 |     2882 |          0 | 90856054 |          0 | "10.10.17.204:2882" |          1 |  1001 | [{id:-1}]    | 2025-03-13 14:10:58.544026 | NULL                       |       3 |          0 |             0 |    50 |                 1 | 0x7f37a8d7ef50 |         -1 |                0 |            44528 |    1 |          0 |          -1 | 2025-03-13 14:10:58.544026 | NULL  | NULL  |        -1 |  1741846258538842004 |  1741846258555709000 | 1741846258538842004 |                 0 |       -1 |               1 |                   -1 | ["id, length, logged, removed, branch_removed, sync_scn", [0,463,463,108,0,0]]                 |
|      1012 | 10.10.17.206 |     2882 |          0 | 90856049 |          0 | "10.10.17.204:2882" |          0 |  1001 | NULL         | 2025-03-13 14:10:58.546147 | NULL                       |       2 |          0 |             0 |    10 |                 1 | 0x7f37a9824450 |         -1 |                0 |            34651 |    1 |          0 |          -1 | 2025-03-13 14:10:58.546147 | NULL  | NULL  |        -1 |  1741846258538842005 | 18446744073709551615 | 1741846258538842005 |                 0 |        0 |               1 |                   -1 | ["id, length, logged, removed, branch_removed, sync_scn", [0,356,356,0,0,0]]                   |
|      1012 | 10.10.17.205 |     2882 |          0 | 90846150 |          0 | "10.10.17.204:2882" |          0 |  1001 | NULL         | 2025-03-13 14:10:52.417532 | NULL                       |       2 |          0 |             0 |    10 |                 1 | 0x7fab219b5d50 |         -1 |                0 |               45 |    1 |          0 |          -1 | 2025-03-13 14:10:52.417532 | NULL  | NULL  |        -1 |  1741846252416052002 | 18446744073709551615 | 1741846252416052002 |                 0 |        0 |               1 |                   -1 | ["id, length, logged, removed, branch_removed, sync_scn", [0,2,2,0,0,0]]                       |
|      1012 | 10.10.17.205 |     2882 |          0 | 90856058 |          0 | "10.10.17.204:2882" |          0 |  1001 | NULL         | 2025-03-13 14:10:58.550374 | NULL                       |       2 |          0 |             0 |    10 |                 1 | 0x7fab21e02ed0 |         -1 |                0 |            16428 |    1 |          0 |          -1 | 2025-03-13 14:10:58.550374 | NULL  | NULL  |        -1 |  1741846258548345001 | 18446744073709551615 | 1741846258548345001 |                 0 |        0 |               1 |                   -1 | ["id, length, logged, removed, branch_removed, sync_scn", [0,237,237,0,0,0]]                   |
|      1012 | 10.10.17.205 |     2882 |          0 | 90856054 |          0 | "10.10.17.204:2882" |          1 |  1001 | [{id:-1}]    | 2025-03-13 14:10:58.544042 | NULL                       |       3 |          0 |             0 |    50 |                 1 | 0x7fab1fe6fcd0 |         -1 |                0 |            44528 |    1 |          0 |          -1 | 2025-03-13 14:10:58.544042 | NULL  | NULL  |        -1 |  1741846258538842004 |  1741846258555709000 | 1741846258538842004 |                 0 |       -1 |               1 |                   -1 | ["id, length, logged, removed, branch_removed, sync_scn", [0,463,463,0,0,0]]                   |
|      1012 | 10.10.17.205 |     2882 |          0 | 90856049 |          0 | "10.10.17.204:2882" |          0 |  1001 | NULL         | 2025-03-13 14:10:58.545099 | NULL                       |       2 |          0 |             0 |    10 |                 1 | 0x7fab21ac80d0 |         -1 |                0 |            34651 |    1 |          0 |          -1 | 2025-03-13 14:10:58.545099 | NULL  | NULL  |        -1 |  1741846258538842005 | 18446744073709551615 | 1741846258538842005 |                 0 |        0 |               1 |                   -1 | ["id, length, logged, removed, branch_removed, sync_scn", [0,356,356,0,0,0]]                   |
+-----------+--------------+----------+------------+----------+------------+---------------------+------------+-------+--------------+----------------------------+----------------------------+---------+------------+---------------+-------+-------------------+----------------+------------+------------------+------------------+------+------------+-------------+----------------------------+-------+-------+-----------+----------------------+----------------------+---------------------+-------------------+----------+-----------------+----------------------+------------------------------------------------------------------------------------------------+
17 rows in set (0.01 sec)

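From these samples, the TPS for the two intervals works out to 688 and 543. If monitoring shows single digits, something may be wrong. I will ask the colleague responsible for OCP to take a look.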
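
The figures 688 and 543 can be reproduced from the second run posted above by dividing the increase in the stat_id 30005 counter by the elapsed time between samples. A minimal sketch, with the sample values copied from that output and a hypothetical helper name `per_second_rates`:

```python
from datetime import datetime

# (now(), value of stat_id 30005) pairs copied from the second run in this thread.
samples = [
    ("2025-03-13 14:12:59", 90122106),
    ("2025-03-13 14:13:04", 90125546),
    ("2025-03-13 14:13:09", 90128262),
]

def per_second_rates(samples):
    """Average per-second increase of a cumulative counter between consecutive samples."""
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        elapsed = (datetime.fromisoformat(t1) - datetime.fromisoformat(t0)).total_seconds()
        rates.append((v1 - v0) / elapsed)
    return rates

rates = per_second_rates(samples)
print(rates)  # [688.0, 543.2]
```

The counter is cumulative, so only the difference between two readings is meaningful, and each reading needs its own timestamp taken at roughly the same moment.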

Yes, the TPS shown in monitoring has never exceeded 5.

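Please log in to this tenant and run one more check: paste all of the following SQL into the execution window at once, press Enter, and send the result.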

select now();
select /* MONITOR_AGENT */ con_id tenant_id, stat_id, value from oceanbase.v$sysstat where stat_id IN (30007,30009,30011) and (con_id > 1000 or con_id = 1) and class < 1000;
select sleep(5);
select now();
select /* MONITOR_AGENT */ con_id tenant_id, stat_id, value from oceanbase.v$sysstat where stat_id IN (30007,30009,30011) and (con_id > 1000 or con_id = 1) and class < 1000;
select sleep(5);
select now();
select /* MONITOR_AGENT */ con_id tenant_id, stat_id, value from oceanbase.v$sysstat where stat_id IN (30007,30009,30011) and (con_id > 1000 or con_id = 1) and class < 1000;
select now();

The rate computed from 30009 (transaction rollbacks) does match what monitoring shows.

MySQL [oceanbase]> select now();
+---------------------+
| now()               |
+---------------------+
| 2025-03-13 15:06:06 |
+---------------------+
1 row in set (0.00 sec)

MySQL [oceanbase]> select /* MONITOR_AGENT */ con_id tenant_id, stat_id, value from oceanbase.v$sysstat where stat_id IN (30007,30009,30011) and (con_id > 1000 or con_id = 1) and class < 1000;
+-----------+---------+---------+
| tenant_id | stat_id | value   |
+-----------+---------+---------+
|      1012 |   30007 | 1478789 |
|      1012 |   30009 |  282236 |
|      1012 |   30011 |       0 |
+-----------+---------+---------+
3 rows in set (0.02 sec)

MySQL [oceanbase]> select sleep(5);
+----------+
| sleep(5) |
+----------+
|        0 |
+----------+
1 row in set (5.00 sec)

MySQL [oceanbase]> select now();
+---------------------+
| now()               |
+---------------------+
| 2025-03-13 15:06:11 |
+---------------------+
1 row in set (0.00 sec)

MySQL [oceanbase]> select /* MONITOR_AGENT */ con_id tenant_id, stat_id, value from oceanbase.v$sysstat where stat_id IN (30007,30009,30011) and (con_id > 1000 or con_id = 1) and class < 1000;
+-----------+---------+---------+
| tenant_id | stat_id | value   |
+-----------+---------+---------+
|      1012 |   30007 | 1478804 |
|      1012 |   30009 |  282260 |
|      1012 |   30011 |       0 |
+-----------+---------+---------+
3 rows in set (0.02 sec)

MySQL [oceanbase]> select sleep(5);
+----------+
| sleep(5) |
+----------+
|        0 |
+----------+
1 row in set (5.00 sec)

MySQL [oceanbase]> select now();
+---------------------+
| now()               |
+---------------------+
| 2025-03-13 15:06:16 |
+---------------------+
1 row in set (0.00 sec)

MySQL [oceanbase]> select /* MONITOR_AGENT */ con_id tenant_id, stat_id, value from oceanbase.v$sysstat where stat_id IN (30007,30009,30011) and (con_id > 1000 or con_id = 1) and class < 1000;
+-----------+---------+---------+
| tenant_id | stat_id | value   |
+-----------+---------+---------+
|      1012 |   30007 | 1478814 |
|      1012 |   30009 |  282280 |
|      1012 |   30011 |       0 |
+-----------+---------+---------+
3 rows in set (0.02 sec)

MySQL [oceanbase]> select now();
+---------------------+
| now()               |
+---------------------+
| 2025-03-13 15:06:18 |
+---------------------+
1 row in set (0.00 sec)

MySQL [oceanbase]> 

Yes, it matches. OCP calculates TPS here from 30007, 30009, and 30011, so the monitoring figure is not wrong.
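
For reference, the per-second rate of each of the three counters can be worked out from the samples posted above. This is only a sketch: the thread identifies 30009 as transaction rollbacks, while reading 30007 as commits and 30011 as timeouts is an assumption here, and how OCP combines the three into its TPS figure is not stated in this thread. The helper name `stat_rates` is hypothetical.

```python
from datetime import datetime

# Samples copied from the output above: (now(), {stat_id: value}).
samples = [
    ("2025-03-13 15:06:06", {30007: 1478789, 30009: 282236, 30011: 0}),
    ("2025-03-13 15:06:11", {30007: 1478804, 30009: 282260, 30011: 0}),
    ("2025-03-13 15:06:16", {30007: 1478814, 30009: 282280, 30011: 0}),
]

def stat_rates(samples):
    """Per-second increase of every stat_id between consecutive samples."""
    result = []
    for (t0, s0), (t1, s1) in zip(samples, samples[1:]):
        elapsed = (datetime.fromisoformat(t1) - datetime.fromisoformat(t0)).total_seconds()
        result.append({stat_id: (s1[stat_id] - s0[stat_id]) / elapsed for stat_id in s0})
    return result

rates = stat_rates(samples)
for interval in rates:
    print(interval)
# {30007: 3.0, 30009: 4.8, 30011: 0.0}
# {30007: 2.0, 30009: 4.0, 30011: 0.0}
```

In both intervals the 30009 rate is higher than the 30007 rate, which is consistent with the monitoring chart showing more rollbacks than commits.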

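QPS has reached 20,000, so TPS must be much higher than this; something is still wrong. With the same workload, in our earlier stress test QPS was 2000+ and TPS held steady at around 400.
See this thread for details:
"OCP监控中发现租户里 事物回滚 比提交多几倍" (In OCP monitoring, a tenant's transaction rollbacks are several times its commits) - Community Q&A - OceanBase Community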

In OCP SQL diagnostics, BEGIN and COMMIT are also each executed more than 300 times per second.

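Also, if this TPS figure is correct, then we again have more rolled-back transactions than committed ones. How should I investigate that further?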

Update:
obdiag analyze log --since 1h
now completes, and the result is normal:


Analyze OceanBase Online Log Summary:

+--------------+----------+------------+--------------------+-------------+-----------+---------+
| Node         | Status   | FileName   | First Found Time   | ErrorCode   | Message   | Count   |
+==============+==========+============+====================+=============+===========+=========+
| 10.10.17.204 | PASS     |            |                    |             |           |         |
+--------------+----------+------------+--------------------+-------------+-----------+---------+
| 10.10.17.205 | PASS     |            |                    |             |           |         |
+--------------+----------+------------+--------------------+-------------+-----------+---------+
| 10.10.17.206 | PASS     |            |                    |             |           |         |
+--------------+----------+------------+--------------------+-------------+-----------+---------+
For more details, please run cmd ' cat /root/obdiag_analyze_pack_20250313152820/result_details.txt '

Trace ID: bc8693ba-ffdc-11ef-9201-f8f21e597991
If you want to view detailed obdiag logs, please run: obdiag display-trace bc8693ba-ffdc-11ef-9201-f8f21e597991
[root@OB01 ~]# cat /root/obdiag_analyze_pack_20250313152820/result_details.txt 

Analyze OceanBase Online Log Summary:
+--------------+----------+------------+--------------------+-------------+-----------+---------+
| Node         | Status   | FileName   | First Found Time   | ErrorCode   | Message   | Count   |
+==============+==========+============+====================+=============+===========+=========+
| 10.10.17.204 | PASS     |            |                    |             |           |         |
+--------------+----------+------------+--------------------+-------------+-----------+---------+
| 10.10.17.205 | PASS     |            |                    |             |           |         |
+--------------+----------+------------+--------------------+-------------+-----------+---------+
| 10.10.17.206 | PASS     |            |                    |             |           |         |
+--------------+----------+------------+--------------------+-------------+-----------+---------+

Details:

Node: 10.10.17.204
Status: PASS
FileName: None
First Found Time: None
ErrorCode: None
Message: None
Count: None
Last Found Time: None
Cause: None
Solution: None
Trace_IDS: None

Node: 10.10.17.205
Status: PASS
FileName: None
First Found Time: None
ErrorCode: None
Message: None
Count: None
Last Found Time: None
Cause: None
Solution: None
Trace_IDS: None

Node: 10.10.17.206
Status: PASS
FileName: None
First Found Time: None
ErrorCode: None
Message: None
Count: None
Last Found Time: None
Cause: None
Solution: None
Trace_IDS: None

[root@OB01 ~]# 

In addition, here is the result of the obdiag check run inspection:

[root@OB01 ~]# obdiag check run
check start ...
[WARN] step_base ResultFalseException:mod max memory over 10G,Please check on oceanbase.__all_virtual_memory_info to find some large mod 
[WARN] step_base ResultFalseException:mod max memory over 10G,Please check on oceanbase.__all_virtual_memory_info to find some large mod 
[WARN] TaskBase execute StepResultFalseException: mod max memory over 10G,Please check on oceanbase.__all_virtual_memory_info to find some large mod   .
[WARN] TaskBase execute StepResultFalseException: mod max memory over 10G,Please check on oceanbase.__all_virtual_memory_info to find some large mod   .
[WARN] step_base ResultFalseException:mod max memory over 10G,Please check on oceanbase.__all_virtual_memory_info to find some large mod 
[WARN] TaskBase execute StepResultFalseException: mod max memory over 10G,Please check on oceanbase.__all_virtual_memory_info to find some large mod   .
[WARN] step_base ResultFalseException:number of sql_error_4012 is 154 
[WARN] step_base ResultFalseException:number of sql_error_4012 is 154 
[WARN] TaskBase execute StepResultFalseException: number of sql_error_4012 is 154   .
[WARN] TaskBase execute StepResultFalseException: number of sql_error_4012 is 154   .
[WARN] step_base ResultFalseException:number of sql_error_4012 is 154 
[WARN] TaskBase execute StepResultFalseException: number of sql_error_4012 is 154   .
[WARN] step_base ResultFalseException:tsar is not installed. we can not check tcp retransmission. 
[WARN] TaskBase execute StepResultFailException: tsar is not installed. we can not check tcp retransmission.  
[WARN] step_base ResultFalseException:tsar is not installed. we can not check tcp retransmission. 
[WARN] step_base ResultFalseException:tsar is not installed. we can not check tcp retransmission. 
[WARN] TaskBase execute StepResultFailException: tsar is not installed. we can not check tcp retransmission.  
[WARN] TaskBase execute StepResultFailException: tsar is not installed. we can not check tcp retransmission.  
[WARN] network_speed is  and the type is <class 'str'>, not int or float or decimal ! set it to 0.
[WARN] step_base ResultFalseException:network_speed is  , less than  
[WARN] network_speed is  and the type is <class 'str'>, not int or float or decimal ! set it to 0.
[WARN] step_base ResultFalseException:network_speed is  , less than  
[WARN] TaskBase execute StepResultFailException: network_speed is  , less than   
[WARN] TaskBase execute StepResultFailException: network_speed is  , less than   
[WARN] step_base ResultFalseException:net.ipv4.tcp_tw_recycle : 0. recommended: 1. 
[WARN] TaskBase execute StepResultFalseException: net.ipv4.tcp_tw_recycle : 0. recommended: 1.   .
[WARN] step_base ResultFalseException:net.ipv4.tcp_tw_recycle : 0. recommended: 1. 
[WARN] step_base ResultFalseException:net.ipv4.tcp_tw_recycle : 0. recommended: 1. 
[WARN] TaskBase execute StepResultFalseException: net.ipv4.tcp_tw_recycle : 0. recommended: 1.   .
[WARN] TaskBase execute StepResultFalseException: net.ipv4.tcp_tw_recycle : 0. recommended: 1.   .
Check observer finished. For more details, please run cmd' cat ./check_report/obdiag_check_report_observer_2025-03-13-15-48-53.table '
Trace ID: 9b29f77c-ffdf-11ef-b508-f8f21e597991
If you want to view detailed obdiag logs, please run: obdiag display-trace 9b29f77c-ffdf-11ef-b508-f8f21e597991

Does the warning above, "TaskBase execute StepResultFalseException: number of sql_error_4012 is 154", need attention?
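One way to judge whether the 4012 count matters (4012 is OceanBase's timeout error code) is to see which modules and functions report it in observer.log. The sketch below is a hypothetical helper, not part of obdiag. It assumes the line layout seen in the log excerpt earlier in this thread: [timestamp] LEVEL [MODULE] function (file:line) ... [errcode=-NNNN]. The sample lines are shortened from that excerpt.

```python
import re
from collections import Counter
from typing import Iterable

# Layout assumed from the observer.log lines quoted earlier in the thread.
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>\w+)\s+\[(?P<module>[^\]]+)\]\s+"
    r"(?P<func>\S+)\s+\(.*?\[errcode=(?P<errcode>-?\d+)\]"
)


def count_errcodes(lines: Iterable[str]) -> Counter:
    """Count (errcode, module, function) triples across log lines."""
    counts: Counter = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m:  # lines without an errcode field are skipped
            counts[(int(m["errcode"]), m["module"], m["func"])] += 1
    return counts


# Shortened copies of lines from the excerpt; message bodies are truncated.
sample = [
    "[2025-03-12 14:32:49.835615] WDIAG [STORAGE.TRANS] sync_rollback_savepoint__ "
    "(ob_tx_api.cpp:1878) [153089][T1012_L0_G0][T1012] [lt=151][errcode=-4012] tx rpc condition wakeup",
    "[2025-03-12 14:32:49.870020] WDIAG [STORAGE.TRANS] rollback_to_savepoint "
    "(ob_trans_part_ctx.cpp:8726) [153090][T1012_L0_G0][T1012] [lt=0][errcode=-4036] rollback_to need retry",
    "[2025-03-12 14:32:49.870220] WDIAG [STORAGE.TRANS] rollback_to_savepoint "
    "(ob_trans_part_ctx.cpp:8726) [153097][T1012_L0_G0][T1012] [lt=2][errcode=-4036] rollback_to need retry",
]

counts = count_errcodes(sample)
for (errcode, module, func), n in counts.most_common():
    print(errcode, module, func, n)
```

On a real node, the same function can be fed the lines of an opened observer.log file instead of the sample list; the file location depends on the deployment.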


"In OCP SQL Diagnostics, BEGIN and COMMIT are also each executed more than 300 times per second" -- could you please share a screenshot of this?

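"Then we are again seeing more rolled-back transactions than committed ones" -- by my calculation, rollbacks are indeed slightly higher than commits. I will look into this further.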
