OB数据库常见问题分析方法

Gaoyc · 2023 年7 月 12 日 11:28

sql hung分析：
一、线程追踪
pid: 进程ID。
lwp: 线程ID。在用户态的命令(比如ps)中常用的显示方式。
tid: 线程ID，等于lwp。tid在系统提供的接口函数中更常用，比如syscall(SYS_gettid)和syscall(__NR_gettid)。
tgid: 线程组ID，也就是线程组leader的进程ID，等于pid。
a、通用版本查看pid
show processlist;
MySQL [oceanbase]> show processlist;
±------±-------------±-----±-------------------±----------±------------±------------------±------------------±-------±-------+
| Id | Tenant | User | Host | db | trans_count | svr_session_count | state | tid | pid |
±------±-------------±-----±-------------------±----------±------------±------------------±------------------±-------±-------+
| 52281 | sysbench_poc | root | 10.51.175.65:46920 | oceanbase | 0 | 2 | MCS_ACTIVE_READER | 189635 | 189635 |
±------±-------------±-----±-------------------±----------±------------±------------------±------------------±-------±-------+

b、OB 4版本可以通过 v$ob_processlist或者通过gv$ob_processlist查看
c、OB 2和3.1的版本可以通过 __all_virtual_processlist查看

pstack $pid
match LWP thread id看栈

二、日志限流
ob日志占用的磁盘 IO 带宽上限30MB(3.2.3, 4)，超过IO限制会被丢弃
root登录sys tenant:
alter system set enable_async_syslog = false;
alter system set syslog_io_bandwidth_limit = ‘300M’;

开启日志会话级追踪，业务租户：
set @@b_enable_trace_log = 1;
set @@ob_enable_plan_cache=0;
set @@ob_log_level = debug;

开启日志语句级追踪
SELECT /+log_level('sql.:debug, common.:info’)/ * FROM t;

执行SQL语句
查看trace id内容，在observer日志中查看
select last_trace_id();

定位具体的实现方式：
grep ‘trace id’ observer.log* | grep ‘generate_normal_tsc’

三、排查统计信息收集情况
查看未收集的表
dba_tab_statistics[ob 3、4版本视图]
mysql : select table_name from oceanbase.dba_tab_statistics where owner=‘test’ and last_analyzed is null group by table_name;
oracle: select table_name from user_tab_statistics where last_analyzed is null group by table_name;

按库级别收集,在mysql下库名，在oracle下schema名
call dbms_stats.gather_schema_stats(‘TEST’, degree=>128);

登录用户租户按表名收集, mysql oracle都可以
call dbms_stats.gather_Table_Stats(null,‘t1’);

四：sql plan cache

查看缓存状态
4版本：v$ob_plan_cache_plan_stat
select * from oceanbase.gv$ob_plan_cache_plan_stat where plan_hash = “12843576483891728377”\G

select * from oceanbase.gv$ob_plan_cache_plan_stat where query_sql like “12843576483891728377”\G

select query_sql, plan_hash from oceanbase.gv$ob_sql_audit where query_sql like “%c1=7 and c1=7%”\G

3版本：v$sql_audit
sql_audit

五、高频SQL获取并排查执行计划
先获取sql_id（去掉常数值并将sql文本格式化）分组取最高频sql，然后再查找单个sql是否执行步骤有问题(需要通过 having avg(ELAPSED_TIME)>5000 和 EXECUTE_TIME>1000 排除掉一些耗时极短的非业务sql )
mysql> select /+parallel(16)/ sql_id,count(1) from gv$sql_audit where EXECUTE_TIME>1000 and TENANT_ID>1000 and RET_CODE=0 and QUERY_SQL not like ‘%create table%’ and QUERY_SQL not like ‘SET%’ and QUERY_SQL not like ‘%ob_all_proxy_stat%’ and QUERY_SQL not like ‘%information_schema.%’ group by sql_id having avg(ELAPSED_TIME)>5000 order by 2 desc limit 10;

±---------±---------±------------------±------------------+
| sql_id | count(1) | avg(ELAPSED_TIME) | avg(EXECUTE_TIME) |
±------±---------±------------------±------------------+
| A79D1202A337C0664DC1F38B1610C4A8 |130540 | 10349.3921 | 9893.8078 |
| FFFCA4D67EA0A788813031B8BBC3B329 | 12478 | 10252.8954 | 10074.7032 |
| 6B6F9C8D900AC6F61E3EF9AD1D2FA64D | 450 | 19736.3400 | 19256.0489 |
| 21E9ECA5F9D118375ADC0A02BB4EBDFF | 322 | 50080.4627 | 49743.8758 |
| D230FC9FBDEABD0F9BA3141DBD463055 | 164 | 15617.5183 | 11789.7927 |
| E680411FB2DE5C177CB50F6B5E766332 | 153 |252671.0458 | 252421.7255 |
| DAD9719C0E9697EF469709B3BE37FD4A | 139 | 12959.0072 | 12493.9065 |
| BE2F442D5C93DCCC0F8E65E3B76D418D | 112 | 27521.2857 | 12828.3393 |
| DF213025FB98214796682C1F2EC93EF3 | 102 | 6318.7451 | 5974.2451 |
| 7D96BA883303DB31D48314CAF3A1989A | 101 | 28462.5050 | 14582.2277 |
±---------------------------------±---------±------------------±---------

2.查看具体某个sql的资源消耗情况

mysql> select /+parallel(16)/ trace_id,TENANT_Id,TENANT_NAME,USER_NAME,DB_NAME,db_id,QUERY_SQL,SQL_ID,AFFECTED_ROWS,RETURN_ROWS,PARTITION_CNT,RET_CODE,ELAPSED_TIME,REQUEST_TIME,EXECUTE_TIME,MEMSTORE_READ_ROW_COUNT,SSSTORE_READ_ROW_COUNT,DISK_READS from gv$sql_audit where sql_id in (‘A79D1202A337C0664DC1F38B1610C4A8’) limit 1 \G;
*************************** 1. row ***************************
trace_id: YB426451A5B7-00055CC6BD74C1C3
TENANT_Id: 1149
TENANT_NAME: morderprod_861
USER_NAME: root
DB_NAME: obapayfund
db_id: 1263338860315675
QUERY_SQL: select /* OceanBase */ zone_uid, trans_id, gmt_create, gmt_modified from double_write_done where trans_id=‘ad1d8fb3-c447-4279-82c2-17e1549d55d5’
SQL_ID: A79D1202A337C0664DC1F38B1610C4A8
AFFECTED_ROWS: 0
RETURN_ROWS: 1
PARTITION_CNT: 1
RET_CODE: 0
ELAPSED_TIME: 10568
REQUEST_TIME: 1512449512141136
EXECUTE_TIME: 10411
MEMSTORE_READ_ROW_COUNT: 1
SSSTORE_READ_ROW_COUNT: 17491
DISK_READS: 0

根据SQL消耗时间排序，过滤掉执行时间特别短（小于2ms）和一些ddl语句

mysql> select /+parallel(16)/ trace_id,TENANT_Id,TENANT_NAME,USER_NAME,DB_NAME,db_id,QUERY_SQL,SQL_ID,AFFECTED_ROWS,RETURN_ROWS,PARTITION_CNT,RET_CODE,ELAPSED_TIME,usec_to_time(REQUEST_TIME),EXECUTE_TIME from gv$sql_audit where EXECUTE_TIME>2000 and TENANT_ID>1000 and RET_CODE=0 and QUERY_SQL not like ‘%create table%’ and QUERY_SQL not like ‘SET%’ and QUERY_SQL not like ‘%alter table%’ and QUERY_SQL not like ‘%information_schema%’ order by ELAPSED_TIME desc limit 10 \G;

*************************** 1. row ***************************
trace_id: YB4264518C50-00055D24A26A9039
TENANT_Id: 1156
TENANT_NAME: exrate_dev0_910
USER_NAME: root
DB_NAME: exratecenter
db_id: 1271035441710107
QUERY_SQL: select/*+ QUERY_TIMEOUT(60000000), PARALLEL(5), READ_CLUSTER(SLAVE), READ_CONSISTENCY(WEAK) / * FROM ex_source_rate where gmt_create > ‘2017-12-01’ and rate_source_code != ‘SPDB_AUA_CNY_SPOT_TODAY_WHL’ ORDER BY gmt_create DESC
SQL_ID: DA6171F1098DF55DE80E100A5BE367B2
AFFECTED_ROWS: 0
RETURN_ROWS: 19
PARTITION_CNT: 1
RET_CODE: 0
ELAPSED_TIME: 14279933
usec_to_time(REQUEST_TIME): 2017-12-05 15:41:20.904485
EXECUTE_TIME: 14271506
*************************** 2. row ***************************
trace_id: YB4264518C51-00055CEEC470C6D7
TENANT_Id: 1156
TENANT_NAME: exrate_dev0_910
USER_NAME: root
DB_NAME: exratecenter
db_id: 1271035441710107
QUERY_SQL: select/+ QUERY_TIMEOUT(60000000), PARALLEL(5), READ_CLUSTER(SLAVE), READ_CONSISTENCY(WEAK) */ * FROM ex_source_rate where gmt_create > ‘2017-12-01’ ORDER BY gmt_create DESC
SQL_ID: 776360BD16A75452179B1C13F1BE4192
AFFECTED_ROWS: 0
RETURN_ROWS: 200
PARTITION_CNT: 1
RET_CODE: 0
ELAPSED_TIME: 12648535
usec_to_time(REQUEST_TIME): 2017-12-05 15:40:31.017910
EXECUTE_TIME: 12643104
sql monitor

OceanBase(admin@test)>select/+monitor leading(t2,t1) use_hash(t1)/ * from t1 left join t2 on t2.c2=t1.c1;
Empty set (0.077 sec)

OceanBase(admin@test)>select last_trace_id();
±----------------------------------+
| last_trace_id() |
±----------------------------------+
| Y985645306CE-0005FE86E03FA332-0-0 |
±----------------------------------+
1 row in set (0.007 sec)
//利用这个trace id去替换下文中查询where条件中的trace id
OceanBase(admin@test)>select plan_line_id, plan_operation, sum(output_rows), sum(STARTS) rescan, min(first_refresh_time) open_time, max(last_refresh_time) close_time, max(last_refresh_time) - min(first_refresh_time) open_close_cost, min(first_change_time) first_row_time, max(last_change_time) last_row_eof_time, max(last_change_time) - min(first_change_time) rows_cost, count(1) from oceanbase.GV$SQL_PLAN_MONITOR where trace_id = ‘Y985645306CE-0005FE86E03FA332-0-0’ group by plan_line_id, plan_operation order by plan_line_id;
±-------------±---------------±-----------------±-------±---------------------------±---------------------------±----------------±---------------±---------------------------±----------±---------+
| plan_line_id | plan_operation | sum(output_rows) | rescan | open_time | close_time | open_close_cost | first_row_time | last_row_eof_time | rows_cost | count(1) |
±-------------±---------------±-----------------±-------±---------------------------±---------------------------±----------------±---------------±---------------------------±----------±---------+
| 0 | PHY_HASH_JOIN | 0 | 0 | 2023-06-20 18:52:19.778575 | 2023-06-20 18:52:19.784382 | 0.005807 | NULL | 2023-06-20 18:52:19.783319 | NULL | 1 |
| 1 | PHY_TABLE_SCAN | 0 | 0 | 2023-06-20 18:52:19.778575 | 2023-06-20 18:52:19.784382 | 0.005807 | NULL | 2023-06-20 18:52:19.781135 | NULL | 1 |
| 2 | PHY_TABLE_SCAN | 0 | 0 | 2023-06-20 18:52:19.778575 | 2023-06-20 18:52:19.784382 | 0.005807 | NULL | 2023-06-20 18:52:19.783319 | NULL | 1 |
±-------------±---------------±-----------------±-------±---------------------------±---------------------------±----------------±---------------±---------------------------±----------±---------+
3 rows in set (0.123 sec)

抓包排查协议层问题
//抓包
/usr/sbin/tcpdump -i any host 10.51.175.66 -w obclient.pcap

生成的文件包采用wireshark分析

perf分析性能

生成调用图

perf record -o perf.data -e cycles -c 100000000 -p $(pidof -s observer) -g – sleep 20
perf script -i perf.data -F ip,sym -f > data.viz

生成 perf 火焰图

perf record -o perf.data -F 99 -p $(pidof -s observer) -g – sleep 20

perf record -o perf.data -F 99 -p 87741 -g – sleep 20

perf script -i perf.data > data.viz

生成 pstack 火焰图

pstack 87741 > data.viz
pstack $(pidof -s observer) > data.viz

生成火焰图

git clone GitHub - brendangregg/FlameGraph: Stack trace visualizer # or download it from github

mv perf.data FlameGraph/

cd FlameGraph

perf script -i per.data| ./stackcollapse-perf.pl > out.perf-folded

cat out.perf-folded | ./flamegraph.pl > perf-kernel.svg

绘制perf热力图

git clone GitHub - brendangregg/HeatMap: Heat map generation tools # or download it from github

mv perf.data HeatMap/

cd HeatMap

perf script -i perf.data | awk '{ gsub(/:/, “”) } $5 ~ /issue/ { ts[$6, $10] = $4 }

$5 ~ /complete/ { if (l = ts[$6, $9]) { printf "%.f %.f\n", $4 * 1000000,
($4 - l) * 1000000; ts[$6, $10] = 0 } }' > out.lat_us

./trace2heatmap.pl --unitstime=us --unitslat=us --maxlat=50000 out.lat_us > out.svg

#svg文件可以通过浏览器查看

##无法可视化，直接perf report
perf report -i perf.data -f

未完待续

UNIQUE_KEY · 2023 年7 月 12 日 11:49

找高频SQL不一定有用，要找总 CPU 时间高的SQL（execute_time - total_wait_time_micro）, 这个看白屏TOPSQL就行了。

渔舟唱晚 · 2023 年7 月 12 日 11:50

不敢苟同。

渔舟唱晚 · 2023 年7 月 12 日 11:51

写的很好，学习了

UNIQUE_KEY · 2023 年7 月 12 日 11:58

慢 SQL会导致 CPU 被打爆，这种情况看执行次数，容易看漏。
真实场景遇到过这种现场：
某主机 CPU 被打爆，排查人员按总执行次数查问题，忽略了关键问题。

DBRE · 2023 年7 月 12 日 12:02

学习学习