【Environment】Production
【Version】4.3.5.2
【Problem description】OB index table maintenance consumes a large amount of IO resources
【Reproduction steps】Cannot reproduce
【Attachments and logs】
Hi all, at 9 PM our production cluster fired 7 insert … select … statements that consumed a large amount of IO. The table involved, __idx_xxx_idx_xxx_index_snapshot_data_table, looks like an internal index table. What operation is this, and what triggers it?
INSERT
/*+ monitor enable_parallel_dml parallel(1) opt_param('ddl_execution_id', 1) opt_param('ddl_task_id', 93659003) opt_param('enable_newsort', 'false') use_px */
INTO tuya_aimemory_prod.__idx_500583_idx_1764334847108061_index_snapshot_data_table PARTITION(p0, p3)(
__key_20_1748264415604471,
id,
__vid_1748264415602220,
__vector_20_1748264415602775,
__data_20_1748264415604809
)
SELECT
/*+ index(ai_doc primary) ob_ddl_schema_version(ai_doc, 1764334848743784) */
__key_20_1748264415604471 AS __key_20_1748264415604471,
id AS id,
__vid_1748264415602220 AS __vid_1748264415602220,
__vector_20_1748264415602775 AS __vector_20_1748264415602775,
__data_20_1748264415604809 AS __data_20_1748264415604809
from
tuya_aimemory_prod.ai_doc PARTITION(p0, p3) as of snapshot 1764334848906957000
order by
1,
2;
辞霜
December 1, 2025, 14:42
#3
Are you using vector indexes?
The snapshot_data_table is refreshed every 24 hours: the data in delta_buffer_table, index_id_table, and snapshot_data_table is deleted and recreated, and the new snapshot_data_table is repopulated from the base table (a full refresh).
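To confirm when these daily refreshes actually fire on your cluster, here is a minimal sketch against the `gv$ob_sql_audit` view (queried from the sys tenant; the `LIKE` pattern on the internal table-name suffix is an assumption based on the statements shown in this thread, so adjust it and the `LIMIT` to your situation):

```sql
-- Sketch: locate the internal full-refresh INSERTs in the SQL audit.
-- The table-name suffix pattern is assumed from the statements above.
SELECT svr_ip,
       sql_id,
       elapsed_time,
       execute_time,
       disk_reads,
       FROM_UNIXTIME(LEFT(request_time, 10)) AS request_ts
FROM   gv$ob_sql_audit
WHERE  query_sql LIKE '%index_snapshot_data_table%'
ORDER  BY request_time DESC
LIMIT  50;
```

The timestamps let you correlate the refreshes with the IO spikes you observed at 9 PM.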
Yes, vector indexes are in use. Does this happen for every vector table, every day? The table has existed for quite a while, so this should have been running daily, yet we rarely noticed it before. Is there any documentation on this?
辞霜
December 1, 2025, 18:03
#5
It runs every day. The long execution time may have been caused by load on the database.
Hi, the problem showed up again today. Comparing with other SQLs of this kind, this one updated a large amount of data during execution while the others did not, so it ran for a long time. We see no resource bottleneck; are there other directions to investigate?
辞霜
December 15, 2025, 15:43
#7
Please post the SQL text. This SQL looks slow because of wait time; it may be waiting on a lock. Drill into it and check where the execution time went in the plan.
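One way to see where the time goes per operator, assuming your 4.3.x build exposes the `GV$SQL_PLAN_MONITOR` view (the trace id below is a placeholder; take the real one from `gv$ob_sql_audit`):

```sql
-- Sketch: per-operator timing for one execution, keyed by trace id.
-- Verify the view and column names against your OceanBase version.
SELECT plan_line_id,
       plan_operation,
       output_rows,
       first_refresh_time,
       last_refresh_time
FROM   gv$sql_plan_monitor
WHERE  trace_id = 'Yxxxx-xxxx-0-0'  -- placeholder; copy from sql audit
ORDER  BY plan_line_id;
```

A large gap between `first_refresh_time` and `last_refresh_time` on a single operator points at where the statement spent its wall-clock time.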
INSERT
/*+ monitor enable_parallel_dml parallel(1) opt_param('ddl_execution_id', 1) opt_param('ddl_task_id', 59525410) opt_param('enable_newsort', 'false') use_px */
INTO tuya_aimemory_prod.__idx_520618_idx_1765767439285897_index_snapshot_data_table (
__key_23_1760929026937846,
id,
gmt_create,
__vid_1760929026935011,
__vector_23_1760929026935543,
__data_23_1760929026938118
)
SELECT
/*+ index(rag_knowledge_document_slice primary) ob_ddl_schema_version(rag_knowledge_document_slice, 1765767439926040) */
__key_23_1760929026937846 AS __key_23_1760929026937846,
id AS id,
gmt_create AS gmt_create,
__vid_1760929026935011 AS __vid_1760929026935011,
__vector_23_1760929026935543 AS __vector_23_1760929026935543,
__data_23_1760929026938118 AS __data_23_1760929026938118
from
tuya_aimemory_prod.rag_knowledge_document_slice as of snapshot 1765767440033323000
order by
1,
2,
3;
Here are the execution plan and the SQL.
辞霜
December 15, 2025, 16:14
#9
The actual execution time here is 1158 ms, almost all of it spent waiting.
Does the wait state mean it is just idling? Why is it waiting? Looking at the top SQL, many are of this kind and almost all are waiting. How can we investigate this?
辞霜
December 15, 2025, 18:06
#11
select * from gv$ob_sql_audit where sql_id='xxxxx';
The sql audit table records the relevant details; please run the query above and post the output.
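The suggested lookup can be narrowed to the wait-related columns so the latency breakdown is easier to read. A sketch (the `sql_id` is the one from the audit record below; column names follow `gv$ob_sql_audit`):

```sql
-- Sketch: wait/latency breakdown for one sql_id from the audit view.
SELECT svr_ip,
       event,
       total_wait_time_micro,
       total_waits,
       get_plan_time,
       execute_time,
       elapsed_time
FROM   gv$ob_sql_audit
WHERE  sql_id = '2B0D7AD8FEFB81CE482A7B60656861AB'
ORDER  BY request_time DESC;
```

Comparing `total_wait_time_micro` against `execute_time` shows how much of the execution was genuinely waiting versus doing work.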
Hi, I picked two SQLs at random, please take a look.
*************************** 38. row ***************************
SVR_IP: 172.26.3.47
SVR_PORT: 2882
REQUEST_ID: 60369004
SQL_EXEC_ID: 574060498
TRACE_ID: Y200B42AC1A0329-00063A75A07DD1B8-0-0
SID: 3221890030
CLIENT_IP: 0.0.0.0
CLIENT_PORT: 0
TENANT_ID: 1002
TENANT_NAME: sys
EFFECTIVE_TENANT_ID: 1002
USER_ID: -1
USER_NAME:
USER_GROUP: 0
USER_CLIENT_IP: 0.0.0.0
DB_ID: 201001
DB_NAME: oceanbase
SQL_ID: 2B0D7AD8FEFB81CE482A7B60656861AB
QUERY_SQL: INSERT /*+ monitor enable_parallel_dml parallel(1) opt_param('ddl_execution_id', 2) opt_param('ddl_task_id', 59522304) opt_param('enable_newsort', 'false') use_px */ INTO tuya_aimemory_prod.__idx_500612_idx_1765766796854809_index_id_table PARTITION(M202506,M202509,P202512,P202603)(__scn_26_1753843499651391, __vid_1753843499647861, __type_26_1753843499648191, biz_time, __vector_26_1753843499648596) SELECT /*+ index(ai_memory_alphago_memory_fact primary) ob_ddl_schema_version(ai_memory_alphago_memory_fact, 1765766797204232) */ __scn_26_1753843499651391 AS __scn_26_1753843499651391, __vid_1753843499647861 AS __vid_1753843499647861, __type_26_1753843499648191 AS __type_26_1753843499648191, biz_time AS biz_time, __vector_26_1753843499648596 AS __vector_26_1753843499648596 from tuya_aimemory_prod.ai_memory_alphago_memory_fact PARTITION(M202506,M202509,P202512,P202603) as of snapshot 1765766797402245001 order by 1, 2, 3
PLAN_ID: 388575
AFFECTED_ROWS: 0
RETURN_ROWS: 0
PARTITION_CNT: 12
RET_CODE: 0
QC_ID: 0
DFO_ID: 0
SQC_ID: 0
WORKER_ID: 0
EVENT: exec inner sql wait
P1TEXT: wait inner sql class
P1: 0
P2TEXT: inner session id
P2: 4611686018791066348
P3TEXT:
P3: 0
LEVEL: 0
WAIT_CLASS_ID: 100
WAIT_CLASS#: 0
WAIT_CLASS: OTHER
STATE: WAITED KNOWN TIME
WAIT_TIME_MICRO: 10630
TOTAL_WAIT_TIME_MICRO: 639754
TOTAL_WAITS: 612
RPC_COUNT: 6
PLAN_TYPE: 3
IS_INNER_SQL: 1
IS_EXECUTOR_RPC: 0
IS_HIT_PLAN: 0
REQUEST_TIME: 1765766797507055
ELAPSED_TIME: 667018
NET_TIME: 0
NET_WAIT_TIME: 0
QUEUE_TIME: 0
DECODE_TIME: 0
GET_PLAN_TIME: 114353
EXECUTE_TIME: 552665
APPLICATION_WAIT_TIME: 0
CONCURRENCY_WAIT_TIME: 0
USER_IO_WAIT_TIME: 0
SCHEDULE_TIME: 0
ROW_CACHE_HIT: 0
BLOOM_FILTER_CACHE_HIT: 0
BLOCK_CACHE_HIT: 0
DISK_READS: 0
RETRY_CNT: 0
TABLE_SCAN: 1
CONSISTENCY_LEVEL: 3
MEMSTORE_READ_ROW_COUNT: 0
SSSTORE_READ_ROW_COUNT: 0
DATA_BLOCK_READ_CNT: 0
DATA_BLOCK_CACHE_HIT: 0
INDEX_BLOCK_READ_CNT: 0
INDEX_BLOCK_CACHE_HIT: 0
BLOCKSCAN_BLOCK_CNT: 0
BLOCKSCAN_ROW_CNT: 0
PUSHDOWN_STORAGE_FILTER_ROW_CNT: 0
REQUEST_MEMORY_USED: 8612263
EXPECTED_WORKER_COUNT: 0
USED_WORKER_COUNT: 0
SCHED_INFO: NULL
FUSE_ROW_CACHE_HIT: 0
PS_CLIENT_STMT_ID: -1
PS_INNER_STMT_ID: -1
TX_ID: 59260447
SNAPSHOT_VERSION: 1765766797627940002
REQUEST_TYPE: 1
IS_BATCHED_MULTI_STMT: 0
OB_TRACE_INFO: NULL
PLAN_HASH: 12952747494869742316
LOCK_FOR_READ_TIME: 0
PARAMS_VALUE:
RULE_NAME:
PARTITION_HIT: 1
TX_INTERNAL_ROUTING: 0
TX_STATE_VERSION: 0
FLT_TRACE_ID:
PL_TRACE_ID: NULL
PLSQL_EXEC_TIME: 0
FORMAT_SQL_ID:
STMT_TYPE: INSERT
TOTAL_MEMSTORE_READ_ROW_COUNT: 0
TOTAL_SSSTORE_READ_ROW_COUNT: 0
PROXY_USER:
SEQ_NUM: 1765766797628111
NETWORK_WAIT_TIME: 0
PLSQL_COMPILE_TIME: 0
INSERT_DUPLICATE_ROW_COUNT: 0
FROM_UNIXTIME(left(REQUEST_TIME,10)): 2025-12-15 10:46:37.000000
*************************** 39. row ***************************
SVR_IP: 172.26.3.47
SVR_PORT: 2882
REQUEST_ID: 60369008
SQL_EXEC_ID: 574060446
TRACE_ID: Y200B42AC1A0329-00063A75A07DD1B8-0-0
SID: 3221834625
CLIENT_IP: 0.0.0.0
CLIENT_PORT: 0
TENANT_ID: 1002
TENANT_NAME: sys
EFFECTIVE_TENANT_ID: 1002
USER_ID: -1
USER_NAME:
USER_GROUP: 0
USER_CLIENT_IP: 0.0.0.0
DB_ID: 201001
DB_NAME: oceanbase
SQL_ID: 9C666CC6E4E8ED637B55BC39590188C5
QUERY_SQL: INSERT /*+ monitor enable_parallel_dml parallel(1) opt_param('ddl_execution_id', 2) opt_param('ddl_task_id', 59522294) opt_param('enable_newsort', 'false') use_px */ INTO tuya_aimemory_prod.__idx_500612_idx_1765766796854809 PARTITION(M202506,M202509,P202512,P202603)(__vid_1753843499647861, __type_26_1753843499648191, biz_time, __vector_26_1753843499648596) SELECT /*+ index(ai_memory_alphago_memory_fact primary) ob_ddl_schema_version(ai_memory_alphago_memory_fact, 1765766797204232) */ __vid_1753843499647861 AS __vid_1753843499647861, __type_26_1753843499648191 AS __type_26_1753843499648191, biz_time AS biz_time, __vector_26_1753843499648596 AS __vector_26_1753843499648596 from tuya_aimemory_prod.ai_memory_alphago_memory_fact PARTITION(M202506,M202509,P202512,P202603) as of snapshot 1765766797235640000 order by 1, 2
PLAN_ID: 388574
AFFECTED_ROWS: 0
RETURN_ROWS: 0
PARTITION_CNT: 12
RET_CODE: 0
QC_ID: 0
DFO_ID: 0
SQC_ID: 0
WORKER_ID: 0
EVENT: exec inner sql wait
P1TEXT: wait inner sql class
P1: 0
P2TEXT: inner session id
P2: 4611686018791066261
P3TEXT:
P3: 0
LEVEL: 0
WAIT_CLASS_ID: 100
WAIT_CLASS#: 0
WAIT_CLASS: OTHER
STATE: WAITED KNOWN TIME
WAIT_TIME_MICRO: 23462
TOTAL_WAIT_TIME_MICRO: 794219
TOTAL_WAITS: 809
RPC_COUNT: 6
PLAN_TYPE: 3
IS_INNER_SQL: 1
IS_EXECUTOR_RPC: 0
IS_HIT_PLAN: 0
REQUEST_TIME: 1765766797373787
ELAPSED_TIME: 842088
NET_TIME: 0
NET_WAIT_TIME: 0
QUEUE_TIME: 0
DECODE_TIME: 0
GET_PLAN_TIME: 245283
EXECUTE_TIME: 596805
APPLICATION_WAIT_TIME: 0
CONCURRENCY_WAIT_TIME: 0
USER_IO_WAIT_TIME: 0
SCHEDULE_TIME: 0
ROW_CACHE_HIT: 0
BLOOM_FILTER_CACHE_HIT: 0
BLOCK_CACHE_HIT: 0
DISK_READS: 0
RETRY_CNT: 0
TABLE_SCAN: 1
CONSISTENCY_LEVEL: 3
MEMSTORE_READ_ROW_COUNT: 0
SSSTORE_READ_ROW_COUNT: 0
DATA_BLOCK_READ_CNT: 0
DATA_BLOCK_CACHE_HIT: 0
INDEX_BLOCK_READ_CNT: 0
INDEX_BLOCK_CACHE_HIT: 0
BLOCKSCAN_BLOCK_CNT: 0
BLOCKSCAN_ROW_CNT: 0
PUSHDOWN_STORAGE_FILTER_ROW_CNT: 0
REQUEST_MEMORY_USED: 5927701
EXPECTED_WORKER_COUNT: 0
USED_WORKER_COUNT: 0
SCHED_INFO: NULL
FUSE_ROW_CACHE_HIT: 0
PS_CLIENT_STMT_ID: -1
PS_INNER_STMT_ID: -1
TX_ID: 59260446
SNAPSHOT_VERSION: 1765766797627940002
REQUEST_TYPE: 1
IS_BATCHED_MULTI_STMT: 0
OB_TRACE_INFO: NULL
PLAN_HASH: 12952747494869742316
LOCK_FOR_READ_TIME: 0
PARAMS_VALUE:
RULE_NAME:
PARTITION_HIT: 1
TX_INTERNAL_ROUTING: 0
TX_STATE_VERSION: 0
FLT_TRACE_ID: 000645f4-335c-a836-3dcb-fe9aa3db3839
PL_TRACE_ID: NULL
PLSQL_EXEC_TIME: 0
FORMAT_SQL_ID:
STMT_TYPE: INSERT
TOTAL_MEMSTORE_READ_ROW_COUNT: 0
TOTAL_SSSTORE_READ_ROW_COUNT: 0
PROXY_USER:
SEQ_NUM: 1765766797628111
NETWORK_WAIT_TIME: 0
PLSQL_COMPILE_TIME: 0
INSERT_DUPLICATE_ROW_COUNT: 0
FROM_UNIXTIME(left(REQUEST_TIME,10)): 2025-12-15 10:46:37.000000
辞霜
December 16, 2025, 10:05
#13
This SQL belongs to vector index maintenance. When there is a lot of DML, the system has to maintain the index; this itself cannot be optimized away. We suggest upgrading the cluster: 4.3.5 BP2 has quite a few known bugs, and newer versions have optimized the index maintenance path.
Can this maintenance be given a fixed window? Right now it fires at random times and cannot be controlled. As for upgrading, we heard that past a certain version the only way up is to migrate data, whereas lower versions can be upgraded in place, so we have no plans to upgrade for now.
淇铭
December 16, 2025, 10:20
#15
The full refresh is checked every 24 hours: if the newly added data exceeds 20% of the existing data, a full refresh is triggered automatically. The full refresh runs asynchronously in the background; it first builds a new index and then replaces the old one. During the rebuild the old index stays available, but the overall process is relatively slow.
https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000002012936
Take a look at this doc to see whether an incremental refresh is possible. Also, if you have only a few vector indexes, you can cancel this automatic rebuild and do it manually with concurrency, though that is more resource-intensive.