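I recently ran into an odd error. A dozen or so tables needed to be truncated, and on the last one the TRUNCATE TABLE statement returned immediately with error code -4179, "Operation not allowed now". TRUNCATE TABLE can fail? The official documentation lists a related message: [errcode=-4179] offline ddl is being executed, other ddl operations are not allowed. According to that entry, offline DDL runs serially, not in parallel: while one offline DDL is executing, every other DDL has to wait. TRUNCATE TABLE is an offline DDL, but no TRUNCATE TABLE statements were being run in parallel here, so this was most likely not the cause, although I kept it as a possibility to rule out. The official documentation does not describe a method or view for checking which offline DDL is currently executing.
How should the investigation start? First reproduce the error to see whether it still occurs; logging in through either obclient or ODC works. Once it is reproduced, log in to the node of the primary zone and check the observer log, filtering on the tenant ID and the keyword truncate table. After finding the matching entries, follow the whole log chain by TRACE ID. The log contained no specific error detail, only "error code -4179 execute rpc fail" and "rpc proxy truncate table failed", and the rootservice log on that node had no related entries. So there was an error without any detail, but the RPC pointed at the address of the RS leader node. A DDL statement is not executed by the local observer: OB has a dedicated service, rootservice, that schedules and executes DDL centrally. The following statement identifies the node hosting that service.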
SELECT svr_ip,with_rootserver FROM oceanbase.DBA_OB_SERVERS;
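Only one node can act as the rootservice leader at a time. Based on the query result, log in to the RS leader node and search the rootservice log by trace ID. That revealed the actual cause: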
[errcode=-4179] index table's index status is not available.
Checking the indexes on the table through the dba_indexes view then showed a global index whose status was unusable.
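The trace-ID step used in both log searches above boils down to keeping only the lines that carry one ID. A minimal sketch of that filter follows. The function names are hypothetical, and the sample lines are synthetic placeholders, not real observer or rootservice output; in practice a plain grep on the log file does the same job.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Return the lines that contain the given trace ID, preserving their order.
// Equivalent in spirit to grepping a log file for one trace ID.
std::vector<std::string> filter_by_trace(const std::vector<std::string> &lines,
                                         const std::string &trace_id) {
  std::vector<std::string> matched;
  for (const std::string &line : lines) {
    if (line.find(trace_id) != std::string::npos) {
      matched.push_back(line);
    }
  }
  return matched;
}

// Synthetic sample lines (NOT real log output). "TRACE-A" and "TRACE-B"
// are placeholder IDs used only for this illustration.
std::vector<std::string> sample_lines() {
  return {
      "[TRACE-A] truncate table request received",
      "[TRACE-B] unrelated request",
      "[TRACE-A] ret=-4179",
  };
}
```

Calling filter_by_trace(sample_lines(), "TRACE-A") keeps the first and third lines, which is the "whole log chain" for that one request.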
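Solution:
Back up the index's CREATE statement, drop the index, run TRUNCATE TABLE (which now succeeds), then recreate the index.
Further thoughts:
That was one specific case. To see what else can make a TRUNCATE TABLE DDL fail, look at the source file ob_ddl_service.cpp.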
int ObDDLService::new_truncate_table(const obrpc::ObTruncateTableArg &arg,
                                     obrpc::ObDDLRes &ddl_res,
                                     const SCN &frozen_version)
{
  ...
  if (OB_FAIL(check_inner_stat())) {
    LOG_WARN("variable is not init", KR(ret));
  } else if (OB_ISNULL(schema_service = schema_service_->get_schema_service())) {
    ret = OB_ERR_UNEXPECTED;
    LOG_WARN("schema_service must not null", KR(ret));
  } else if (OB_INVALID_ID == tenant_id || arg.database_name_.empty()
             || arg.table_name_.empty()) {
    ret = OB_INVALID_ARGUMENT;
    LOG_WARN("invalid argument", KR(ret), K(arg));
  //tenant share lock
  } else if (OB_FAIL(trans.start(sql_proxy_, tenant_id, fake_schema_version, with_snapshot))) {
    LOG_WARN("failed to start trans", KR(ret), K(tenant_id));
  } else if (OB_ISNULL(conn = dynamic_cast<observer::ObInnerSQLConnection *>
                       (trans.get_connection()))) {
    ret = OB_ERR_UNEXPECTED;
    LOG_WARN("trans conn is NULL", KR(ret), K(arg));
  // To verify the existence of database and table
  } else if (OB_FAIL(check_db_and_table_is_exist(arg, trans, database_id, table_id))) {
    LOG_WARN("failed to check database and table exist", KR(ret), K(arg.database_name_), K(arg.table_name_));
  } else {
    // table lock
    LOG_INFO("truncate cost after trans start and check_db_table_is_exist", KR(ret), "cost_ts", before_table_lock - start_time);
    // try lock
    if (OB_FAIL(ObInnerConnectionLockUtil::lock_table(tenant_id,
                                                      table_id,
                                                      EXCLUSIVE,
                                                      0,
                                                      conn))) {
      LOG_WARN("failed to lock table", KR(ret), K(arg.table_name_), K(table_id));
      // for error code convert
      if (OB_OP_NOT_ALLOW == ret) {
        ret = OB_SUCCESS;
        lock_table_not_allow = true;
      }
    }
    uint64_t compat_version = 0;
    int64_t after_table_lock = ObTimeUtility::current_time();
    LOG_INFO("truncate cost after lock table", KR(ret), "cost_ts", after_table_lock - before_table_lock);
    if (FAILEDx(schema_service->get_db_schema_from_inner_table(schema_status, database_id, database_schema_array, trans))){
      LOG_WARN("fail to get database schema", KR(ret), K(arg.database_name_), K(database_id));
    // get table full scehma
    } else if (OB_FAIL(schema_service->get_full_table_schema_from_inner_table(schema_status, table_id, orig_table_schema, allocator, trans))) {
      LOG_WARN("fail to get table schema", KR(ret), K(arg.table_name_), K(table_id));
    // in upgrade, check the data_version to prevent from executing wrong logical
    } else if (OB_FAIL(GET_MIN_DATA_VERSION(tenant_id, compat_version))) {
      LOG_WARN("get min data_version failed", KR(ret), K(tenant_id));
    } else if (compat_version < DATA_VERSION_4_1_0_0) {
      ret = OB_NOT_SUPPORTED;
      LOG_WARN("server state is not suppported when tenant's data version is below 4.1.0.0", KR(ret), K(compat_version));
      LOG_USER_ERROR(OB_NOT_SUPPORTED, "tenant's data version is below 4.1.0.0, truncate table is ");
    } else if (orig_table_schema.get_autoinc_column_id() != 0
               && compat_version < DATA_VERSION_4_2_0_0) {
      ret = OB_NOT_SUPPORTED;
      LOG_WARN("server state is not suppported to use_parallel_truncate when tenant's data version is below 4.2.0.0 "
               "and table has autoinc column", KR(ret), K(compat_version), K(tenant_id), K(arg.table_name_));
      LOG_USER_ERROR(OB_NOT_SUPPORTED, "tenant's data version is below 4.2.0.0, truncate table with autoinc column is ");
    // To verify the args are legal
    } else if (OB_FAIL(check_table_schema_is_legal(database_schema_array.at(0), orig_table_schema, arg.foreign_key_checks_, trans))) {
      LOG_WARN("failed to check table schema is legal",
               KR(ret), K(arg.table_name_), K(table_id), K(orig_table_schema.get_schema_version()));
    } else if (lock_table_not_allow) {
      ret = OB_OP_NOT_ALLOW;
      LOG_WARN("fail to lock table", KR(ret), K(orig_table_schema));
    }// get index and lob schema
    else if (OB_FAIL(get_index_lob_table_schema(orig_table_schema, schema_status,
                                                table_schema_array, allocator, trans))) {
      LOG_WARN("fail to get index or lob table schema",
               KR(ret), K(arg.table_name_), K(table_id), K(orig_table_schema.get_schema_version()));
    }
    int64_t after_get_schema = ObTimeUtility::current_time();
    LOG_INFO("truncate cost after get schema and check legal",
             KR(ret), "cost_ts", after_get_schema - after_table_lock);
    if (FAILEDx(new_truncate_table_in_trans(table_schema_array, trans, &arg.ddl_stmt_str_, ddl_res))) {
      LOG_WARN("truncate table in trans failed",
               KR(ret), K(arg.table_name_), K(table_id), K(orig_table_schema.get_schema_version()));
    }
    int64_t finish_truncate_table = ObTimeUtility::current_time();
    LOG_INFO("truncate cost after finish truncate", KR(ret), K(tenant_id), K(table_id), "cost_ts", finish_truncate_table - start_time);
  }
  return ret;
}
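- check_inner_stat(): checks that the internal members of the ObDDLService class have all been initialized.
- OB_ISNULL(schema_service = schema_service_->get_schema_service()): null-pointer check; schema_service must not be null.
- OB_INVALID_ID == tenant_id || arg.database_name_.empty() || arg.table_name_.empty(): checks that the tenant ID is valid and that the database name and table name arguments are not empty.
- trans.start(sql_proxy_, tenant_id, fake_schema_version, with_snapshot): starts the transaction (tenant share lock).
- conn = dynamic_cast<observer::ObInnerSQLConnection *>(trans.get_connection()): gets the underlying connection, ObInnerSQLConnection.
- check_db_and_table_is_exist(arg, trans, database_id, table_id): checks, inside the transaction, that the database and the table both exist.
- ObInnerConnectionLockUtil::lock_table(tenant_id, table_id, EXCLUSIVE, 0, conn): takes an EXCLUSIVE lock on the table. The code comments call this a try lock. If it returns OB_OP_NOT_ALLOW, the code only records that in lock_table_not_allow and returns the error later, after check_table_schema_is_legal has passed.
- schema_service->get_db_schema_from_inner_table: reads the database schema from the inner table.
- schema_service->get_full_table_schema_from_inner_table: reads the full table schema from the inner table.
- GET_MIN_DATA_VERSION(tenant_id, compat_version): gets the tenant's minimum data version.
- compat_version < DATA_VERSION_4_1_0_0: checks whether the compatible version, that is the minimum data version, is below 4.1.0.0; if so, the data version is not supported.
- orig_table_schema.get_autoinc_column_id() != 0 && compat_version < DATA_VERSION_4_2_0_0: checks whether the table has an auto-increment column while the minimum data version is below 4.2.0.0; if so, it returns OB_NOT_SUPPORTED with "tenant's data version is below 4.2.0.0, truncate table with autoinc column is".
- check_table_schema_is_legal: verifies that the table schema and related arguments are legal.
- get_index_lob_table_schema: gets the schemas of all index tables and LOB auxiliary tables.
- new_truncate_table_in_trans: performs the actual truncate.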
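One detail in the steps above is easy to miss: an OB_OP_NOT_ALLOW from the table lock is not returned immediately. It is parked in a flag and only reported if the schema checks that follow succeed, so a schema-level error takes precedence. The sketch below models just that ordering. StepResults, truncate_precheck and SCHEMA_ILLEGAL are hypothetical names for illustration; only the value -4179 comes from the error in this post.

```cpp
#include <cassert>

// Simplified stand-ins for return codes. -4179 is the code seen in this post;
// SCHEMA_ILLEGAL is a placeholder value, not a real OceanBase error code.
constexpr int OB_SUCCESS = 0;
constexpr int OB_OP_NOT_ALLOW = -4179;
constexpr int SCHEMA_ILLEGAL = -1;

// Hypothetical inputs: what each step would have returned.
struct StepResults {
  int lock_ret;          // result of the try-lock on the table
  int schema_check_ret;  // result of the schema legality check
};

// Models the "for error code convert" branch: a lock refusal is deferred
// until the schema check has passed; any other lock error fails at once.
int truncate_precheck(const StepResults &steps) {
  int ret = steps.lock_ret;
  bool lock_table_not_allow = false;
  if (OB_OP_NOT_ALLOW == ret) {
    ret = OB_SUCCESS;  // swallow for now, remember it in the flag
    lock_table_not_allow = true;
  }
  if (OB_SUCCESS == ret) {
    ret = steps.schema_check_ret;
  }
  if (OB_SUCCESS == ret && lock_table_not_allow) {
    ret = OB_OP_NOT_ALLOW;  // report the deferred lock refusal
  }
  return ret;
}
```

With this ordering, truncate_precheck({OB_OP_NOT_ALLOW, SCHEMA_ILLEGAL}) yields the schema error rather than -4179, while a lock refusal on a legal schema still surfaces as -4179.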
Since the error was about an index, take a closer look at get_index_lob_table_schema.
int ObDDLService::get_index_lob_table_schema(const ObTableSchema &orig_table_schema,
                                             const ObRefreshSchemaStatus &schema_status,
                                             common::ObArray<const ObTableSchema*> &table_schemas,
                                             ObArenaAllocator &allocator,
                                             ObMySQLTransaction &trans)
{
  ...
  } else if (OB_FAIL(table_schemas.push_back(&orig_table_schema))) {
    LOG_WARN("fail to push back orign table schema", KR(ret), K(table_name), K(schema_version));
  }// get index table id
  else if (OB_FAIL(orig_table_schema.get_simple_index_infos(simple_index_infos))) {
    LOG_WARN("get simple_index_infos failed", KR(ret), K(table_name), K(schema_version));
  } else {
    ObIndexType index_type = INDEX_TYPE_IS_NOT;
    ObTableType table_type = MAX_TABLE_TYPE;
    // get all index table id
    int64_t index_count = simple_index_infos.count();
    for (int64_t i = 0; OB_SUCC(ret) && i < index_count; ++i) {
      index_type = simple_index_infos.at(i).index_type_;
      table_type = simple_index_infos.at(i).table_type_;
      if ((USER_INDEX == table_type) && index_has_tablet(index_type)) {
        if (OB_FAIL(table_ids.push_back(simple_index_infos.at(i).table_id_))) {
          LOG_WARN("failed to push index id to index_ids",
                   KR(ret), K(i), K(simple_index_infos.at(i).table_id_), K(table_name), K(schema_version));
        }
      }
    }
  }
  // get lob table id
  if (OB_FAIL(ret)) {
  } else if (orig_table_schema.has_lob_column()) {
    uint64_t mtid = orig_table_schema.get_aux_lob_meta_tid();
    uint64_t ptid = orig_table_schema.get_aux_lob_piece_tid();
    if (OB_INVALID_ID == mtid || OB_INVALID_ID == ptid) {
      ret = OB_ERR_UNEXPECTED;
      LOG_WARN("Expect meta tid and piece tid valid",
               KR(ret), K(table_name), K(schema_version), K(mtid), K(ptid));
    } else if (OB_FAIL(table_ids.push_back(mtid))) {
      LOG_WARN("fail to push back lob meta tid", KR(ret),K(table_name), K(schema_version), K(mtid));
    } else if (OB_FAIL(table_ids.push_back(ptid))) {
      LOG_WARN("fail to push back lob piece tid", KR(ret), K(table_name), K(schema_version), K(ptid));
    }
  }
  if (OB_SUCC(ret) && 0 != table_ids.count()) {
    // this batch impl lost foreign key and trigger etc.
    if (OB_FAIL(schema_service->get_batch_table_schema(schema_status, schema_version, table_ids,
                                                       trans, allocator, tmp_table_schemas))) {
      LOG_WARN("failed to get batch table schema", KR(ret), K(table_name), K(schema_version));
    } else {
      const ObTableSchema *tmp_schema = NULL;
      ObIndexStatus index_status = INDEX_STATUS_AVAILABLE;
      int64_t tmp_table_schema_count = tmp_table_schemas.count();
      for (int64_t i = 0; OB_SUCC(ret) && i < tmp_table_schema_count; ++i) {
        tmp_schema = tmp_table_schemas.at(i);
        if (OB_ISNULL(tmp_schema)) {
          ret = OB_ERR_UNEXPECTED;
          LOG_WARN("tmp schema is NULL", KR(ret));
        } else {
          table_name = tmp_schema->get_table_name();
          database_id = tmp_schema->get_database_id();
          index_status = tmp_schema->get_index_status();
          schema_version = tmp_schema->get_schema_version();
          if (orig_database_id != database_id) {
            ret = OB_ERR_UNEXPECTED;
            LOG_WARN("orign table database_id is not equal to index table database_id",
                     KR(ret), K(orig_database_id), K(database_id), K(table_name), K(schema_version));
          } else if (tmp_schema->is_index_table() && !is_available_index_status(index_status)) {
            ret = OB_OP_NOT_ALLOW;
            LOG_WARN("index table's index status is not available",
                     KR(ret), K(table_name), K(database_id), K(schema_version));
          } else if (OB_FAIL(table_schemas.push_back(tmp_schema))) {
            LOG_WARN("push back schema failed", KR(ret), K(table_name), K(database_id), K(schema_version));
          }
        }
      }
    }
  }
  return ret;
}
- table_schemas.push_back(&orig_table_schema): adds the main table's schema to the result array.
- orig_table_schema.get_simple_index_infos(simple_index_infos): gets the summary index information.
- int64_t index_count = simple_index_infos.count(): the number of indexes on the table.
- for (int64_t i = 0; OB_SUCC(ret) && i < index_count; ++i): collects the table IDs of all index tables.
- orig_table_schema.has_lob_column(): handles the LOB columns, if the table has any.
- uint64_t mtid = orig_table_schema.get_aux_lob_meta_tid(): gets the LOB meta table ID.
- uint64_t ptid = orig_table_schema.get_aux_lob_piece_tid(): gets the LOB piece table ID.
- if (OB_INVALID_ID == mtid || OB_INVALID_ID == ptid): both IDs above must be valid; otherwise LOG_WARN "Expect meta tid and piece tid valid".
- schema_service->get_batch_table_schema: gets the schemas of all the collected tables.
- for (int64_t i = 0; OB_SUCC(ret) && i < tmp_table_schema_count; ++i): runs the remaining checks over every schema returned.
- if (orig_database_id != database_id): verifies that the index table's database ID matches the original table's database ID.
- tmp_schema->is_index_table() && !is_available_index_status(index_status): checks the index table's status; if it is not available, the function fails with LOG_WARN "index table's index status is not available".
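The last check in the list is the one that produced the -4179 in this case: one unavailable index is enough to reject the whole truncate. A minimal model of that loop is sketched below. AuxTable, IndexStatus and collect_aux_tables are hypothetical simplifications, not the real ObTableSchema or ObIndexStatus types, and the two-state status enum is an assumption made for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

constexpr int OB_SUCCESS = 0;
constexpr int OB_OP_NOT_ALLOW = -4179;  // code seen in this post

// Hypothetical, simplified view of an auxiliary table (index or LOB table).
enum class IndexStatus { Available, Unusable };

struct AuxTable {
  std::string name;
  bool is_index;       // false for LOB meta/piece tables
  IndexStatus status;  // only meaningful when is_index is true
};

// Append auxiliary tables to `accepted` until an index that is not available
// is found; that index fails the whole operation with OB_OP_NOT_ALLOW.
int collect_aux_tables(const std::vector<AuxTable> &candidates,
                       std::vector<AuxTable> &accepted) {
  int ret = OB_SUCCESS;
  for (std::size_t i = 0; OB_SUCCESS == ret && i < candidates.size(); ++i) {
    const AuxTable &table = candidates[i];
    if (table.is_index && table.status != IndexStatus::Available) {
      ret = OB_OP_NOT_ALLOW;
    } else {
      accepted.push_back(table);
    }
  }
  return ret;
}
```

This also shows why dropping the unusable index fixed the problem: once it is gone from the candidate list, every remaining table passes and the loop returns success.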
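To summarize:
TRUNCATE TABLE goes through many checks, some on the program's own state and some on the objects involved. The program-level checks can be set aside for now; only the object-level checks matter here.
- Check that the database and the table both exist.
- Check that the EXCLUSIVE table lock was acquired.
- Check whether the compatible version, that is the minimum data version, is below 4.1.0.0; if so, the data version is not supported.
- Check whether the table has an auto-increment column while the minimum data version is below 4.2.0.0; if so, the result is OB_NOT_SUPPORTED with "tenant's data version is below 4.2.0.0, truncate table with autoinc column is".
- Check that the table schema and related arguments are legal.
- Check that the LOB meta table ID and the LOB piece table ID are valid.
- Check the status of every index table; if one is not available, the operation fails with LOG_WARN "index table's index status is not available".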
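The two data-version checks in this list amount to a small gate, sketched below. The way pack_version folds a four-part version into one comparable integer is an assumption made for illustration and may not match how OceanBase defines DATA_VERSION_4_1_0_0; NOT_SUPPORTED is a placeholder for OB_NOT_SUPPORTED, and check_data_version is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

constexpr int OB_SUCCESS = 0;
constexpr int NOT_SUPPORTED = -1;  // placeholder for OB_NOT_SUPPORTED

// Assumed packing: each part occupies its own bit range so that plain integer
// comparison orders versions correctly. Illustration only.
constexpr std::uint64_t pack_version(std::uint64_t major, std::uint64_t minor,
                                     std::uint64_t major_patch,
                                     std::uint64_t minor_patch) {
  return (major << 32) | (minor << 16) | (major_patch << 8) | minor_patch;
}

constexpr std::uint64_t V_4_1_0_0 = pack_version(4, 1, 0, 0);
constexpr std::uint64_t V_4_2_0_0 = pack_version(4, 2, 0, 0);

// Mirrors the two version branches: below 4.1.0.0 is always rejected, and a
// table with an auto-increment column additionally needs at least 4.2.0.0.
int check_data_version(std::uint64_t compat_version, bool has_autoinc_column) {
  if (compat_version < V_4_1_0_0) {
    return NOT_SUPPORTED;
  }
  if (has_autoinc_column && compat_version < V_4_2_0_0) {
    return NOT_SUPPORTED;
  }
  return OB_SUCCESS;
}
```

Under these assumptions a tenant at 4.1.0.1 can truncate an ordinary table but not one with an auto-increment column.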
Note: this reflects my personal view only!