truncate table reports an error: how to troubleshoot it?

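I recently ran into an odd error. A dozen or so tables needed to be truncated, and on the last one, truncate table returned immediately with error code -4179, Operation not allowed now. Can truncate table really fail? Searching the official documentation turned up a related message: [errcode=-4179] offline ddl is being executed, other ddl operations are not allowed. According to that explanation, offline DDL runs serially, one statement at a time; while one offline DDL is executing, other DDL has to wait. truncate table is an offline DDL, but no truncate table statements were being run in parallel here, so this was most likely not the cause, although I kept it as a hypothesis still to be ruled out. The official documentation does not provide a method or view for checking which offline DDL is currently executing.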

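How to approach this? First, reproduce the error to confirm it still occurs; logging in through either obclient or ODC works. After reproducing it, log in to the node that hosts the tenant's primary zone and check the observer log, filtering on the tenant ID and the keyword truncate table. Once the matching entries are found, use the TRACE ID to follow the whole log chain. That log showed no specific error, only error code -4179 with execute rpc fail and rpc proxy truncate table failed, and the rootservice log on that node had nothing relevant. So there was an error, but no detail. The RPC, however, pointed at the address of the RS leader node. When an observer receives a DDL, the local observer does not execute it: in OB, a dedicated service called rootservice schedules and executes DDL centrally. The following statement confirms which node hosts that service.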

SELECT svr_ip,with_rootserver FROM oceanbase.DBA_OB_SERVERS;

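Only one node can act as the rootservice leader at a time. Based on the query result, log in to the RS leader node and search the rootservice log by trace ID; there the actual cause shows up: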

[errcode=-4179] index table's index status is not available.
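The log-chasing step described above (filter by tenant ID and keyword, pick up the trace ID, then pull every line that carries it) can be sketched roughly as follows. This is only an illustration and not an OceanBase tool: the function names are made up for this post, and it assumes the trace ID is printed as a bracketed token that starts with Y, such as [YB42AC1E87E5-0005F0A1B2C3D4E5-0-0]. If your version formats the log differently, the pattern needs adjusting.

```cpp
#include <algorithm>
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Assumption: the trace ID is a bracketed token of the form
// [Y<hex>-<hex>-<digits>-<digits>]. Returns an empty string if none is found.
std::string extract_trace_id(const std::string &line) {
  static const std::regex kTraceId(
      R"(\[(Y[0-9A-Fa-f]+-[0-9A-Fa-f]+-[0-9]+-[0-9]+)\])");
  std::smatch m;
  return std::regex_search(line, m, kTraceId) ? m[1].str() : std::string();
}

// Collects the distinct trace IDs of all lines that contain every keyword
// (for example the tenant ID token and "truncate table").
std::vector<std::string> find_trace_ids(const std::vector<std::string> &lines,
                                        const std::vector<std::string> &keywords) {
  std::vector<std::string> ids;
  for (const std::string &line : lines) {
    const bool all_present = std::all_of(
        keywords.begin(), keywords.end(), [&line](const std::string &kw) {
          return line.find(kw) != std::string::npos;
        });
    if (!all_present) continue;
    const std::string id = extract_trace_id(line);
    if (!id.empty() && std::find(ids.begin(), ids.end(), id) == ids.end()) {
      ids.push_back(id);
    }
  }
  return ids;
}

// Returns every line that carries the given trace ID, i.e. the full chain.
std::vector<std::string> lines_with_trace_id(const std::vector<std::string> &lines,
                                             const std::string &trace_id) {
  std::vector<std::string> chain;
  for (const std::string &line : lines) {
    if (line.find(trace_id) != std::string::npos) chain.push_back(line);
  }
  return chain;
}
```

Run find_trace_ids over the observer log of the node you connected to, then run lines_with_trace_id with the same ID over the rootservice log on the RS leader node, since that is where the detailed message was written in this case.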

Checking the table's indexes through the dba_indexes view then showed one global index in the unusable state.

Solution:

Back up the index's CREATE statement, drop the index, run truncate table (which now succeeds), then recreate the index.

Further thoughts:

That was only one case. To see what else can make the truncate table DDL fail, look at the source file ob_ddl_service.cpp.

int ObDDLService::new_truncate_table(const obrpc::ObTruncateTableArg &arg,
                                     obrpc::ObDDLRes &ddl_res,
                                     const SCN &frozen_version)
{
...
  if (OB_FAIL(check_inner_stat())) {
    LOG_WARN("variable is not init", KR(ret));
  } else if (OB_ISNULL(schema_service = schema_service_->get_schema_service())) {
    ret = OB_ERR_UNEXPECTED;
    LOG_WARN("schema_service must not null", KR(ret));
  } else if (OB_INVALID_ID == tenant_id || arg.database_name_.empty()
             || arg.table_name_.empty()) {
    ret = OB_INVALID_ARGUMENT;
    LOG_WARN("invalid argument", KR(ret), K(arg));
  //tenant share lock
  } else if (OB_FAIL(trans.start(sql_proxy_, tenant_id, fake_schema_version, with_snapshot))) {
    LOG_WARN("failed to start trans", KR(ret), K(tenant_id));
  } else if (OB_ISNULL(conn = dynamic_cast<observer::ObInnerSQLConnection *>
                       (trans.get_connection()))) {
    ret = OB_ERR_UNEXPECTED;
    LOG_WARN("trans conn is NULL", KR(ret), K(arg));
  // To verify the existence of database and table
  } else if (OB_FAIL(check_db_and_table_is_exist(arg, trans, database_id, table_id))) {
    LOG_WARN("failed to check database and table exist", KR(ret), K(arg.database_name_), K(arg.table_name_));
  } else {
    // table lock
    LOG_INFO("truncate cost after trans start and check_db_table_is_exist", KR(ret), "cost_ts", before_table_lock - start_time);
    // try lock
    if (OB_FAIL(ObInnerConnectionLockUtil::lock_table(tenant_id,
                                                      table_id,
                                                      EXCLUSIVE,
                                                      0,
                                                      conn))) {
      LOG_WARN("failed to lock table", KR(ret), K(arg.table_name_), K(table_id));
      // for error code convert
      if (OB_OP_NOT_ALLOW == ret) {
        ret = OB_SUCCESS;
        lock_table_not_allow = true;
      }
    }
    uint64_t compat_version = 0;
    int64_t after_table_lock = ObTimeUtility::current_time();
    LOG_INFO("truncate cost after lock table", KR(ret), "cost_ts", after_table_lock - before_table_lock);
    if (FAILEDx(schema_service->get_db_schema_from_inner_table(schema_status, database_id, database_schema_array, trans))){
      LOG_WARN("fail to get database schema", KR(ret), K(arg.database_name_), K(database_id));
    // get table full scehma
    } else if (OB_FAIL(schema_service->get_full_table_schema_from_inner_table(schema_status, table_id, orig_table_schema, allocator, trans))) {
      LOG_WARN("fail to get table schema", KR(ret), K(arg.table_name_), K(table_id));
    // in upgrade, check the data_version to prevent from executing wrong logical
    } else if (OB_FAIL(GET_MIN_DATA_VERSION(tenant_id, compat_version))) {
      LOG_WARN("get min data_version failed", KR(ret), K(tenant_id));
    } else if (compat_version < DATA_VERSION_4_1_0_0) {
      ret = OB_NOT_SUPPORTED;
      LOG_WARN("server state is not suppported when tenant's data version is below 4.1.0.0", KR(ret), K(compat_version));
      LOG_USER_ERROR(OB_NOT_SUPPORTED, "tenant's data version is below 4.1.0.0, truncate table is ");
    } else if (orig_table_schema.get_autoinc_column_id() != 0
              && compat_version < DATA_VERSION_4_2_0_0) {
      ret = OB_NOT_SUPPORTED;
      LOG_WARN("server state is not suppported to use_parallel_truncate when tenant's data version is below 4.2.0.0 "
                "and table has autoinc column", KR(ret), K(compat_version), K(tenant_id), K(arg.table_name_));
      LOG_USER_ERROR(OB_NOT_SUPPORTED, "tenant's data version is below 4.2.0.0, truncate table with autoinc column is ");
    // To verify the args are legal
    } else if (OB_FAIL(check_table_schema_is_legal(database_schema_array.at(0), orig_table_schema, arg.foreign_key_checks_, trans))) {
      LOG_WARN("failed to check table schema is legal",
              KR(ret), K(arg.table_name_), K(table_id), K(orig_table_schema.get_schema_version()));
    } else if (lock_table_not_allow) {
      ret = OB_OP_NOT_ALLOW;
      LOG_WARN("fail to lock table", KR(ret), K(orig_table_schema));
    }// get index and lob schema
    else if (OB_FAIL(get_index_lob_table_schema(orig_table_schema, schema_status,
                                            table_schema_array, allocator, trans))) {
      LOG_WARN("fail to get index or lob table schema",
              KR(ret), K(arg.table_name_), K(table_id), K(orig_table_schema.get_schema_version()));
    }
    int64_t after_get_schema =  ObTimeUtility::current_time();
    LOG_INFO("truncate cost after get schema and check legal",
            KR(ret), "cost_ts", after_get_schema - after_table_lock);
    if (FAILEDx(new_truncate_table_in_trans(table_schema_array, trans, &arg.ddl_stmt_str_, ddl_res))) {
      LOG_WARN("truncate table in trans failed",
              KR(ret), K(arg.table_name_), K(table_id), K(orig_table_schema.get_schema_version()));
    }
    int64_t finish_truncate_table = ObTimeUtility::current_time();
    LOG_INFO("truncate cost after finish truncate", KR(ret), K(tenant_id), K(table_id), "cost_ts", finish_truncate_table - start_time);
  }
  return ret;
}
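  • check_inner_stat() checks that the internal members of the ObDDLService class have all been initialized
  • OB_ISNULL(schema_service = schema_service_->get_schema_service()) is a null-pointer check: schema_service must not be null
  • OB_INVALID_ID == tenant_id || arg.database_name_.empty() || arg.table_name_.empty() checks that the tenant ID is valid and that the database name and table name are not empty
  • trans.start(sql_proxy_, tenant_id, fake_schema_version, with_snapshot) starts the transaction (tenant share lock)
  • conn = dynamic_cast<observer::ObInnerSQLConnection *>(trans.get_connection()) gets the underlying database connection, ObInnerSQLConnection
  • check_db_and_table_is_exist(arg, trans, database_id, table_id) checks, inside the transaction, that the database and the table both exist
  • ObInnerConnectionLockUtil::lock_table(tenant_id, table_id, EXCLUSIVE, 0, conn) tries to take an EXCLUSIVE lock on the table (the code comment calls it a try lock); if it fails with OB_OP_NOT_ALLOW, the error is recorded in lock_table_not_allow and only reported after the schema legality check
  • schema_service->get_db_schema_from_inner_table reads the database schema from the inner table
  • schema_service->get_full_table_schema_from_inner_table reads the full table schema from the inner table
  • GET_MIN_DATA_VERSION(tenant_id, compat_version) gets the tenant's minimum data version
  • compat_version < DATA_VERSION_4_1_0_0 checks whether the compatible version, that is the minimum data version, is below 4.1.0.0; if so, the data version is not supported
  • orig_table_schema.get_autoinc_column_id() != 0 && compat_version < DATA_VERSION_4_2_0_0 checks whether the table has an auto-increment column while the minimum data version is below 4.2.0.0; if so, it returns OB_NOT_SUPPORTED (tenant's data version is below 4.2.0.0, truncate table with autoinc column)
  • check_table_schema_is_legal verifies that the table schema and the related arguments are legal
  • get_index_lob_table_schema collects the schemas of all index tables and LOB auxiliary tables
  • new_truncate_table_in_trans performs the actual truncate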
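One detail in the function above is easy to miss: a lock failure with OB_OP_NOT_ALLOW is not returned right away. It is parked in lock_table_not_allow and only reported once the version and schema checks have passed, so an earlier check can replace it with a different error. The sketch below models just that ordering, as I read it from the quoted source. The struct, the version encoding and every error value except -4179 are placeholders invented for illustration, not OceanBase definitions.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kSuccess = 0;
constexpr int kOpNotAllow = -4179;   // the code seen in this post
constexpr int kNotSupported = -1;    // placeholder value for this sketch
constexpr int kSchemaIllegal = -2;   // placeholder value for this sketch

// Hypothetical version encoding, only used to make versions comparable here.
constexpr uint64_t make_version(uint64_t a, uint64_t b, uint64_t c, uint64_t d) {
  return (a << 48) | (b << 32) | (c << 16) | d;
}

// Hypothetical summary of what the real function learns step by step.
struct TruncateRequest {
  int lock_result;          // outcome of the try-lock on the table
  uint64_t data_version;    // tenant's minimum data version
  bool has_autoinc_column;
  bool schema_is_legal;
  bool indexes_available;   // outcome of the index/LOB schema collection
};

int truncate_precheck(const TruncateRequest &req) {
  bool lock_table_not_allow = false;
  int ret = req.lock_result;
  if (ret == kOpNotAllow) {          // park the lock error instead of failing
    ret = kSuccess;
    lock_table_not_allow = true;
  }
  if (ret != kSuccess) return ret;   // any other lock error stops here

  if (req.data_version < make_version(4, 1, 0, 0)) {
    ret = kNotSupported;
  } else if (req.has_autoinc_column &&
             req.data_version < make_version(4, 2, 0, 0)) {
    ret = kNotSupported;
  } else if (!req.schema_is_legal) {
    ret = kSchemaIllegal;
  } else if (lock_table_not_allow) { // only now is the parked error reported
    ret = kOpNotAllow;
  } else if (!req.indexes_available) {
    ret = kOpNotAllow;               // same code, different cause
  }
  return ret;
}
```

In this model -4179 can come from two places, the table lock or an unavailable index, which matches the experience above: the code alone does not say which one it was, and only the log message does.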

Since the error concerns an index, take a closer look at get_index_lob_table_schema.

int ObDDLService::get_index_lob_table_schema(const ObTableSchema &orig_table_schema,
                                             const ObRefreshSchemaStatus &schema_status,
                                             common::ObArray<const ObTableSchema*> &table_schemas,
                                             ObArenaAllocator &allocator,
                                             ObMySQLTransaction &trans)
{
...
  } else if (OB_FAIL(table_schemas.push_back(&orig_table_schema))) {
    LOG_WARN("fail to push back orign table schema", KR(ret), K(table_name), K(schema_version));
  }// get index table id
  else if (OB_FAIL(orig_table_schema.get_simple_index_infos(simple_index_infos))) {
    LOG_WARN("get simple_index_infos failed", KR(ret), K(table_name), K(schema_version));
  } else {
    ObIndexType index_type = INDEX_TYPE_IS_NOT;
    ObTableType table_type = MAX_TABLE_TYPE;
    // get all index table id
    int64_t index_count = simple_index_infos.count();
    for (int64_t i = 0; OB_SUCC(ret) && i < index_count; ++i) {
      index_type = simple_index_infos.at(i).index_type_;
      table_type = simple_index_infos.at(i).table_type_;
      if ((USER_INDEX == table_type) && index_has_tablet(index_type)) {
        if (OB_FAIL(table_ids.push_back(simple_index_infos.at(i).table_id_))) {
          LOG_WARN("failed to push index id to index_ids",
                  KR(ret), K(i), K(simple_index_infos.at(i).table_id_), K(table_name), K(schema_version));
        }
      }
    }
  }
  // get lob table id
  if (OB_FAIL(ret)) {
  } else if (orig_table_schema.has_lob_column()) {
    uint64_t mtid = orig_table_schema.get_aux_lob_meta_tid();
    uint64_t ptid = orig_table_schema.get_aux_lob_piece_tid();
    if (OB_INVALID_ID == mtid || OB_INVALID_ID == ptid) {
      ret = OB_ERR_UNEXPECTED;
      LOG_WARN("Expect meta tid and piece tid valid",
              KR(ret), K(table_name), K(schema_version), K(mtid), K(ptid));
    } else if (OB_FAIL(table_ids.push_back(mtid))) {
      LOG_WARN("fail to push back lob meta tid", KR(ret),K(table_name), K(schema_version), K(mtid));
    } else if (OB_FAIL(table_ids.push_back(ptid))) {
      LOG_WARN("fail to push back lob piece tid", KR(ret), K(table_name), K(schema_version), K(ptid));
    }
  }
  if (OB_SUCC(ret) && 0 != table_ids.count()) {
    // this batch impl lost foreign key and trigger etc.
    if (OB_FAIL(schema_service->get_batch_table_schema(schema_status, schema_version, table_ids,
                                                       trans, allocator, tmp_table_schemas))) {
        LOG_WARN("failed to get batch table schema", KR(ret), K(table_name), K(schema_version));
    } else {
      const ObTableSchema *tmp_schema = NULL;
      ObIndexStatus index_status = INDEX_STATUS_AVAILABLE;
      int64_t tmp_table_schema_count = tmp_table_schemas.count();

      for (int64_t i = 0; OB_SUCC(ret) && i < tmp_table_schema_count; ++i) {
        tmp_schema = tmp_table_schemas.at(i);
        if (OB_ISNULL(tmp_schema)) {
          ret = OB_ERR_UNEXPECTED;
          LOG_WARN("tmp schema is NULL", KR(ret));
        } else {
          table_name = tmp_schema->get_table_name();
          database_id = tmp_schema->get_database_id();
          index_status = tmp_schema->get_index_status();
          schema_version = tmp_schema->get_schema_version();
          if (orig_database_id != database_id) {
            ret = OB_ERR_UNEXPECTED;
            LOG_WARN("orign table database_id is not equal to index table database_id",
                    KR(ret), K(orig_database_id), K(database_id), K(table_name), K(schema_version));
          } else if (tmp_schema->is_index_table() && !is_available_index_status(index_status)) {
            ret = OB_OP_NOT_ALLOW;
            LOG_WARN("index table's index status is not available",
                    KR(ret), K(table_name), K(database_id), K(schema_version));
          } else if (OB_FAIL(table_schemas.push_back(tmp_schema))) {
            LOG_WARN("push back schema failed", KR(ret), K(table_name), K(database_id), K(schema_version));
          }
        }
      }
    }
  }
  return ret;
}
  • table_schemas.push_back(&orig_table_schema) adds the main table's schema to the result array
  • orig_table_schema.get_simple_index_infos(simple_index_infos) gets the simple index info
  • int64_t index_count = simple_index_infos.count(); takes the number of indexes
  • for (int64_t i = 0; OB_SUCC(ret) && i < index_count; ++i) collects the table IDs of all index tables (user indexes that have tablets)
  • orig_table_schema.has_lob_column() checks whether the table has LOB columns
  • uint64_t mtid = orig_table_schema.get_aux_lob_meta_tid(); gets the ID of the LOB meta table
  • uint64_t ptid = orig_table_schema.get_aux_lob_piece_tid(); gets the ID of the LOB piece table
  • if (OB_INVALID_ID == mtid || OB_INVALID_ID == ptid) LOG_WARN "Expect meta tid and piece tid valid"; both IDs above must be valid
  • schema_service->get_batch_table_schema fetches the schemas of all collected tables in one batch
  • for (int64_t i = 0; OB_SUCC(ret) && i < tmp_table_schema_count; ++i) runs the remaining checks on each of those schemas
  • if (orig_database_id != database_id) verifies that the index table's database ID matches the original table's database ID
  • tmp_schema->is_index_table() && !is_available_index_status(index_status) verifies the index table's status; if it is not available, the function fails with LOG_WARN "index table's index status is not available"
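The final loop is the part that produced the error in this case, so here is a stripped-down sketch of it. The types and names are simplified stand-ins and not the real ObTableSchema or ObIndexStatus; the only thing taken from the source is the rule itself: an auxiliary table that is an index and whose status is not available makes the whole operation fail with -4179, while LOB auxiliary tables are not subject to that check.

```cpp
#include <cassert>
#include <string>
#include <vector>

constexpr int kSuccess = 0;
constexpr int kOpNotAllow = -4179;

// Simplified stand-in for the index status; the real enum has more states.
enum class IndexStatus { Available, Unusable };

// Simplified stand-in for an auxiliary table schema (index or LOB table).
struct AuxTable {
  std::string name;
  bool is_index;
  IndexStatus status;
};

// Mirrors the rule in the loop: the first index table whose status is not
// available aborts the check. Its name is reported through blocking_table
// when that pointer is not null.
int check_aux_tables(const std::vector<AuxTable> &tables,
                     std::string *blocking_table) {
  for (const AuxTable &t : tables) {
    if (t.is_index && t.status != IndexStatus::Available) {
      if (blocking_table != nullptr) *blocking_table = t.name;
      return kOpNotAllow;
    }
  }
  return kSuccess;
}
```

This also explains why dropping the unusable global index unblocked truncate table: once that index is gone, nothing in the list fails the status test any more.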

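To sum up:

When truncate table runs, it goes through many checks, some on the program's own state and some on the objects involved. The program-level checks can be set aside for now; the object-level checks are the ones to focus on:

  • Check that the database and the table both exist
  • Check that the EXCLUSIVE lock on the table was acquired
  • Check whether the compatible version, that is the minimum data version, is below 4.1.0.0; if so, the data version is not supported
  • Check whether the table has an auto-increment column while the minimum data version is below 4.2.0.0; if so, OB_NOT_SUPPORTED is returned (tenant's data version is below 4.2.0.0, truncate table with autoinc column)
  • Check that the table schema and the related arguments are legal
  • Check that the LOB meta table ID and the LOB piece table ID are valid
  • Check the status of every index table; an unavailable one fails with LOG_WARN "index table's index status is not available"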

Note: this is only my personal view!

oceanbase-4.2.1_CE_BP8

On the surface this looks like a kernel permission issue.

Learned something~

Learning.

That's too advanced then :face_with_hand_over_mouth: