load data的时候报错

ob4.3.4
在load data的时候报错 日志显示是内存的问题



observer.log.zip (6.2 MB)

先用常规方法判断一下。
看看租户内存资源规格,转储参数等。

然后发一下导入方法。如果是普通的加载,使用的是 memstore 内存。如果是旁路导入,使用的是kvcache。二者优化思路不一样。

load data local infile 'x' replace into table sjzt_ods.ods_sjgx_t_corp2 character
set
  utf8mb4 fields terminated by ',' enclosed by '"' escaped by '"' lines terminated by '\n' (
    @col1,
    @col2,
    @col3,
    @col4,
    @col5,
    @col6,
    @col7,
    @col8,
    @col9,
    @col10,
    @col11,
    @col12,
    @col13,
    @col14,
    @col15,
    @col16,
    @col17,
    @col18,
    @col19,
    @col20,
    @col21,
    @col22,
    @col23,
    @col24,
    @col25,
    @col26,
    @col27,
    @col28,
    @col29,
    @col30,
    @col31,
    @col32,
    @col33,
    @col34,
    @col35,
    @col36,
    @col37,
    @col38,
    @col39,
    @col40,
    @col41,
    @col42,
    @col43,
    @col44,
    @col45,
    @col46,
    @col47,
    @col48,
    @col49,
    @col50,
    @col51,
    @col52,
    @col53
  )
set
  CORP_OPERATE = nullif(@col1, ''),
  FARE_SCOPE = nullif(@col2, ''),
  SURROGATE_SIGN = nullif(@col3, ''),
  CREATE_DATE = nullif(@col4, ''),
  YHK_YEAR = nullif(@col5, ''),
  ABUITEM = nullif(@col6, ''),
  ORD = nullif(@col7, ''),
  UNI_SCID = nullif(@col8, ''),
  FARE_TERM_END = nullif(@col9, ''),
  ZJ_ECON_KIND = nullif(@col10, ''),
  ORG = nullif(@col11, ''),
  IS_ZZDJ_PRINT = nullif(@col12, ''),
  REVOKE_DATE = nullif(@col13, ''),
  ADDR = nullif(@col14, ''),
  YEARCHK_DATE = nullif(@col15, ''),
  ID = nullif(@col16, ''),
  STATE = nullif(@col17, ''),
  START_DATE = nullif(@col18, ''),
  LAST_SEQ_ID = nullif(@col19, ''),
  UNIFY_CODE = nullif(@col20, ''),
  CBUITEM = nullif(@col21, ''),
  OLD_ORG_CANCEL_SIGN = nullif(@col22, ''),
  OPER_MAN_NAME = nullif(@col23, ''),
  FARE_PLACE = nullif(@col24, ''),
  OPER_MAN_IDENT_NO = nullif(@col25, ''),
  JG_ORG = nullif(@col26, ''),
  ECON_KIND = nullif(@col27, ''),
  OLD_REG_NO = nullif(@col28, ''),
  SDI_SYS_DATA_SOURCE = nullif(@col29, ''),
  CREDIT = nullif(@col30, ''),
  REG_NO = nullif(@col31, ''),
  CREATE_ORG = nullif(@col32, ''),
  MOVE_TO_ORG = nullif(@col33, ''),
  CORP_NAME = nullif(@col34, ''),
  IF_MULTADDR = nullif(@col35, ''),
  BELONG_ORG = nullif(@col36, ''),
  CONSIGN_ORG = nullif(@col37, ''),
  CHECK_DATE = nullif(@col38, ''),
  WRITEOFF_DATE = nullif(@col39, ''),
  IF_INTERNET = nullif(@col40, ''),
  SDI_SYS_COLLECT_TIME = nullif(@col41, ''),
  FARE_TERM_START = nullif(@col42, ''),
  NAME_REG_NO = nullif(@col43, ''),
  CANCEL_FLAG = nullif(@col44, ''),
  STATE_DATE = nullif(@col45, ''),
  CORP_STATUS = nullif(@col46, ''),
  REG_CAPI = nullif(@col47, ''),
  COME_OUT = nullif(@col48, ''),
  BELONG_DIST_ORG = nullif(@col49, ''),
  SEQ_ID = nullif(@col50, ''),
  ADMIT_MAIN = nullif(@col51, ''),
  BUSSINESS_NUM = nullif(@col52, ''),
  BELONG_TRADE = nullif(@col53, '')

执行一会observer就会断开 ocp告警信息有 合并失败

信息太少。就一步步来。尽量提供下面所有信息:

  1. 说明 OB 相关信息。
  • 版本
  • 机器资源规格
  • 租户资源规格
  1. OB 相关参数
  • 集群参数 memstore_limit_percentage
  • 租户参数 freeze_trigger_percentage , writing_throttling_trigger_percentage
  1. 不导入数据期间的 OB 合并一次。是否正常?
  2. load data 期间 OCP 里租户性能监控里的 MemStore 监控图

推荐:

  1. 用诊断工具obdiag 巡检一把:https://www.oceanbase.com/docs/common-obdiag-cn-1000000001768218

  2. 用诊断工具obdiag 捞一把集群的基本信息,https://www.oceanbase.com/docs/common-obdiag-cn-1000000001768263

将这两个执行的结果文件发出来,提高点交流效率

  1. ob 4.3.4 三节点集群 8核32G
    租户:

memstore_limit_percentage 50
freeze_trigger_percentage 20
writing_throttling_trigger_percentage 60

3.不导入时正常,未发现报错
4.

业务租户的参数调整一下:
writing_throttling_trigger_percentage → 90
freeze_trigger_percentage → 70

将租户的 primary_zone 设置为单zone,然后性能 图里 Zone 里选择那个 primary_zone。

再重新导入,发性能监控。

另外顺便发一下 【性能与SQL】下的租户 CPU 利用率。



memstore active 内存没上去(应该就是分配失败了),此时 trigger 内存阈值已经上去了。
不合常理。估计是那个子内存模块被限制住了。

你按上面 官方 靖顺 方法采集一下相关日志(那个采集信息全面),需要官方研发看了。

obdiag.zip (77.1 KB)

一直报错内存不足给租户扩容内存再试试导入

巡检中显示存在错误xfs need repair。可以校验磁盘是否出现问题