Community Edition 4.1: OB_SCHEMA_ERROR appears in the log every ten minutes

Where did you find this old version, oceanbase-ce-4.1.0.0-100000202023040520.el7.x86_64.rpm? It is not among the RPM versions we currently publish on GitHub.

Also, this is most likely a mismatch between the build version of the binary and the version used when the cluster was initially deployed.

select * from oceanbase.__all_server;

Log in to the sys tenant, run this query, and check whether build_version matches what you expect.
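
The check described here can be made mechanical. The following is a minimal sketch, assuming the RPM file name and the `build_version` column follow the patterns seen in this thread (`oceanbase-ce-<version>-<release>.el7...rpm` and `<version>_<release>-<commit>(<date>)`). The helper names are hypothetical and not part of any OceanBase tool.

```python
import re

def build_id_from_rpm(rpm_name):
    """Extract '<version>_<release>' from an RPM file name (assumed naming pattern)."""
    m = re.match(r"oceanbase-ce-(\d+(?:\.\d+)+)-(\d+)\.", rpm_name)
    if m is None:
        raise ValueError(f"unrecognized RPM name: {rpm_name}")
    return f"{m.group(1)}_{m.group(2)}"

def build_id_from_server(build_version):
    """Keep the part of build_version that precedes the '-<commit hash>' suffix."""
    return build_version.split("-", 1)[0]

def mismatched_servers(rows, rpm_name):
    """Return svr_ip for every (svr_ip, build_version) row that does not match the RPM."""
    expected = build_id_from_rpm(rpm_name)
    return [ip for ip, bv in rows if build_id_from_server(bv) != expected]
```

Feeding it the `svr_ip` and `build_version` columns from `oceanbase.__all_server` together with the RPM that was deployed returns an empty list when every server runs the expected build.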

Where did you find this old version, oceanbase-ce-4.1.0.0-100000202023040520.el7.x86_64.rpm? It is not among the RPM versions we currently publish on GitHub.

I downloaded it from the official website, as part of the OceanBase Community Edition all-in-one installation package:

The package is also available on GitHub:

Also, this is most likely a mismatch between the build version of the binary and the version used when the cluster was initially deployed.

That is not the problem. I have already run select * from oceanbase.__all_server; and the version numbers match.

To summarize:

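  1. The 1-1 architecture cluster that currently shows no errors was first installed from the RPM in the Community Edition all-in-one package (uploaded to OCP), i.e. the old version I mentioned above (oceanbase-ce-4.1.0.0-100000202023040520.el7.x86_64.rpm). It was later upgraded through OCP to 4.1.0_bp1, which was released on GitHub on the 12th, i.e. the new version mentioned above (oceanbase-ce-4.1.0.0-101000022023050809.el7.x86_64.rpm).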

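  2. Because the error disappeared after upgrading to the new version, I deployed a fresh test cluster directly with the new version (oceanbase-ce-4.1.0.0-101000022023050809.el7.x86_64.rpm). The error still occurs there. All logs uploaded today come from this new test cluster.
    Below is the oceanbase.__all_server output for the new cluster: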

> 
> obclient [oceanbase]> select * from oceanbase.__all_server;
> +----------------------------+----------------------------+----------------+----------+----+-------+------------+-----------------+--------+-----------------------+-------------------------------------------------------------------------------------------+-----------+--------------------+--------------+----------------+
> | gmt_create                 | gmt_modified               | svr_ip         | svr_port | id | zone  | inner_port | with_rootserver | status | block_migrate_in_time | build_version                                                                             | stop_time | start_service_time | first_sessid | with_partition |
> +----------------------------+----------------------------+----------------+----------+----+-------+------------+-----------------+--------+-----------------------+-------------------------------------------------------------------------------------------+-----------+--------------------+--------------+----------------+
> | 2023-05-13 22:49:30.838203 | 2023-05-14 16:14:01.240729 | 192.168.51.233 |     2882 |  1 | zone1 |       2881 |               1 | ACTIVE |                     0 | 4.1.0.0_101000022023050809-bd50a54ac52a82c5cd3b100781d5ed252822ccd0(May  8 2023 09:32:12) |         0 |   1684052016296458 |            0 |              1 |
> | 2023-05-13 22:49:30.817980 | 2023-05-14 16:14:52.546574 | 192.168.51.234 |     2882 |  2 | zone2 |       2881 |               0 | ACTIVE |                     0 | 4.1.0.0_101000022023050809-bd50a54ac52a82c5cd3b100781d5ed252822ccd0(May  8 2023 09:32:12) |         0 |   1684052090564492 |            0 |              1 |
> | 2023-05-13 22:49:30.818740 | 2023-05-14 16:15:56.145328 | 192.168.51.235 |     2882 |  3 | zone3 |       2881 |               0 | ACTIVE |                     0 | 4.1.0.0_101000022023050809-bd50a54ac52a82c5cd3b100781d5ed252822ccd0(May  8 2023 09:32:12) |         0 |   1684052156061723 |            0 |              1 |
> +----------------------------+----------------------------+----------------+----------+----+-------+------------+-----------------+--------+-----------------------+-------------------------------------------------------------------------------------------+-----------+--------------------+--------------+----------------+
> 3 rows in set (0.001 sec)

Was the old cluster destroyed cleanly before you redeployed?

I deleted it directly in OCP. I then went to each machine and emptied the log directory, the data directory, and /home/admin/.

If you suspect that is the cause, I will redeploy once more and see what happens.

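OK. Before redeploying, go to the machines where observer was deployed and confirm that the data directories and related directories have been removed completely.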
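
A quick way to confirm that nothing is left behind is to list what remains in each directory. This is a minimal sketch; the function name is hypothetical, and the paths you pass in are whatever data and log directories your own deployment used (only /home/admin/ is named in this thread).

```python
from pathlib import Path

def leftover_entries(dirs):
    """Map each directory that still has content to a sorted list of its entry names.

    Directories that are missing or empty are treated as clean and omitted.
    """
    leftovers = {}
    for d in map(Path, dirs):
        if d.is_dir():
            entries = sorted(p.name for p in d.iterdir())
            if entries:
                leftovers[str(d)] = entries
    return leftovers
```

Run it on every observer host with the list of directories to check; an empty dictionary means no leftover files were found in those locations.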

Still the same. As soon as a new tenant is created, the ERROR logs start appearing.

  1. I deployed 4.1 and, following my earlier experience, upgraded to 4.1_bp1. The upgrade itself also hits a 4029 error:
[2023-05-15 16:38:05] ERROR do_upgrade_post.py:104 run error
Traceback (most recent call last):
  File "/tmp/rpms/extract/oceanbase-ce-4.1.0.0-101000022023050809.el7.x86_64.rpm/home/admin/oceanbase/etc/upgrade_post_extract_files_2023_05_15_16_37_37_678817_DUdnN7Uj/do_upgrade_post.py", line 82, in do_upgrade
    tenant_upgrade_action.do_upgrade(conn, cur, timeout, my_user, my_passwd)
  File "/tmp/rpms/extract/oceanbase-ce-4.1.0.0-101000022023050809.el7.x86_64.rpm/home/admin/oceanbase/etc/upgrade_post_extract_files_2023_05_15_16_37_37_678817_DUdnN7Uj/tenant_upgrade_action.py", line 31, in do_upgrade
    cur.execute(sql)
  File "/home/admin/ocp_agent/site-packages/mysql/connector/cursor.py", line 569, in execute
    self._handle_result(self._connection.cmd_query(stmt))
  File "/home/admin/ocp_agent/site-packages/mysql/connector/connection.py", line 599, in cmd_query
    result = self._handle_result(self._send_cmd(ServerCmd.QUERY, query))
  File "/home/admin/ocp_agent/site-packages/mysql/connector/connection.py", line 487, in _handle_result
    raise errors.get_exception(packet)
DatabaseError: 4029 (HY000): View 'information_schema.TABLE_CONSTRAINTS' references invalid table(s) or column(s) or function(s) or definer/invoker of view lack rights to use them
[2023-05-15 16:38:05] INFO do_upgrade_post.py:43 ==================================================================================
[2023-05-15 16:38:05] INFO do_upgrade_post.py:44 ============================== STATISTICS BEGIN ==================================
[2023-05-15 16:38:05] INFO do_upgrade_post.py:45 ==================================================================================
[2023-05-15 16:38:05] INFO do_upgrade_post.py:46 succeed run sql(except sql of special actions): 

alter system end rolling upgrade;

[2023-05-15 16:38:05] INFO do_upgrade_post.py:47 commited sql(except sql of special actions): 

alter system end rolling upgrade;

[2023-05-15 16:38:05] INFO do_upgrade_post.py:48 ==================================================================================
[2023-05-15 16:38:05] INFO do_upgrade_post.py:49 =============================== STATISTICS END ===================================
[2023-05-15 16:38:05] INFO do_upgrade_post.py:50 ==================================================================================
[2023-05-15 16:38:05] ERROR do_upgrade_post.py:114 connection error
Traceback (most recent call last):
  File "/tmp/rpms/extract/oceanbase-ce-4.1.0.0-101000022023050809.el7.x86_64.rpm/home/admin/oceanbase/etc/upgrade_post_extract_files_2023_05_15_16_37_37_678817_DUdnN7Uj/do_upgrade_post.py", line 105, in do_upgrade
    raise e
DatabaseError: 4029 (HY000): View 'information_schema.TABLE_CONSTRAINTS' references invalid table(s) or column(s) or function(s) or definer/invoker of view lack rights to use them
[2023-05-15 16:38:05] ERROR do_upgrade_post.py:153 mysql connctor error
Traceback (most recent call last):
  File "/tmp/rpms/extract/oceanbase-ce-4.1.0.0-101000022023050809.el7.x86_64.rpm/home/admin/oceanbase/etc/upgrade_post_extract_files_2023_05_15_16_37_37_678817_DUdnN7Uj/do_upgrade_post.py", line 151, in do_upgrade_by_argv
    do_upgrade(host, port, user, password, timeout, module_set, upgrade_params)
  File "/tmp/rpms/extract/oceanbase-ce-4.1.0.0-101000022023050809.el7.x86_64.rpm/home/admin/oceanbase/etc/upgrade_post_extract_files_2023_05_15_16_37_37_678817_DUdnN7Uj/do_upgrade_post.py", line 115, in do_upgrade
    raise e
DatabaseError: 4029 (HY000): View 'information_schema.TABLE_CONSTRAINTS' references invalid table(s) or column(s) or function(s) or definer/invoker of view lack rights to use them
[2023-05-15 16:38:05] ERROR do_upgrade_post.py:154 run error, maybe you can reference rollback_sql_post.txt to rollback it
Traceback (most recent call last):
  File "/tmp/rpms/extract/oceanbase-ce-4.1.0.0-101000022023050809.el7.x86_64.rpm/home/admin/oceanbase/etc/upgrade_post_extract_files_2023_05_15_16_37_37_678817_DUdnN7Uj/do_upgrade_post.py", line 151, in do_upgrade_by_argv
    do_upgrade(host, port, user, password, timeout, module_set, upgrade_params)
  File "/tmp/rpms/extract/oceanbase-ce-4.1.0.0-101000022023050809.el7.x86_64.rpm/home/admin/oceanbase/etc/upgrade_post_extract_files_2023_05_15_16_37_37_678817_DUdnN7Uj/do_upgrade_post.py", line 115, in do_upgrade
    raise e
DatabaseError: 4029 (HY000): View 'information_schema.TABLE_CONSTRAINTS' references invalid table(s) or column(s) or function(s) or definer/invoker of view lack rights to use them
Traceback (most recent call last):
  File "/tmp/rpms/extract/oceanbase-ce-4.1.0.0-101000022023050809.el7.x86_64.rpm/home/admin/oceanbase/etc/upgrade_post.py", line 2763, in <module>
    do_upgrade_by_argv(sys.argv[1:])
  File "/tmp/rpms/extract/oceanbase-ce-4.1.0.0-101000022023050809.el7.x86_64.rpm/home/admin/oceanbase/etc/upgrade_post_extract_files_2023_05_15_16_37_37_678817_DUdnN7Uj/do_upgrade_post.py", line 155, in do_upgrade_by_argv
    raise e
mysql.connector.errors.DatabaseError: 4029 (HY000): View 'information_schema.TABLE_CONSTRAINTS' references invalid table(s) or column(s) or function(s) or definer/invoker of view lack rights to use them
, error: exit status 1
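
The tracebacks above repeat the same failure as it propagates up the call stack. A small sketch like the following, which assumes only the `DatabaseError: <code> (<sqlstate>): <message>` line format visible in this log, collapses them into distinct errors with counts. The function name is hypothetical.

```python
import re
from collections import Counter

# Matches both "DatabaseError: ..." and "mysql.connector.errors.DatabaseError: ..."
ERROR_RE = re.compile(r"DatabaseError: (\d+) \((\w+)\): (.*)")

def summarize_database_errors(log_text):
    """Count distinct (code, sqlstate, message) triples found in the log text."""
    return Counter(m.groups() for m in ERROR_RE.finditer(log_text))
```

Applied to the upgrade log, every entry should reduce to a single key for error 4029 on the view information_schema.TABLE_CONSTRAINTS.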

2023-05-15 16:38:07.782  WARN 13 --- [pool-manual-subtask-executor15,d13e1d0995224f97,11b4668b80f4] c.o.o.c.t.engine.runner.RunnerFactory    : Execute task failed, subtask=SubtaskInstanceOverview{id=101977, name=Execute upgrade post script, state=FAILED, operation=EXECUTE, className=com.oceanbase.ocp.service.task.business.cluster.ExecUpgradePostScriptTask, seriesId=56, startTime=2023-05-15T16:37:36.938+08:00, endTime=null}, failedMessage=An unknown error has occurred. Cause: agent task is failed. Error message: null. Contact the administrator.

com.oceanbase.ocp.core.exception.UnexpectedException: [OCP UnexpectedException]: status=500 INTERNAL_SERVER_ERROR, errorCode=COMMON_UNEXPECTED, args=agent task is failed,null
	at com.oceanbase.ocp.core.i18n.ErrorCodes$Kind$10.exception(ErrorCodes.java:1260) ~[ocp-core-4.0.3-20230301.jar!/:4.0.3-20230301]
	at com.oceanbase.ocp.core.i18n.ErrorCodes.exception(ErrorCodes.java:1115) ~[ocp-core-4.0.3-20230301.jar!/:4.0.3-20230301]
	at com.oceanbase.ocp.core.util.ExceptionUtils.require(ExceptionUtils.java:154) ~[ocp-core-4.0.3-20230301.jar!/:4.0.3-20230301]
	at com.oceanbase.ocp.service.task.util.AgentAsyncTaskHelper.checkSuccess(AgentAsyncTaskHelper.java:219) ~[ocp-service-4.0.3-20230301.jar!/:4.0.3-20230301]
	at com.oceanbase.ocp.service.task.util.AgentAsyncTaskHelper.waitForExecuteFinish(AgentAsyncTaskHelper.java:165) ~[ocp-service-4.0.3-20230301.jar!/:4.0.3-20230301]
	at com.oceanbase.ocp.service.task.util.AgentAsyncTaskHelper.runUpgradeScript(AgentAsyncTaskHelper.java:128) ~[ocp-service-4.0.3-20230301.jar!/:4.0.3-20230301]
	at com.oceanbase.ocp.service.task.business.cluster.helper.UpgradeTaskHelper.runScript(UpgradeTaskHelper.java:140) ~[ocp-service-4.0.3-20230301.jar!/:4.0.3-20230301]
	at com.oceanbase.ocp.service.task.business.cluster.helper.UpgradeTaskHelper.runScript(UpgradeTaskHelper.java:92) ~[ocp-service-4.0.3-20230301.jar!/:4.0.3-20230301]
	at com.oceanbase.ocp.service.task.business.cluster.helper.UpgradeTaskHelper.runScript(UpgradeTaskHelper.java:65) ~[ocp-service-4.0.3-20230301.jar!/:4.0.3-20230301]
	at com.oceanbase.ocp.service.task.business.cluster.ExecUpgradePostScriptTask.run(ExecUpgradePostScriptTask.java:64) ~[ocp-service-4.0.3-20230301.jar!/:4.0.3-20230301]
	at com.oceanbase.ocp.core.task.engine.runner.JavaSubtaskRunner.execute(JavaSubtaskRunner.java:60) ~[ocp-core-4.0.3-20230301.jar!/:4.0.3-20230301]
	at com.oceanbase.ocp.core.task.engine.runner.JavaSubtaskRunner.doRun(JavaSubtaskRunner.java:32) ~[ocp-core-4.0.3-20230301.jar!/:4.0.3-20230301]
	at com.oceanbase.ocp.core.task.engine.runner.JavaSubtaskRunner.run(JavaSubtaskRunner.java:26) ~[ocp-core-4.0.3-20230301.jar!/:4.0.3-20230301]
	at com.oceanbase.ocp.core.task.engine.runner.RunnerFactory.doRun(RunnerFactory.java:111) ~[ocp-core-4.0.3-20230301.jar!/:4.0.3-20230301]
	at com.oceanbase.ocp.core.task.engine.runner.RunnerFactory.redirectOutputIfNotSysSchedule(RunnerFactory.java:183) ~[ocp-core-4.0.3-20230301.jar!/:4.0.3-20230301]
	at com.oceanbase.ocp.core.task.engine.runner.RunnerFactory.run(RunnerFactory.java:101) ~[ocp-core-4.0.3-20230301.jar!/:4.0.3-20230301]
	at com.oceanbase.ocp.core.task.engine.coordinator.worker.subtask.ReadySubtaskWorker.lambda$null$1(ReadySubtaskWorker.java:127) ~[ocp-core-4.0.3-20230301.jar!/:4.0.3-20230301]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_312]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_312]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_312]
	at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_312]


Set state for subtask: 101977, operation:EXECUTE, state: FAILED

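  2. This step can be skipped manually (as shown in the screenshot above), and the upgrade still succeeds. Once the upgrade finishes, the OB_SCHEMA_ERROR log entries no longer appear.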

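In theory the upgrade corrects the database schema version, so seeing no errors after the upgrade is expected. An error during the upgrade itself, however, is not expected. Can you collect the relevant logs? Also, can you deploy 4.1_bp1 directly?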

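I have tried that several times. Deploying 4.1_bp1 directly produces the error as well. I said so in the post above, and the relevant logs have already been uploaded.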

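  3. On the newly deployed 1-1-1 cluster, I ran [Stop Cluster] in OCP, waited five minutes, and then ran [Start Cluster]. After that, the error log entry appears every 6 seconds (previously it appeared every 10 minutes).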
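
To put numbers on the change in frequency, the gaps between consecutive error entries can be computed from the log. This is a minimal sketch, assuming each log line starts with a bracketed timestamp such as `[2023-05-15 16:38:05.123456]` (the fractional part is optional); the function name and the sample lines in the usage note are hypothetical.

```python
import re
from datetime import datetime

# Assumed line prefix: "[YYYY-MM-DD HH:MM:SS]" with optional fractional seconds.
LINE_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(?:\.\d+)?)\]")

def error_intervals(lines, keyword="OB_SCHEMA_ERROR"):
    """Return the gaps in seconds between consecutive log lines containing keyword."""
    stamps = []
    for line in lines:
        if keyword not in line:
            continue
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip lines without a leading timestamp
        text = m.group(1)
        fmt = "%Y-%m-%d %H:%M:%S.%f" if "." in text else "%Y-%m-%d %H:%M:%S"
        stamps.append(datetime.strptime(text, fmt))
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]
```

Passing the lines of an observer log file to `error_intervals` yields a list of gaps; values near 600 correspond to the ten-minute pattern and values near 6 to the pattern seen after the restart.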

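Your upload contains logs from only one machine, and I could not find anything useful in them. In theory everything is normal after the upgrade, which suggests that the schema version information before the upgrade is wrong.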

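Let's keep it simple. As you describe it: you start from a completely clean environment, deploy the latest RPM, and do nothing else. At that point there should be no errors, but they start once you create a tenant. Is that right? Could you please upload your full sequence of operations together with all the logs from every machine?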

After the upgrade completes, is this view working normally? information_schema.TABLE_CONSTRAINTS may be broken.

Also, when was your OCP first deployed? Do you remember which observer version was deployed at that time?

This view does have a problem:
obclient [information_schema]> select * from TABLE_CONSTRAINTS;
ERROR 1356 (42S22): View 'information_schema.TABLE_CONSTRAINTS' references invalid table(s) or column(s) or function(s) or definer/invoker of view lack rights to use them

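It is the Community Edition, OCP version 4.0.3-20230301, and it was also deployed only recently. After I found this problem, at around 23:00 on May 9 I wiped all environments and redeployed OCP. The first observer version uploaded at that time was the old version mentioned above: oceanbase-ce-4.1.0.0-100000202023040520.el7.x86_64.rpm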

Then on May 12 I noticed a new version on GitHub and uploaded that as well:
New version: oceanbase-ce-4.1.0.0-101000022023050809.el7.x86_64.rpm

Why not use the latest observer RPM?

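How about this: let's try once more. Remove the OCP environment completely as well, then deploy the latest RPM. After deployment, do not run any upgrade or similar operations. Observe whether the error appears and, if so, when it first appears. Also, you keep saying the error starts after a tenant is created. Do you create that tenant manually, or is it created automatically during deployment? And at what time did you do it?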

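I still suspect that some part of your environment was not cleaned up. The root cause of this error is that the binary was replaced directly without going through the upgrade procedure. In other words, the version recorded in the database meta info is still the old one, while a newer observer binary is running.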