Creating a standby tenant fails

[Environment] Production
[OB or other component]
[Version] 4.2.1.6
[Problem description]
[Steps to reproduce]

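The task fails when it reaches the "Create standby tenant" step.

While creating the standby tenant, OCP assigns the STANDBYRO user a random password, *72JO+bSpT, which contains special characters, and the step then fails.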

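Current workaround:
In the OCP metadatabase, find the task by its task ID in the task_instance table and use UPDATE to change its state from RUNNING to FAILED, for example:
update ocp.task_instance set state = 'FAILED' where id=14044309;

After that, the task can be rolled back and cancelled on the OCP task page.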
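
For anyone who wants to script the state change above, here is a minimal Python sketch. It is only an illustration under stated assumptions: the function name build_mark_failed is hypothetical, the %s placeholder style assumes a MySQL-compatible Python driver, and the extra "state = RUNNING" condition is an added safeguard that is not part of the statement above. Only the table ocp.task_instance and the columns state and id come from this thread.

```python
from typing import Tuple


def build_mark_failed(task_id: int) -> Tuple[str, tuple]:
    """Build a parameterized UPDATE that marks a stuck OCP task as FAILED.

    Hypothetical helper: it only builds the statement, it does not run it.
    """
    # Reject anything that is not a positive integer task id.
    if isinstance(task_id, bool) or not isinstance(task_id, int) or task_id <= 0:
        raise ValueError("task_id must be a positive integer")
    # The "AND state = %s" guard is an assumption added here so that the
    # row changes only if the task is still in the RUNNING state.
    sql = (
        "UPDATE ocp.task_instance SET state = %s "
        "WHERE id = %s AND state = %s"
    )
    return sql, ("FAILED", task_id, "RUNNING")


if __name__ == "__main__":
    statement, params = build_mark_failed(14044309)
    print(statement)
    print(params)
```

The statement and parameters would then be passed to whatever client is used to connect to the OCP metadatabase; backing up the row first is a sensible precaution, since this edits OCP's internal bookkeeping directly.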

When I start the task again, the assigned password is still *72JO+bSpT. How can I work around this?

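  1. "While creating the standby tenant, OCP assigns the STANDBYRO user a random password, *72JO+bSpT, which contains special characters, and the step then fails.
    Current workaround:
    In the OCP metadatabase, find the task by its task ID in the task_instance table and use UPDATE to change its state from RUNNING to FAILED, for example:
    update ocp.task_instance set state = 'FAILED' where id=14044309;"

Is the task state still RUNNING after the error?

2. Could you redact the sensitive information and post the complete task log?

3. Please also share the OCP version.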


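Yes, the task state was still RUNNING after the error. Only after I changed it to FAILED in ocp.task_instance in the OCP metadatabase did the rollback option appear on the OCP page.

The OCP version is 4.3.0-20240711151555.

Task log: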
2024-10-11 15:55:07.962 INFO 8 --- [manual-subtask-executor13,83886bebaf42b5cd,d54894c8fc302c6e] c.o.o.c.t.e.runner.JavaSubtaskRunner : Run subtask, id=17121623, context=Context{parallelIdx=-1, stringMap={tenant_id=7000464, tenant_name=XXXXXX, ob_tenant_parameter_map=, task_instance_id=14044309, task_operation=execute, create_standby_tenant_param_json={"enableArbitration":false,"logTransportMode":"NETWORK","mode":"ORACLE","name":"XXXXXX","parameters":[],"primaryRootPassword":"","primaryTenantId":3000104,"primaryZone":"zone1","restoreTenant":false,"zones":[{"name":"zone1","replicaType":"FULL","resourcePool":{"unitCount":1,"unitSpecName":"UIT1"}}]}, target_tenant_status=NORMAL, resource_pool_list_json=[{"id":1346,"name":"pool_XXXXXX_zone1_sms","unitConfig":{"iopsWeight":2,"logDiskSize":64,"logDiskSizeByte":68719476736,"maxCpuCoreCount":2.00,"maxIops":0,"maxMemoryByte":17179869184,"maxMemorySize":16,"minCpuCoreCount":2.00,"minIops":0,"minMemoryByte":17179869184,"minMemorySize":16},"unitCount":1,"zoneList":["zone1"]}], primary_tenant_id=3000104, cluster_id=4, system_variable_map=, standby_tenant_readonly_password=, latest_execution_start_time=2024-10-11T15:55:07.870+08:00, sub_task_instance_name=Create standby tenant, standby_tenant_readonly_username=STANDBYRO, sub_task_instance_id=17121623}, listMap={}}, executor=XXXXXX

2024-10-11 15:55:08.003 INFO 8 --- [manual-subtask-executor13,83886bebaf42b5cd,d54894c8fc302c6e] c.o.o.s.t.b.t.CreateStandbyTenantTask : begin to set optimize parameter standby_db_fetch_log_rpc_timeout:1200s

2024-10-11 15:55:08.013 INFO 8 --- [manual-subtask-executor13,83886bebaf42b5cd,d54894c8fc302c6e] c.o.ocp.obsdk.connector.ConnectTemplate : [obsdk] sql: set ob_query_timeout = ?, args: [10000000]

2024-10-11 15:55:08.017 INFO 8 --- [manual-subtask-executor13,83886bebaf42b5cd,d54894c8fc302c6e] c.o.ocp.obsdk.connector.ConnectTemplate : [obsdk] sql: SHOW PARAMETERS WHERE scope = 'TENANT' AND name LIKE ?, args: [standby_db_fetch_log_rpc_timeout]

2024-10-11 15:55:08.033 INFO 8 --- [manual-subtask-executor13,83886bebaf42b5cd,d54894c8fc302c6e] c.o.ocp.obsdk.connector.ConnectTemplate : [obsdk] sql: set ob_query_timeout = ?, args: [10000000]

2024-10-11 15:55:08.038 INFO 8 --- [manual-subtask-executor13,83886bebaf42b5cd,d54894c8fc302c6e] c.o.ocp.obsdk.connector.ConnectTemplate : [obsdk] sql: ALTER SYSTEM SET standby_db_fetch_log_rpc_timeout = ?, args: [1200s]

2024-10-11 15:55:08.060 INFO 8 --- [manual-subtask-executor13,83886bebaf42b5cd,d54894c8fc302c6e] c.o.ocp.obsdk.connector.ConnectTemplate : [obsdk] sql: set ob_query_timeout = ?, args: [10000000]

2024-10-11 15:55:08.065 INFO 8 --- [manual-subtask-executor13,83886bebaf42b5cd,d54894c8fc302c6e] c.o.ocp.obsdk.connector.ConnectTemplate : [obsdk] sql: SHOW PARAMETERS WHERE scope = 'TENANT' AND name LIKE ?, args: [standby_db_fetch_log_rpc_timeout]

2024-10-11 15:55:08.085 INFO 8 --- [manual-subtask-executor13,83886bebaf42b5cd,d54894c8fc302c6e] c.o.ocp.common.lang.pattern.Retry : wait for 3 seconds

2024-10-11 15:55:11.101 INFO 8 --- [manual-subtask-executor13,83886bebaf42b5cd,d54894c8fc302c6e] c.o.ocp.obsdk.connector.ConnectTemplate : [obsdk] sql: set ob_query_timeout = ?, args: [10000000]

2024-10-11 15:55:11.106 INFO 8 --- [manual-subtask-executor13,83886bebaf42b5cd,d54894c8fc302c6e] c.o.ocp.obsdk.connector.ConnectTemplate : [obsdk] sql: SHOW PARAMETERS WHERE scope = 'TENANT' AND name LIKE ?, args: [standby_db_fetch_log_rpc_timeout]

2024-10-11 15:55:11.120 INFO 8 --- [manual-subtask-executor13,83886bebaf42b5cd,d54894c8fc302c6e] c.o.o.s.t.b.t.CreateStandbyTenantTask : success to set optimize parameter standby_db_fetch_log_rpc_timeout:1200s, original value:15s

2024-10-11 15:55:11.123 INFO 8 --- [manual-subtask-executor13,83886bebaf42b5cd,d54894c8fc302c6e] c.o.o.s.t.b.t.CreateStandbyTenantTask : Begin to create standby tenant, param:CreateStandbyTenantParam(name=XXXXXX, primaryZone=zone1, primaryTenantId=3000104, primaryRootPassword=******, mode=ORACLE, primaryOcpClusterId=null, logTransportMode=NETWORK, restoreTenant=false, enableArbitration=false, backupConfig=null, zones=[CreateTenantParam.ZoneParam(name=zone1, replicaType=FULL, resourcePool=CreateTenantParam.PoolParam(unitSpecName=UIT1, unitCount=1))], parameters=[], description=null)

2024-10-11 15:55:11.158 INFO 8 --- [manual-subtask-executor13,83886bebaf42b5cd,d54894c8fc302c6e] c.o.o.s.o.o.f.ConnectPropertiesBuilder : get credential from obsdk context, clusterName=obcluster_dev, tenantName=XXXXXX, dbUser=SYS

2024-10-11 15:55:11.165 INFO 8 --- [manual-subtask-executor13,83886bebaf42b5cd,d54894c8fc302c6e] c.o.ocp.obsdk.connector.ConnectTemplate : [obsdk] sql: set ob_query_timeout = ?, args: [10000000]

2024-10-11 15:55:11.168 INFO 8 --- [manual-subtask-executor13,83886bebaf42b5cd,d54894c8fc302c6e] c.o.ocp.obsdk.connector.ConnectTemplate : [obsdk] sql: SELECT tenant_id, tenant_name, svr_ip, sql_port FROM DBA_OB_ACCESS_POINT

2024-10-11 15:55:11.223 INFO 8 --- [manual-subtask-executor13,83886bebaf42b5cd,d54894c8fc302c6e] c.o.ocp.obsdk.connector.ConnectTemplate : [obsdk] sql: set ob_query_timeout = ?, args: [10000000]

2024-10-11 15:55:11.226 INFO 8 --- [manual-subtask-executor13,83886bebaf42b5cd,d54894c8fc302c6e] c.o.ocp.obsdk.connector.ConnectTemplate : [obsdk] sql: CREATE STANDBY TENANT XXXXXX LOG_RESTORE_SOURCE = "SERVICE=XXXXXX:2881 USER=STANDBYRO@XXXXXX password=xxx RESOURCE_POOL_LIST=('pool_XXXXXX_zone1_sms'), PRIMARY_ZONE="zone1", LOCALITY="FULL@zone1", args: []

2024-10-11 15:55:11.235 WARN 8 --- [manual-subtask-executor13,83886bebaf42b5cd,d54894c8fc302c6e] c.o.ocp.obsdk.connector.ConnectTemplate : [obsdk] update failed, sql:[CREATE STANDBY TENANT XXXXXX LOG_RESTORE_SOURCE = "SERVICE=XXXXXX:2881 USER=STANDBYRO@XXXXXX password=xxx RESOURCE_POOL_LIST=('pool_XXXXXX_zone1_sms'), PRIMARY_ZONE="zone1", LOCALITY="FULL@zone1"], error message:[PreparedStatementCallback; SQL [CREATE STANDBY TENANT XXXXXX LOG_RESTORE_SOURCE = "SERVICE=XXXXXX:2881 USER=STANDBYRO@XXXXXX PASSWORD=*72JO+bSpT" RESOURCE_POOL_LIST=('pool_XXXXXX_zone1_sms'), PRIMARY_ZONE="zone1", LOCALITY="FULL@zone1"]; (conn=48275) Incorrect arguments to get primary connection; nested exception is java.sql.SQLTransientConnectionException: (conn=48275) Incorrect arguments to get primary connection]

2024-10-11 15:55:11.239 INFO 8 --- [manual-subtask-executor13,83886bebaf42b5cd,d54894c8fc302c6e] c.o.ocp.obsdk.connector.ConnectTemplate : Last Trace Info:[YB4285804CB5-0006208E7F619EBF-0-0]

2024-10-11 15:55:11.249 ERROR 8 --- [manual-subtask-executor13,83886bebaf42b5cd,d54894c8fc302c6e] c.o.o.c.t.e.c.w.subtask.SubtaskExecutor : Incorrect arguments to get primary connection

java.sql.SQLException: Incorrect arguments to get primary connection
at com.oceanbase.jdbc.internal.protocol.AbstractQueryProtocol.readErrorPacket(AbstractQueryProtocol.java:2347)
at com.oceanbase.jdbc.internal.protocol.AbstractQueryProtocol.readPacket(AbstractQueryProtocol.java:2210)
at com.oceanbase.jdbc.internal.protocol.AbstractQueryProtocol.getResult(AbstractQueryProtocol.java:2098)
at com.oceanbase.jdbc.internal.protocol.AbstractQueryProtocol.executeQuery(AbstractQueryProtocol.java:399)
at com.oceanbase.jdbc.JDBC4PreparedStatement.executeInternal(JDBC4PreparedStatement.java:247)
at com.oceanbase.jdbc.JDBC4PreparedStatement.execute(JDBC4PreparedStatement.java:170)
at com.oceanbase.jdbc.JDBC4PreparedStatement.executeUpdate(JDBC4PreparedStatement.java:204)
at com.alibaba.druid.pool.DruidPooledPreparedStatement.executeUpdate(DruidPooledPreparedStatement.java:255)
at org.springframework.jdbc.core.JdbcTemplate.lambda$update$2(JdbcTemplate.java:973)
at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:656)
at org.springframework.jdbc.core.JdbcTemplate.update(JdbcTemplate.java:968)
at org.springframework.jdbc.core.JdbcTemplate.update(JdbcTemplate.java:1023)
at org.springframework.jdbc.core.JdbcTemplate.update(JdbcTemplate.java:1033)
at com.oceanbase.ocp.obsdk.connector.ConnectTemplate.updateInner(ConnectTemplate.java:293)
at com.oceanbase.ocp.obsdk.connector.ConnectTemplate.update(ConnectTemplate.java:264)
at com.oceanbase.ocp.obsdk.operator.tenant.MysqlTenantOperator.createStandbyTenant(MysqlTenantOperator.java:420)
at com.oceanbase.ocp.service.task.business.tenant.CreateStandbyTenantTask.run(CreateStandbyTenantTask.java:105)
at com.oceanbase.ocp.core.task.engine.runner.JavaSubtaskRunner.execute(JavaSubtaskRunner.java:64)
at com.oceanbase.ocp.core.task.engine.runner.JavaSubtaskRunner.doRun(JavaSubtaskRunner.java:32)
at com.oceanbase.ocp.core.task.engine.runner.JavaSubtaskRunner.run(JavaSubtaskRunner.java:26)
at com.oceanbase.ocp.core.task.engine.runner.RunnerFactory.doRun(RunnerFactory.java:76)
at com.oceanbase.ocp.core.task.engine.coordinator.worker.subtask.SubtaskExecutor.doRun(SubtaskExecutor.java:206)
at com.oceanbase.ocp.core.task.engine.coordinator.worker.subtask.SubtaskExecutor.redirectConsoleOutput(SubtaskExecutor.java:200)
at com.oceanbase.ocp.core.task.engine.coordinator.worker.subtask.SubtaskExecutor.lambda$submit$2(SubtaskExecutor.java:137)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)

Set state for subtask: 17121623, operation:EXECUTE, state: FAILED


Is the tenant in Oracle mode?


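Yes.

Please don't post that big block of Enterprise Edition boilerplate. It's an eyesore.

If anyone else has run into this, please take a look and discuss. Thanks.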


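We really can't handle Enterprise Edition bugs here, sorry. We still recommend resolving this through the appropriate support channel. Of course, if anyone in the community has a solution, you're welcome to share it.

OK, let's see whether anyone else has run into this.
The underlying logic of the Community and Enterprise editions should be roughly the same, so it may be worth checking whether the Community Edition has the same logic.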


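Just change that user's password on the primary. After the standby tenant is restored, update the password in the log restore source statement, and the restore can continue.

It looks like the failure happened while creating the standby tenant. Try creating the standby tenant manually with the randomly assigned password,
then skip this step in OCP and continue?

The standby tenant is stuck at the create tenant step itself, so your suggestion doesn't solve it.

During standby tenant creation, the STANDBYRO user was created on the primary tenant and given a random password.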

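This issue is now resolved.
OCP version: 4.3.0-20240711151555
observer version: oceanbase-4.2.1.6-106000022024042414.el7.x86_64.rpm

Cause of the bug: when a standby tenant is created from redo logs, OCP first creates the STANDBYRO user on the primary tenant. This user is used for primary-standby data synchronization, much like the repl user created for MySQL primary-replica replication. OCP assigns STANDBYRO a random password, and that password sometimes contains special characters, for example "6I-YAgb$z0" or "*72JO+bSpT". When OCP reaches step 5 of standby tenant creation (Create standby tenant), it fails with messages such as: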
2024-10-12 10:36:06.585 WARN 8 --- [manual-subtask-executor14,79d4b9c5f285bb90,2da084d844b225e6] c.o.ocp.obsdk.connector.ConnectTemplate : [obsdk] update failed, sql:[CREATE STANDBY TENANT XXXX LOG_RESTORE_SOURCE = "SERVICE=XXX.XXX.XXX.XXX:2881 USER=STANDBYRO@XXX password=xxx RESOURCE_POOL_LIST=('pool_XXX_zone1_vjs'), PRIMARY_ZONE="zone1", LOCALITY="FULL@zone1"], error message:[PreparedStatementCallback; SQL [CREATE STANDBY TENANT XXX LOG_RESTORE_SOURCE = "SERVICE=XXX.XXX.XXX.XXX:2881 USER=STANDBYRO@XXX PASSWORD=6I-YAgb$z0" RESOURCE_POOL_LIST=('pool_XXX_zone1_vjs'), PRIMARY_ZONE="zone1", LOCALITY="FULL@zone1"]; (conn=533279) Incorrect arguments to get primary connection; nested exception is java.sql.SQLTransientConnectionException: (conn=533279) Incorrect arguments to get primary connection]
2024-10-12 10:36:06.589 INFO 8 --- [manual-subtask-executor14,79d4b9c5f285bb90,2da084d844b225e6] c.o.ocp.obsdk.connector.ConnectTemplate : Last Trace Info:[YB4285804CB5-0006208E7BB6F528-0-0]
2024-10-12 10:36:06.592 ERROR 8 --- [manual-subtask-executor14,79d4b9c5f285bb90,2da084d844b225e6] c.o.o.c.t.e.c.w.subtask.SubtaskExecutor : Incorrect arguments to get primary connection
java.sql.SQLException: Incorrect arguments to get primary connection
at com.oceanbase.jdbc.internal.protocol.AbstractQueryProtocol.readErrorPacket(AbstractQueryProtocol.java:2347)
at com.oceanbase.jdbc.internal.protocol.AbstractQueryProtocol.readPacket(AbstractQueryProtocol.java:2210)
at com.oceanbase.jdbc.internal.protocol.AbstractQueryProtocol.getResult(AbstractQueryProtocol.java:2098)

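After that, the OCP task stays stuck and can be neither rolled back nor skipped.

Solution:

1. In the OCP metadatabase, find the task by its task ID in the task_instance table and use UPDATE to change its state from RUNNING to FAILED, for example:
update ocp.task_instance set state = 'FAILED' where id = 14044309;
A rollback button then appears on the OCP task page; click it to roll back.
2. On the primary tenant, change the password of the STANDBYRO user:
alter user STANDBYRO identified by "qqQQ11@@";
3. In OCP credential management, find the STANDBYRO credential of the primary tenant and update it to the new password.
4. Start the standby tenant creation again; it now succeeds.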
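
For choosing the replacement password in step 2, a minimal Python sketch follows. It rests on an assumption that this thread does not prove: that letters, digits and "@" are safe, simply because the one password reported to work here (qqQQ11@@) uses only those. Which characters actually break CREATE STANDBY TENANT is not established above, and the tenant's own password policy may impose further rules. The names SAFE_ALPHABET, unsafe_chars, generate_password and alter_user_sql are hypothetical.

```python
import secrets
import string

# Assumption: only these characters are treated as safe, mirroring the
# working password "qqQQ11@@" from this thread.
SAFE_ALPHABET = string.ascii_letters + string.digits + "@"


def unsafe_chars(password: str) -> list:
    """Return the sorted characters of password that fall outside SAFE_ALPHABET."""
    return sorted({c for c in password if c not in SAFE_ALPHABET})


def generate_password(length: int = 10) -> str:
    """Generate a random password with a lowercase, an uppercase, a digit and '@'."""
    if length < 4:
        raise ValueError("length must be at least 4")
    rng = secrets.SystemRandom()
    chars = [
        rng.choice(string.ascii_lowercase),
        rng.choice(string.ascii_uppercase),
        rng.choice(string.digits),
        "@",
    ]
    chars += [rng.choice(SAFE_ALPHABET) for _ in range(length - 4)]
    rng.shuffle(chars)
    return "".join(chars)


def alter_user_sql(user: str, password: str) -> str:
    """Build the ALTER USER statement from step 2 for a password judged safe."""
    bad = unsafe_chars(password)
    if bad:
        raise ValueError(f"password contains characters outside the safe set: {bad}")
    return f'alter user {user} identified by "{password}";'


if __name__ == "__main__":
    print(unsafe_chars("*72JO+bSpT"))   # characters flagged in a failing password
    print(alter_user_sql("STANDBYRO", generate_password()))
```

With this check, the two failing passwords from the logs are flagged (for "*72JO+bSpT" the flagged characters are "*" and "+"), while "qqQQ11@@" passes. Whatever password is chosen, the same value still has to be entered in the OCP credential in step 3.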
