Cluster creation with OCP fails

[Environment] Production or test environment
[OB or other component] OB OCP
[Version] 3.2.2
[Problem description] When creating a cluster with OCP, the "Bootstrap ob" step never completes.
#Task log
2023-07-11 09:02:28.338 INFO 344 --- [pool-manual-subtask-executor6,339c81ca562e4839,a9d1f90fc221] c.a.o.s.t.b.c.helper.ObServerTaskHelper : there exists server(s) still not accessible, hostIds=4,3,2

2023-07-11 09:02:28.344 INFO 344 --- [pool-manual-subtask-executor6,339c81ca562e4839,a9d1f90fc221] com.alipay.ocp.common.pattern.Retry : wait for 5 seconds

2023-07-11 09:02:33.356 INFO 344 --- [pool-manual-subtask-executor6,339c81ca562e4839,a9d1f90fc221] c.o.o.e.internal.template.HttpTemplate : POST request to agent, url:http://192.168.126.14:62888/api/v1/ob/observer/access, request body:AccessObServerProcessRequest(username=root), params:null

2023-07-11 09:02:33.383 INFO 344 --- [pool-manual-subtask-executor6,339c81ca562e4839,a9d1f90fc221] .a.o.s.o.o.c.ClusterHostOperationService : server not accessible, hostId=4, exitCode=1, output=ERROR 2003 (HY000): Can't connect to MySQL server on '127.1' (111)

2023-07-11 09:02:33.394 INFO 344 --- [pool-manual-subtask-executor6,339c81ca562e4839,a9d1f90fc221] c.o.o.e.internal.template.HttpTemplate : POST request to agent, url:http://192.168.126.13:62888/api/v1/ob/observer/access, request body:AccessObServerProcessRequest(username=root), params:null

2023-07-11 09:02:33.421 INFO 344 --- [pool-manual-subtask-executor6,339c81ca562e4839,a9d1f90fc221] .a.o.s.o.o.c.ClusterHostOperationService : server not accessible, hostId=3, exitCode=1, output=ERROR 2003 (HY000): Can't connect to MySQL server on '127.1' (111)

2023-07-11 09:02:33.432 INFO 344 --- [pool-manual-subtask-executor6,339c81ca562e4839,a9d1f90fc221] c.o.o.e.internal.template.HttpTemplate : POST request to agent, url:http://192.168.126.12:62888/api/v1/ob/observer/access, request body:AccessObServerProcessRequest(username=root), params:null

2023-07-11 09:02:33.460 INFO 344 --- [pool-manual-subtask-executor6,339c81ca562e4839,a9d1f90fc221] .a.o.s.o.o.c.ClusterHostOperationService : server not accessible, hostId=2, exitCode=1, output=ERROR 2003 (HY000): Can't connect to MySQL server on '127.1' (111)

2023-07-11 09:02:33.466 INFO 344 --- [pool-manual-subtask-executor6,339c81ca562e4839,a9d1f90fc221] c.a.o.s.t.b.c.helper.ObServerTaskHelper : there exists server(s) still not accessible, hostIds=4,3,2

2023-07-11 09:02:33.470 INFO 344 --- [pool-manual-subtask-executor6,339c81ca562e4839,a9d1f90fc221] com.alipay.ocp.common.pattern.Retry : wait for 5 seconds

2023-07-11 09:02:38.474 INFO 344 --- [pool-manual-subtask-executor6,339c81ca562e4839,a9d1f90fc221] c.a.o.c.m.t.model.SubtaskInstanceEntity : Set state for subtask: 2003658, current state: RUNNING, new state: FAILED

2023-07-11 09:02:38.481 WARN 344 --- [pool-manual-subtask-executor6,339c81ca562e4839,a9d1f90fc221] c.a.o.c.t.engine.runner.RunnerFactory : Execute task failed, subtask=SubtaskInstanceEntity{id=2003658, name=Bootstrap ob, state=FAILED, operation=EXECUTE, className=com.alipay.ocp.service.task.business.cluster.BootStrapObTask, seriesId=45, startTime=2023-07-11T08:47:10.333+08:00, endTime=2023-07-11T09:02:38.481+08:00}

java.lang.RuntimeException: wait all ob server accessible timeout
at com.alipay.ocp.service.task.business.cluster.helper.ObServerTaskHelper.waitAllObServerAccessible(ObServerTaskHelper.java:49) ~[ocp-service-3.3.0-20220427.jar!/:3.3.0-20220427]
at com.alipay.ocp.service.task.business.cluster.BootStrapObTask.run(BootStrapObTask.java:144) ~[ocp-service-3.3.0-20220427.jar!/:3.3.0-20220427]
at com.alipay.ocp.core.metadb.task.model.SubtaskInstanceEntity.run(SubtaskInstanceEntity.java:221) ~[ocp-core-3.3.0-20220427.jar!/:3.3.0-20220427]
at com.alipay.ocp.core.task.engine.runner.JavaTaskRunner.doExecute(JavaTaskRunner.java:26) ~[ocp-core-3.3.0-20220427.jar!/:3.3.0-20220427]
at com.alipay.ocp.core.task.engine.runner.JavaTaskRunner.run(JavaTaskRunner.java:20) ~[ocp-core-3.3.0-20220427.jar!/:3.3.0-20220427]
at com.alipay.ocp.core.task.engine.runner.RunnerFactory.doRun(RunnerFactory.java:113) ~[ocp-core-3.3.0-20220427.jar!/:3.3.0-20220427]
at com.alipay.ocp.core.task.engine.runner.RunnerFactory.redirectOutputIfNotSysSchedule(RunnerFactory.java:185) ~[ocp-core-3.3.0-20220427.jar!/:3.3.0-20220427]
at com.alipay.ocp.core.task.engine.runner.RunnerFactory.run(RunnerFactory.java:102) ~[ocp-core-3.3.0-20220427.jar!/:3.3.0-20220427]
at com.alipay.ocp.core.task.engine.coordinator.worker.subtask.ReadySubtaskWorker.lambda$submitTask$3(ReadySubtaskWorker.java:123) ~[ocp-core-3.3.0-20220427.jar!/:3.3.0-20220427]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_312]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_312]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_312]
at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_312]
Caused by: java.lang.RuntimeException: result not match after try 180 times
at com.alipay.ocp.common.pattern.Retry.executeUntilWithLimit(Retry.java:80) ~[ocp-common-3.3.0-20220427.jar!/:3.3.0-20220427]
at com.alipay.ocp.service.task.business.cluster.helper.ObServerTaskHelper.waitAllObServerAccessible(ObServerTaskHelper.java:43) ~[ocp-service-3.3.0-20220427.jar!/:3.3.0-20220427]
… 12 common frames omitted

#observer.log
[2023-07-11 09:59:50.252028] INFO [CLOG] ob_log_sliding_window.cpp:4834 [31081][0][Y0-0000000000000000-0-0] [lt=1] [dc=0] refresh tenant config success(last_get_tenant_config_time=1689040790251099, leader_max_unconfirmed_log_cnt=1500, cached_enable_compress=false, cached_flush_interval=0, partition_key={tid:1099511627985, partition_id:1, part_cnt:0}, tenant_id=1)
[2023-07-11 09:59:50.252033] INFO [CLOG] ob_log_sliding_window.cpp:4834 [31081][0][Y0-0000000000000000-0-0] [lt=1] [dc=0] refresh tenant config success(last_get_tenant_config_time=1689040790251099, leader_max_unconfirmed_log_cnt=1500, cached_enable_compress=false, cached_flush_interval=0, partition_key={tid:1099511627930, partition_id:11, part_cnt:0}, tenant_id=1)
[2023-07-11 09:59:50.256097] INFO [STORAGE] ob_pg_sstable_garbage_collector.cpp:183 [30628][0][Y0-0000000000000000-0-0] [lt=15] [dc=0] do one gc free sstable by queue(ret=0, free sstable cnt=0)
[2023-07-11 09:59:50.276601] INFO [STORAGE] ob_pg_sstable_garbage_collector.cpp:183 [30628][0][Y0-0000000000000000-0-0] [lt=9] [dc=0] do one gc free sstable by queue(ret=0, free sstable cnt=0)
[2023-07-11 09:59:50.281233] INFO [STORAGE.TRANS] ob_trans_part_ctx.cpp:751 [30984][0][YB42C0A87E0C-0006002C7260146C-0-0] [lt=17] [dc=0] update gts cache success(updated=false, context={ObDistTransCtx:{ObTransCtx:{this:0x7ff2e7a2bd50, ctx_type:2, trans_id:{hash:7598354652055538439, inc:37595, addr:“192.168.126.12:2882”, t:1689040790280906}, tenant_id:1, is_exiting:false, trans_type:0, is_readonly:false, trans_expired_time:1689040820281067, self:{tid:1099511628002, partition_id:0, part_cnt:0}, state:{prepare_version:-1, state:0}, cluster_version:12885032962, trans_need_wait_wrap:{receive_gts_ts:{mts:0}, need_wait_interval_us:0}, trans_param:[access_mode=1, type=2, isolation=1, magic=17361641481138401520, autocommit=1, consistency_type=0(CURRENT_READ), read_snapshot_type=2(PARTICIPANT_SNAPSHOT), cluster_version=12885032962, is_inner_trans=1], can_elr:false, uref:1073741825, ctx_create_time:1689040790280906, for_replay:false}, scheduler:“192.168.126.12:2882”, coordinator:{tid:18446744073709551615, partition_id:-1, part_idx:268435455, subpart_idx:268435455}, participants:[{tid:1099511628002, partition_id:0, part_cnt:0}], request_id:-1, timeout_task:{is_inited:true, is_registered:true, is_running:false, delay:10538439, ctx:0x7ff2e7a2bd50, bucket_idx:8483, run_ticket:337808160164, is_scheduled:true, prev:0x7ff3d8d30b80, next:0x7ff3d8d30b80}, xid:{gtrid_str:"", bqual_str:"", format_id:1, gtrid_str_.ptr():“data_size:0, data:”, bqual_str_.ptr():“data_size:0, data:”}}, snapshot_version:1689040790267750, local_trans_version:-1, submit_log_pending_count:0, submit_log_count:0, stmt_info:{sql_no:0, msg_seq:0, start_task_cnt:0, end_task_cnt:0, need_rollback:false, task_info:{tasks_:[]}}, global_trans_version:-1, redo_log_no:0, mutator_log_no:0, session_id:1, is_gts_waiting:false, part_trans_action:-1, timeout_task:{is_inited:true, is_registered:true, is_running:false, delay:10538439, ctx:0x7ff2e7a2bd50, bucket_idx:8483, run_ticket:337808160164, is_scheduled:true, 
prev:0x7ff3d8d30b80, next:0x7ff3d8d30b80}, batch_commit_trans:false, batch_commit_state:0, is_dup_table_trans:false, last_ask_scheduler_status_ts:1689040790280906, last_ask_scheduler_status_response_ts:1689040790280906, ctx_dependency_wrap:{prev_trans_arr:[], next_trans_arr:[], prev_trans_commit_count:0}, preassigned_log_meta:{log_id_:18446744073709551615, submit_timestamp_:-1, proposal_id_:{time_to_usec:-1, server:“0.0.0.0”}}, is_dup_table_prepare:false, dup_table_syncing_log_id:18446744073709551615, is_prepare_leader_revoke:false, is_local_trans:true, forbidden_sql_no:-1, is_dirty_:false, undo_status:{undo_action_arr:[]}, max_durable_seq_no:0, max_durable_log_ts:0, mt_ctx_.get_checksum_log_ts():0, is_changing_leader:false, has_trans_state_log:false, is_trans_state_sync_finished:false, status:0, same_leader_batch_partitions_count:0, is_hazardous_ctx:false, mt_ctx_.get_callback_count():0, in_xa_prepare_state:false, is_listener:false, last_replayed_redo_log_id:0, status:0, is_xa_trans_prepared:false, min_log_id:18446744073709551615, min_log_ts:9223372036854775807, is_redo_prepared:false})
[2023-07-11 09:59:50.284268] INFO [SERVER] ob_inner_sql_connection.cpp:1342 [30984][0][Y0-0000000000000000-0-0] [lt=16] [dc=0] execute write sql(ret=0, tenant_id=1, affected_rows=1, sql=" update _all_weak_read_service set min_version=1689040790189221, max_version=1689040790189221 where level_id = 0 and level_value = ‘’ and min_version = 1689040790125214 and max_version = 1689040790125214 “)
[2023-07-11 09:59:50.297620] INFO [CLOG] ob_clog_mgr.cpp:709 [30708][0][Y0-0000000000000000-0-0] [lt=2] [dc=0] begin run_check_log_file_collect_task(begin_ts=1689040790297613, collect_task_run_times=29)
[2023-07-11 09:59:50.297660] WARN [COMMON] get_file_id_range (ob_log_file_group.cpp:115) [30708][0][Y0-0000000000000000-0-0] [lt=13] [dc=0] max file does not exist(max_file_id=2, b_exist=false)
[2023-07-11 09:59:50.297702] INFO [COMMON] ob_log_file_group.cpp:134 [30708][0][Y0-0000000000000000-0-0] [lt=4] [dc=0] get min/max file id from IO(log_dir=”/home/admin/oceanbase/store/obtest1/ilog", min_file_id=1, max_file_id=1, lbt()=“0xc0d9c07 0x6bf28c6 0x6bf39a9 0x6c7e0c9 0x6bae8c7 0x6be8a87 0x6b438ba 0x6b40474 0x3c249eb 0x42aa088 0xbf749e4 0xbf74592 0xbf6b76f”)
[2023-07-11 09:59:50.297708] INFO [CLOG] ob_log_engine.cpp:2462 [30708][0][Y0-0000000000000000-0-0] [lt=3] [dc=0] need_freeze_based_on_used_space(is_need=false, clog_used_space=67108864, ilog_used_space=37748736, total_space=21003583488)
[2023-07-11 09:59:50.297756] INFO [CLOG] ob_log_engine.cpp:504 [30708][0][Y0-0000000000000000-0-0] [lt=2] [dc=0] clog update min using file id success(min_file_id=1, max_file_id=1, file_id=1)
[2023-07-11 09:59:50.297768] INFO [CLOG] ob_log_engine.cpp:2425 [30708][0][Y0-0000000000000000-0-0] [lt=3] [dc=0] set_need_freeze_partition_array(ret=0, need_freeze_partition_array=[])
[2023-07-11 09:59:50.297771] INFO [CLOG] ob_clog_mgr.cpp:725 [30708][0][Y0-0000000000000000-0-0] [lt=2] [dc=0] finish run_check_log_file_collect_task(ret=0, begin_ts=1689040790297613, end_ts=1689040790297770, task_cost_time=157, collect_task_run_times=29)
[2023-07-11 09:59:50.297774] INFO [CLOG] ob_clog_mgr.cpp:1054 [30708][0][Y0-0000000000000000-0-0] [lt=3] [dc=0] run_check_log_file_collect_task success
[2023-07-11 09:59:50.297832] INFO [STORAGE] ob_pg_sstable_garbage_collector.cpp:183 [30628][0][Y0-0000000000000000-0-0] [lt=11] [dc=0] do one gc free sstable by queue(ret=0, free sstable cnt=0)
[2023-07-11 09:59:50.319286] INFO [STORAGE] ob_pg_sstable_garbage_collector.cpp:183 [30628][0][Y0-0000000000000000-0-0] [lt=3] [dc=0] do one gc free sstable by queue(ret=0, free sstable cnt=0)
[2023-07-11 09:59:50.327316] INFO [STORAGE.TRANS] ob_tenant_weak_read_server_version_mgr.cpp:116 [31089][0][Y0-0000000000000000-0-0] [lt=15] [dc=0] [WRS] update tenant weak read server version(tenant_id=1, server_version={version:1689040790240384, total_part_count:1283, valid_inner_part_count:1283, valid_user_part_count:0, epoch_tstamp:1689040790322134}, version_delta=86923)

[Symptoms and impact] This is a test environment, so there is no impact.

Please check whether the observers of the newly created cluster are accessible. Also, what is 127.1?
2023-07-11 09:02:33.383 INFO 344 --- [pool-manual-subtask-executor6,339c81ca562e4839,a9d1f90fc221] .a.o.s.o.o.c.ClusterHostOperationService : server not accessible, hostId=4, exitCode=1, output=ERROR 2003 (HY000): Can't connect to MySQL server on '127.1' (111)
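
For anyone who wants to run this accessibility check by hand, here is a minimal Python sketch that only tests whether a TCP port accepts connections. On Linux, the (111) in the error above is the "connection refused" errno, which usually means nothing is listening on the port. The function name `can_connect` is made up for illustration, and the port 2881 mentioned below is an assumption (the usual default observer SQL port); use whatever mysql_port the cluster was actually configured with.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper for illustration; it only checks that something
    is listening, not that the observer has finished bootstrapping.
    """
    try:
        # create_connection raises OSError (for example "connection
        # refused") when nothing accepts the connection in time.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Running `can_connect("127.0.0.1", 2881)` on each observer host (again, 2881 is an assumed port) should return False while the observer process is down, which would match the error in the task log.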

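1. I simply don't know where 127.1 comes from. I have confirmed that the observer process exited, but the logs do not show why the observer failed to start.
2. After I increased the cpu_count parameter, the deployment completed normally.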

127.1 is equivalent to 127.0.0.1
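
A quick way to confirm this is to let the standard address parser expand the short form. This is a small sketch using Python's `socket.inet_aton`, which accepts the classic shortened IPv4 notation; the helper name `expand_ipv4` is made up for illustration.

```python
import socket


def expand_ipv4(short: str) -> str:
    """Expand a shortened IPv4 string into full dotted-quad form."""
    # inet_aton accepts addresses with fewer than three dots; in the
    # two-part form "a.b", b fills the last three bytes of the address.
    packed = socket.inet_aton(short)
    return socket.inet_ntoa(packed)


print(expand_ipv4("127.1"))  # prints 127.0.0.1
```

So the mysql client in the log is simply connecting to the loopback address on the observer host itself.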

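Then the hardware configuration probably does not meet the requirements. Compare it against the hardware requirements listed in the official documentation: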
https://www.oceanbase.com/docs/enterprise-oceanbase-database-cn-10000000000362117