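【 Environment 】Test environment
【 OB or other component 】OCP
【 Version 】OCP 3.3.0 bp2
【Problem description】After OCP was deployed, the obcluster cluster was not taken over. The take-over subtask "Check observer process user" fails, although the observer processes can be seen running on the OCP host.
【Steps to reproduce】Retrying the task gives the same error; deleting everything and redeploying gives the same error as well.
【Symptoms and impact】The obcluster cluster cannot be viewed in OCP.
Log output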
2023-02-07 17:02:20.177 INFO 68 --- [pool-manual-subtask-executor2,e700d3a68bac4717,7e6f8a943e89] c.a.o.c.m.t.model.SubtaskInstanceEntity : Run subtask, id=76, context=Context{parallelIdx=0, stringMap={cluster_version=3.1.4, cluster_name=obcluster, target_server_status=RUNNING, ssh_port=22, prohibit_rollback=false, service_name=obcluster:1, target_zone_status=RUNNING, task_instance_id=45, ob_connect_address=172.16.234.18:2881, task_operation=execute, cluster_type=PRIMARY, service_version=3.1.4, cluster_id=1, root_sys_password=******, service_type=OB_CLUSTER, ob_data_dir=/data/1, connection_mode=direct, target_cluster_status=RUNNING, latest_execution_start_time=2023-02-07T17:02:20.150+08:00, sub_task_instance_id=76, credential_id=1}, listMap={add_region_ids=[2], server_ids=[1], add_idc_ids=[1], all_host_ids=[1], add_host_ids=[1], host_ids=[1], zone_names=[zone1]}}, executor=172.16.234.18
2023-02-07 17:02:20.260 INFO 68 --- [pool-manual-subtask-executor2,e700d3a68bac4717,7e6f8a943e89] c.o.o.e.internal.template.HttpTemplate : POST request to agent, url:http://172.16.234.18:62888/api/v1/process/info, request body:GetProcessInfoRequest(processName=observer), params:null
2023-02-07 17:02:20.424 ERROR 68 --- [pool-manual-subtask-executor2,e700d3a68bac4717,7e6f8a943e89] com.alipay.ocp.core.util.ExceptionUtils : Checked Exception: com.alipay.ocp.core.exception.UnexpectedException occurred with code error.ob.cluster.takeover.wrong.user, and args [root]
2023-02-07 17:02:20.431 INFO 68 --- [pool-manual-subtask-executor2,e700d3a68bac4717,7e6f8a943e89] c.a.o.c.m.t.model.SubtaskInstanceEntity : Set state for subtask: 76, current state: RUNNING, new state: FAILED
2023-02-07 17:02:20.437 WARN 68 --- [pool-manual-subtask-executor2,e700d3a68bac4717,7e6f8a943e89] c.a.o.c.t.engine.runner.RunnerFactory : Execute task failed, subtask=SubtaskInstanceEntity{id=76, name=Check observer process user, state=FAILED, operation=EXECUTE, className=com.alipay.ocp.service.task.business.host.CheckObserverProcessUserTask, seriesId=13, startTime=2023-02-07T17:02:20.150+08:00, endTime=2023-02-07T17:02:20.436+08:00}, failedMessage=The user of observer process must be admin, current is root
com.alipay.ocp.core.exception.UnexpectedException: [OCP UnexpectedException]: status=500 INTERNAL_SERVER_ERROR, errorCode=OB_CLUSTER_TAKEOVER_OBSERVER_WRONG_USER, args=root
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[na:1.8.0_312]
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[na:1.8.0_312]
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[na:1.8.0_312]
at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[na:1.8.0_312]
at com.alipay.ocp.core.util.ExceptionUtils.newException(ExceptionUtils.java:96) ~[ocp-core-3.3.0-20220427.jar!/:3.3.0-20220427]
at com.alipay.ocp.core.util.ExceptionUtils.throwException(ExceptionUtils.java:90) ~[ocp-core-3.3.0-20220427.jar!/:3.3.0-20220427]
at com.alipay.ocp.core.util.ExceptionUtils.unExpected(ExceptionUtils.java:77) ~[ocp-core-3.3.0-20220427.jar!/:3.3.0-20220427]
at com.alipay.ocp.service.task.business.host.CheckObserverProcessUserTask.run(CheckObserverProcessUserTask.java:56) ~[ocp-service-3.3.0-20220427.jar!/:3.3.0-20220427]
at com.alipay.ocp.core.metadb.task.model.SubtaskInstanceEntity.run(SubtaskInstanceEntity.java:221) ~[ocp-core-3.3.0-20220427.jar!/:3.3.0-20220427]
at com.alipay.ocp.core.task.engine.runner.JavaTaskRunner.doExecute(JavaTaskRunner.java:26) ~[ocp-core-3.3.0-20220427.jar!/:3.3.0-20220427]
at com.alipay.ocp.core.task.engine.runner.JavaTaskRunner.run(JavaTaskRunner.java:20) ~[ocp-core-3.3.0-20220427.jar!/:3.3.0-20220427]
at com.alipay.ocp.core.task.engine.runner.RunnerFactory.doRun(RunnerFactory.java:113) ~[ocp-core-3.3.0-20220427.jar!/:3.3.0-20220427]
at com.alipay.ocp.core.task.engine.runner.RunnerFactory.redirectOutputIfNotSysSchedule(RunnerFactory.java:185) ~[ocp-core-3.3.0-20220427.jar!/:3.3.0-20220427]
at com.alipay.ocp.core.task.engine.runner.RunnerFactory.run(RunnerFactory.java:102) ~[ocp-core-3.3.0-20220427.jar!/:3.3.0-20220427]
at com.alipay.ocp.core.task.engine.coordinator.worker.subtask.ReadySubtaskWorker.lambda$submitTask$3(ReadySubtaskWorker.java:123) ~[ocp-core-3.3.0-20220427.jar!/:3.3.0-20220427]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_312]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_312]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_312]
at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_312]
Process list
[root@sc413-ocp01 ocp-3.3.0-ce-bp2-x86_64]# ps -ef | grep -E "observer|obproxy*|ocp"
root 24316 1 0 16:39 ? 00:00:01 bash /home/admin/obproxy/obproxyd.sh /home/admin/obproxy 172.16.234.18 2883 daemon
root 24331 1 13 16:39 ? 00:02:48 /home/admin/obproxy/bin/obproxy --listen_port 2883
root 24448 1 99 16:39 ? 02:21:48 /home/admin/oceanbase/bin/observer -r 172.16.234.18:2882:2881 -o __min_full_resource_pool_memory=268435456,enable_syslog_recycle=True,enable_syslog_wf=True,max_syslog_file_count=4,memory_limit=52G,system_memory=26G,cpu_count=24,datafile_size=224G -z zone1 -p 2881 -P 2882 -n obcluster -c 1 -d /data/1 -l INFO
admin 27762 27730 0 16:51 ? 00:00:00 bash /home/admin/ocp-server/bin/ocp-server
admin 27763 27730 0 16:51 ? 00:00:00 bash /home/admin/bin/ocp_obproxyd.sh
admin 27823 27762 82 16:51 ? 00:07:46 /usr/lib/jvm/java-1.8.0/bin/java -server -XX:+UseG1GC -Xms45875m -Xmx45875m -Xss512k -XX:+PrintCommandLineFlags -XX:MetaspaceSize=1024m -XX:MaxMetaspaceSize=1024m -XX:+PrintAdaptiveSizePolicy -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -Xloggc:/home/admin/ocp-server/bin/../log/gc.log -XX:+UseGCLogFileRotation -XX:GCLogFileSize=50M -XX:NumberOfGCLogFiles=2 -XX:ErrorFile=/home/admin/ocp-server/bin/../log/hs_err_pid%p.log -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/admin/ocp-server/bin/../log/ -Dfile.encoding=UTF-8 -jar /home/admin/ocp-server/bin/../lib/ocp-server-3.3.0-20220427.jar
admin 28932 27730 15 16:52 ? 00:01:13 ./bin/obproxy -p2888 -n ocp_obproxy -o obproxy_config_server_url=http://127.0.0.1:8080/services?Action=GetObProxyConfig&User_ID=alibaba&UID=admin,syslog_level=INFO,skip_proxyro_check=true,skip_proxy_sys_private_check=true,enable_strict_kernel_release=false,enable_metadb_used=false,enable_proxy_scramble=true,proxy_mem_limited=1G,log_dir_size_threshold=10G
root 35065 12445 0 17:00 pts/0 00:00:00 grep --color=auto -E observer|obproxy*|ocp
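The failure message in the log says the observer process user must be admin, while the listing above shows PID 24448 owned by root. A minimal sketch of that kind of ownership check, run against `ps -ef` text, might look like the following. It is not OCP's `CheckObserverProcessUserTask` implementation; the function names are hypothetical, and the sample rows are copied from the listing above with the observer `-o` options omitted for brevity.

```python
import os

# Expected owner, taken from the failure message in the log:
# "The user of observer process must be admin, current is root"
EXPECTED_USER = "admin"


def observer_owners(ps_lines):
    """Return (user, pid) for each `ps -ef` row whose executable is named observer."""
    owners = []
    for line in ps_lines:
        # ps -ef columns: UID PID PPID C STIME TTY TIME CMD
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue  # not a full ps -ef row
        user, pid, cmd = fields[0], fields[1], fields[7]
        # Match on the executable name only, so the grep row is not counted.
        executable = cmd.split()[0]
        if os.path.basename(executable) == "observer":
            owners.append((user, int(pid)))
    return owners


def wrong_user_processes(ps_lines, expected=EXPECTED_USER):
    """Return the observer processes that are not owned by the expected user."""
    return [(user, pid) for user, pid in observer_owners(ps_lines) if user != expected]


# Rows copied from the listing above (observer -o options omitted).
sample = [
    "root 24331 1 13 16:39 ? 00:02:48 /home/admin/obproxy/bin/obproxy --listen_port 2883",
    "root 24448 1 99 16:39 ? 02:21:48 /home/admin/oceanbase/bin/observer "
    "-r 172.16.234.18:2882:2881 -z zone1 -p 2881 -P 2882 -n obcluster -c 1 -d /data/1 -l INFO",
    "admin 27762 27730 0 16:51 ? 00:00:00 bash /home/admin/ocp-server/bin/ocp-server",
    "root 35065 12445 0 17:00 pts/0 00:00:00 grep --color=auto -E observer|obproxy*|ocp",
]

print(wrong_user_processes(sample))  # [('root', 24448)]
```

With the rows from this host, the only observer process reported is PID 24448 running as root, which matches `args=root` in the exception.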
Listening ports
[root@sc413-ocp01 ocp-3.3.0-ce-bp2-x86_64]# netstat -tunlp | grep -E "2881|2882|2883"
tcp 0 0 0.0.0.0:2881 0.0.0.0:* LISTEN 24448/observer
tcp 0 0 0.0.0.0:2882 0.0.0.0:* LISTEN 24448/observer
tcp 0 0 0.0.0.0:2883 0.0.0.0:* LISTEN 24331/obproxy
【Attachments】
ocp.log.tar.gz (219.1 KB)