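obd fails to start the cluster and reports RuntimeError

[Product] OceanBase Community Edition

[Version] 3.1

[Problem description]

When I start the cluster with obd, it fails with [ERROR] oceanbase-ce-py_script_start-3.1.0 RuntimeError: substring not found. How can I fix this?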

[root@4550aaa5e393 admin]# obd cluster start amber

Get local repositories and plugins ok

Open ssh connection ok

Cluster param config check ok

Check before start observer ok

[WARN] (192.168.0.12) clog and data use the same disk (/home/admin)

[WARN] (192.168.0.13) clog and data use the same disk (/home/admin)

[WARN] (192.168.0.14) clog and data use the same disk (/home/admin)

Check before start obproxy ok

Start observer x

[ERROR] oceanbase-ce-py_script_start-3.1.0 RuntimeError: substring not found

[ERROR] oceanbase-ce start failed

Please provide the OBD version (obd --version) and the obd execution log (~/.obd/log/obd).

The obd error log is as follows:

[2021-10-09 14:58:32] [4cfbd8c2-28ce-11ec-87b6-0202c0a8000c] [ERROR] [ERROR] oceanbase-ce-py_script_start-3.1.0 RuntimeError: substring not found

[2021-10-09 14:58:32] [4cfbd8c2-28ce-11ec-87b6-0202c0a8000c] [ERROR] Traceback (most recent call last):

 File "core.py", line 993, in start_cluster

 File "_plugin.py", line 234, in call

 File "plugin.py", line 208, in new_func

 File "/root/.obd/plugins/oceanbase-ce/3.1.0/start.py", line 164, in start

  ret = client.execute_command(clusters_cmd[server])

 File "plugin.py", line 165, in new_method

 File "ssh.py", line 221, in execute_command

ValueError: substring not found


[2021-10-09 14:58:32] [4cfbd8c2-28ce-11ec-87b6-0202c0a8000c] [DEBUG] - sub start ref count to 0

[2021-10-09 14:58:32] [4cfbd8c2-28ce-11ec-87b6-0202c0a8000c] [DEBUG] - export start

[2021-10-09 14:58:32] [4cfbd8c2-28ce-11ec-87b6-0202c0a8000c] [ERROR] [ERROR] oceanbase-ce start failed

[2021-10-09 14:58:32] [4cfbd8c2-28ce-11ec-87b6-0202c0a8000c] [INFO] [ERROR] oceanbase-ce-py_script_start-3.1.0 RuntimeError: substring not found

[ERROR] oceanbase-ce start failed
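
A note on the message itself: "substring not found" is the text Python's built-in str.index raises as a ValueError when the searched text is absent. The sketch below only reproduces that message with a hypothetical helper (text_before is not an OBD function); what ssh.py actually searches for at line 221 is not shown in this thread, so treat this as background rather than a diagnosis.

```python
# Illustrative only: this is not OBD's code. It reproduces the message text
# from the traceback using Python's built-in string methods.

def text_before(output, marker):
    """Return the part of `output` before `marker`.

    str.index raises ValueError("substring not found") when `marker` is absent.
    """
    return output[: output.index(marker)]


try:
    text_before("command output without the expected token", "MARKER")
except ValueError as exc:
    message = str(exc)
    print(message)  # substring not found

# str.find reports the same situation with -1 instead of raising.
position = "command output".find("MARKER")
print(position)  # -1
```

In other words, the error suggests that the output of the remote command did not contain something the caller expected, which is consistent with the start command itself having gone wrong.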

The obd version is as follows:

OceanBase Deploy: 1.1.1

REVISION: eca0509213255ef714d350a5af897b381f9244ea

BUILD_BRANCH: master

BUILD_TIME: Sep 30 2021 08:54:49

Copyright (C) 2021 OceanBase

License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.

This is free software: you are free to change and redistribute it.

There is NO WARRANTY, to the extent permitted by law.


It looks like the start command returned abnormally. Please check whether observer exists under home_path/bin.

Yes, observer is there, as a symbolic link:

[root@4550aaa5e393 bin]# ls -l

total 4

lrwxrwxrwx 1 root root 94 Oct 9 14:21 observer -> /root/.obd/repository/oceanbase-ce/3.1.0/56f57e9843e719d830ec03c206d914f4b3adc82b/bin/observer

Please also provide the five log lines that appear directly above this line:

[2021-10-09 14:58:32] [4cfbd8c2-28ce-11ec-87b6-0202c0a8000c] [ERROR] [ERROR] oceanbase-ce-py_script_start-3.1.0 RuntimeError: substring not found

Here they are:

[2021-10-09 16:41:02] [9e4d231c-28dc-11ec-84f3-0202c0a8000c] [DEBUG] -- starting 192.168.0.12 observer

[2021-10-09 16:41:02] [9e4d231c-28dc-11ec-84f3-0202c0a8000c] [DEBUG] -- admin@192.168.0.12 set env LD_LIBRARY_PATH to '/home/admin/oceanbase-ce-2/lib:'

[2021-10-09 16:41:02] [9e4d231c-28dc-11ec-84f3-0202c0a8000c] [DEBUG] -- admin@192.168.0.12 execute: cd /home/admin/oceanbase-ce-2; /home/admin/oceanbase-ce-2/bin/observer -r '192.168.0.12:2882:2881' -o __min_full_resource_pool_memory=268435456,memory_limit='16G',system_memory='8G',stack_size='512K',cpu_count=16,cache_wash_threshold='1G',workers_per_cpu_quota=10,schema_history_expire_time='1d',net_thread_count=4,major_freeze_duty_time='Disable',minor_freeze_times=10,enable_separate_sys_clog=0,enable_merge_by_turn=False,datafile_disk_percentage=20,enable_syslog_recycle=True,max_syslog_file_count=4,root_passwd='SLzhao831!',z1=ordereddict([('mysql_port', 2881), ('rpc_port', 2882), ('home_path', '/home/admin/oceanbase-zone1-1'), ('data_dir', '/data/1'), ('redo_dir', '/data/log1'), ('zone', 'zone1')]),z2=ordereddict([('mysql_port', 2881), ('rpc_port', 2882), ('home_path', '/home/admin/oceanbase-zone2-1'), ('data_dir', '/data/1'), ('redo_dir', '/data/log1'), ('zone', 'zone2')]),z3=ordereddict([('mysql_port', 2881), ('rpc_port', 2882), ('home_path', '/home/oceanbase-zone3-1'), ('data_dir', '/data/1'), ('redo_dir', '/data/log1'), ('zone', 'zone3')]) -z 'zone1' -p 2881 -P 2882 -n 'obcluster' -c 1 -d '/home/admin/oceanbase-ce-2/store' -i 'eth0' -l 'INFO' 

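It looks like the YAML indentation is wrong. Could you share the YAML file? You can view it with obd cluster edit-config.

Here it is. File transfer is a bit slow, so I can only attach a screenshot.

It looks like z1, z2, and z3 are nested under global, but they should be at the same level as global.

Please adjust the indentation of z1, z2, and z3 so that they sit at the same level as global.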
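
To make the indentation problem concrete, here is a minimal sketch that works on the already-parsed config as plain Python dicts. It flags any key under global whose value is itself a mapping, which is the situation visible in the log above, where z1=ordereddict(...) was passed to observer inside the -o option string. The helper name find_nested_sections and the abridged dicts are illustrative assumptions, not part of OBD or the poster's full file.

```python
# Hypothetical helper for illustration; it is not part of OBD.

def find_nested_sections(component_conf):
    """Return the keys under `global` whose values are mappings.

    A plain setting under `global` is a scalar, so a mapping there usually
    means a per-server section (z1, z2, ...) was indented one level too deep.
    """
    global_conf = component_conf.get("global") or {}
    return [key for key, value in global_conf.items() if isinstance(value, dict)]


# Parsed form of a config where z1..z3 ended up under global (abridged).
misindented = {
    "global": {
        "memory_limit": "16G",
        "z1": {"mysql_port": 2881, "rpc_port": 2882, "zone": "zone1"},
        "z2": {"mysql_port": 2881, "rpc_port": 2882, "zone": "zone2"},
        "z3": {"mysql_port": 2881, "rpc_port": 2882, "zone": "zone3"},
    },
}

# The same settings with z1..z3 moved up to the same level as global.
corrected = {
    "global": {"memory_limit": "16G"},
    "z1": {"mysql_port": 2881, "rpc_port": 2882, "zone": "zone1"},
    "z2": {"mysql_port": 2881, "rpc_port": 2882, "zone": "zone2"},
    "z3": {"mysql_port": 2881, "rpc_port": 2882, "zone": "zone3"},
}

print(find_nested_sections(misindented))  # ['z1', 'z2', 'z3']
print(find_nested_sections(corrected))    # []
```

A check like this catches the mistake before the nested sections are flattened into command-line options.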

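Thank you. z1, z2, and z3 were indeed misindented.

After starting the cluster, there are now two problems:

1. All three hosts started in zone1, so the three-zone replica layout was not created.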

+------------------------------------------------+
|                    observer                    |
+--------------+---------+------+-------+--------+
| ip           | version | port | zone  | status |
+--------------+---------+------+-------+--------+
| 192.168.0.12 | 3.1.0   | 2881 | zone1 | active |
| 192.168.0.13 |         | 0    | zone1 | active |
| 192.168.0.14 |         | 0    | zone1 | active |
+--------------+---------+------+-------+--------+

2. obproxy on 192.168.0.15 fails to start.

[2021-10-09 17:17:39] [a954b928-28e1-11ec-807b-0202c0a8000c] [DEBUG] - sub display ref count to 0

[2021-10-09 17:17:39] [a954b928-28e1-11ec-807b-0202c0a8000c] [DEBUG] - export display

[2021-10-09 17:17:39] [a954b928-28e1-11ec-807b-0202c0a8000c] [DEBUG] - Call obproxy-py_script_start-3.1.0 for obproxy-3.1.0-0b17cf0459a3b53c5a2febb6572894d183154c64

[2021-10-09 17:17:39] [a954b928-28e1-11ec-807b-0202c0a8000c] [DEBUG] - import start

[2021-10-09 17:17:39] [a954b928-28e1-11ec-807b-0202c0a8000c] [DEBUG] - add start ref count to 1

[2021-10-09 17:17:39] [a954b928-28e1-11ec-807b-0202c0a8000c] [INFO] Start obproxy

[2021-10-09 17:17:39] [a954b928-28e1-11ec-807b-0202c0a8000c] [DEBUG] -- admin@192.168.0.15 execute: bash -c 'if [ -f /home/admin/oceanbase-obproxy-3/bin/obproxy ]; then exit 1; else exit 0; fi;' 

[2021-10-09 17:17:39] [a954b928-28e1-11ec-807b-0202c0a8000c] [DEBUG] -- exited code 1, error output:


[2021-10-09 17:17:39] [a954b928-28e1-11ec-807b-0202c0a8000c] [DEBUG] -- 192.168.0.15 port check

[2021-10-09 17:17:39] [a954b928-28e1-11ec-807b-0202c0a8000c] [DEBUG] -- admin@192.168.0.15 execute: cat /home/admin/oceanbase-obproxy-3/run/obproxy-192.168.0.15-2883.pid 

[2021-10-09 17:17:39] [a954b928-28e1-11ec-807b-0202c0a8000c] [DEBUG] -- exited code 1, error output:

cat: /home/admin/oceanbase-obproxy-3/run/obproxy-192.168.0.15-2883.pid: No such file or directory


[2021-10-09 17:17:39] [a954b928-28e1-11ec-807b-0202c0a8000c] [DEBUG] -- starting 192.168.0.15 obproxy

[2021-10-09 17:17:39] [a954b928-28e1-11ec-807b-0202c0a8000c] [DEBUG] -- admin@192.168.0.15 set env LD_LIBRARY_PATH to '/home/admin/oceanbase-obproxy-3/lib:'

[2021-10-09 17:17:39] [a954b928-28e1-11ec-807b-0202c0a8000c] [DEBUG] -- admin@192.168.0.15 execute: cd /home/admin/oceanbase-obproxy-3; /home/admin/oceanbase-obproxy-3/bin/obproxy -o enable_strict_kernel_release=False,enable_cluster_checkout=False --listen_port 2883 --prometheus_listen_port 2884 --rs_list '192.168.0.12:2881; 192.168.0.13:2881; 192.168.0.14:2881' --cluster_name 'obcluster' 

[2021-10-09 17:17:39] [a954b928-28e1-11ec-807b-0202c0a8000c] [DEBUG] -- exited code 0

[2021-10-09 17:17:39] [a954b928-28e1-11ec-807b-0202c0a8000c] [DEBUG] -- admin@192.168.0.15 set env LD_LIBRARY_PATH to ''

[2021-10-09 17:17:39] [a954b928-28e1-11ec-807b-0202c0a8000c] [DEBUG] -- admin@192.168.0.15 execute: ps -aux | grep -e '/home/admin/oceanbase-obproxy-3/bin/obproxy -o enable_strict_kernel_release=False,enable_cluster_checkout=False --listen_port 2883 --prometheus_listen_port 2884 --rs_list 192.168.0.12:2881; 192.168.0.13:2881; 192.168.0.14:2881 --cluster_name obcluster$' | grep -v grep | awk '{print $2}' > /home/admin/oceanbase-obproxy-3/run/obproxy-192.168.0.15-2883.pid 

[2021-10-09 17:17:39] [a954b928-28e1-11ec-807b-0202c0a8000c] [DEBUG] -- exited code 0

[2021-10-09 17:17:39] [a954b928-28e1-11ec-807b-0202c0a8000c] [INFO] obproxy program health check

[2021-10-09 17:17:42] [a954b928-28e1-11ec-807b-0202c0a8000c] [DEBUG] -- 192.168.0.15 program health check

[2021-10-09 17:17:42] [a954b928-28e1-11ec-807b-0202c0a8000c] [DEBUG] -- admin@192.168.0.15 execute: cat /home/admin/oceanbase-obproxy-3/run/obproxy-192.168.0.15-2883.pid 

[2021-10-09 17:17:42] [a954b928-28e1-11ec-807b-0202c0a8000c] [DEBUG] -- exited code 0

[2021-10-09 17:17:43] [a954b928-28e1-11ec-807b-0202c0a8000c] [WARNING] [WARN] failed to start 192.168.0.15 obproxy

[2021-10-09 17:17:43] [a954b928-28e1-11ec-807b-0202c0a8000c] [DEBUG] - sub start ref count to 0

[2021-10-09 17:17:43] [a954b928-28e1-11ec-807b-0202c0a8000c] [DEBUG] - export start

[2021-10-09 17:17:43] [a954b928-28e1-11ec-807b-0202c0a8000c] [ERROR] [ERROR] obproxy start failed


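Was this started again on top of the earlier failed start? Some of the incorrect configuration items have probably been persisted already. If it is convenient, try obd cluster redeploy. If you cannot run redeploy, we will need more logs.

The zone problem is now solved, but obproxy still will not start.

Could you check whether there is a problem with the obproxy configuration?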

obproxy:

  servers:

    - 192.168.0.15

  global:

    listen_port: 2883 # External port. The default value is 2883.

    prometheus_listen_port: 2884 # The Prometheus port. The default value is 2884.

    home_path: /home/admin/oceanbase-obproxy # The default value is /root/obproxy

    # oceanbase root server list

    # format: ip:mysqlport;ip:mysqlport

    rs_list: 192.168.0.12:2881;192.168.0.13:2881;192.168.0.14:2881

    enable_cluster_checkout: false

    # observer cluster name, consistent with oceanbase-ce's appname

    cluster_name: obcluster

    obproxy_sys_password: **** # obproxy sys user password, can be empty

    # observer_sys_password: # proxyro user password, consistent with oceanbase-ce's proxyro_password, can be empty
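
One thing that can be checked mechanically in this config is rs_list, whose comment gives the format ip:mysqlport;ip:mysqlport. The sketch below is a hypothetical validator (parse_rs_list is not an OBD or obproxy function) that splits the value and rejects malformed entries. It strips whitespace around each entry as a convenience; whether obproxy itself tolerates such spaces is not something this thread establishes.

```python
import ipaddress

# Hypothetical validator for illustration; it is not part of OBD or obproxy.

def parse_rs_list(rs_list):
    """Split an 'ip:port;ip:port' string into a list of (ip, port) tuples.

    Raises ValueError for an empty entry, a missing port, a malformed IP
    address, or a port outside 1..65535.
    """
    servers = []
    for entry in rs_list.split(";"):
        entry = entry.strip()
        ip, separator, port = entry.rpartition(":")
        if not separator:
            raise ValueError("expected ip:port, got %r" % entry)
        ipaddress.ip_address(ip)  # raises ValueError on a malformed address
        port_number = int(port)   # raises ValueError on a non-numeric port
        if not 0 < port_number < 65536:
            raise ValueError("port out of range in %r" % entry)
        servers.append((ip, port_number))
    return servers


servers = parse_rs_list("192.168.0.12:2881;192.168.0.13:2881;192.168.0.14:2881")
print(servers)
```

Passing the check only shows that the value is well formed; it does not prove that the listed observers are reachable on that port.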