obd reports an sftp error when starting the cluster

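[Environment] Production or test environment
[OB or other component]
[Version] 3.1.0
[Problem description] The cluster cannot be started after modifying parameters
[Steps to reproduce]
[Symptoms and impact] Parameters changed with edit-config do not take effect
Background: 3 zones, each with one observer and one obproxy. I scaled out zone1 with one more observer, and everything was normal after running alter system add server.
Then I edited parameters with edit-config to change the CPU count, after which obd cluster restart xx reported an error.
I then ran obd cluster stop xxx, which worked, but obd cluster start xxxx reported an error.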

10.11.18.114 is the newly added server.

$ obd cluster start obtest 
[ERROR] Deploy need restart.
Use `obd cluster restart obtest --wp` to make changes take effect.
If you still need to start the cluster, use the `obd cluster start obtest --wop` option to start the cluster without loading parameters. 
See https://open.oceanbase.com/docs/obd-cn/V1.4.0/10000000000436999 .
$obd cluster stop   obtest 
Get local repositories ok
Search plugins ok
Open ssh connection ok
Stop observer ok
Stop obproxy ok
obtest stopped

$ obd cluster start obtest --wop
Get local repositories ok
Search plugins ok
Open ssh connection ok
Load cluster param plugin ok
Check before start observer ok
[WARN] OBD-1007: (10.13.160.71) The recommended number of open files is 655350 (Current value: %s)
[WARN] OBD-1007: (10.13.160.166) The recommended number of open files is 655350 (Current value: %s)
[WARN] OBD-1007: (10.13.160.167) The recommended number of open files is 655350 (Current value: %s)
[WARN] OBD-1007: (10.16.18.114) The recommended number of open files is 655350 (Current value: %s)

Check before start obproxy ok
Start observer ok
observer program health check ok
Connect to observer ok
Wait for observer init ok
+-------------------------------------------------+
|                     observer                    |
+---------------+---------+------+-------+--------+
| ip            | version | port | zone  | status |
+---------------+---------+------+-------+--------+
| 10.11.160.166 | 3.1.0   | 2881 | zone2 | active |
| 10.11.160.167 | 3.1.0   | 2881 | zone3 | active |
| 10.11.160.71  | 3.1.0   | 2881 | zone1 | active |
| 10.11.18.114  | 3.1.0   | 2881 | zone1 | active |
+---------------+---------+------+-------+--------+

Start obproxy ok
obproxy program health check ok
Connect to obproxy ok
Initialize cluster
+-------------------------------------------------+
|                     obproxy                     |
+---------------+------+-----------------+--------+
| ip            | port | prometheus_port | status |
+---------------+------+-----------------+--------+
| 10.11.160.71  | 2883 | 2884            | active |
| 10.11.160.166 | 2883 | 2884            | active |
| 10.11.160.167 | 2883 | 2884            | active |
+---------------+------+-----------------+--------+
obtest running

$ obd cluster display    obtest 
Get local repositories and plugins ok
Open ssh connection ok
Cluster status check ok
Connect to observer ok
Wait for observer init ok
+-------------------------------------------------+
|                     observer                    |
+---------------+---------+------+-------+--------+
| ip            | version | port | zone  | status |
+---------------+---------+------+-------+--------+
| 10.11.160.166 | 3.1.0   | 2881 | zone2 | active |
| 10.11.160.167 | 3.1.0   | 2881 | zone3 | active |
| 10.11.160.71  | 3.1.0   | 2881 | zone1 | active |
| 10.11.18.114  | 3.1.0   | 2881 | zone1 | active |
+---------------+---------+------+-------+--------+

Connect to obproxy ok
+-------------------------------------------------+
|                     obproxy                     |
+---------------+------+-----------------+--------+
| ip            | port | prometheus_port | status |
+---------------+------+-----------------+--------+
| 10.11.160.71  | 2883 | 2884            | active |
| 10.11.160.166 | 2883 | 2884            | active |
| 10.11.160.167 | 2883 | 2884            | active |
+---------------+------+-----------------+--------+

$ obd cluster reload    obtest 
[ERROR] Deploy `obtest` need restart
Use `obd cluster restart obtest --wp` to make changes take effect.
See https://open.oceanbase.com/docs/obd-cn/V1.4.0/10000000000436999 .
$  obd cluster restart obtest --wp
Get local repositories and plugins x
[ERROR] No such restart plugin for obproxy-3.1.0
$ obd cluster reload    obtest 
[ERROR] Deploy `obtest` need restart
Use `obd cluster restart obtest --wp` to make changes take effect.
See https://open.oceanbase.com/docs/obd-cn/V1.4.0/10000000000436999 .

1. Do not use OB version 3.1.0.
2. obd 1.4.0 no longer supports the component name obproxy; the supported component name is now obproxy-ce. For how to adjust it, see:
https://www.oceanbase.com/docs/community/obd-cn/V1.4.0/10000000000437005

Thanks for the reply!
Installing 3.1.4 now reports an error:

Initializes observer work home ok
Initializes obproxy work home ok
Remote oceanbase-ce-3.1.4-10000092022071511.el7-c5cd94f4f190317b6a883c58a26460a506205ce6 repository install x
[ERROR] general-py_script_install_repo-0.1 RuntimeError: EOF during negotiation
See OceanBase community .

Corresponding obd log:

1. Please confirm the obd version with obd --version.
2. Please run the command again. If it still fails, attach the corresponding obd log:
the ~/.obd/log/obd file.

$ obd cluster deploy obtest -c obtest.yaml --force
oceanbase-ce-3.1.4 already installed.
obproxy-ce-3.2.3.5 already installed.
+-------------------------------------------------------------------------------------------+
|                                         Packages                                          |
+--------------+---------+-----------------------+------------------------------------------+
| Repository   | Version | Release               | Md5                                      |
+--------------+---------+-----------------------+------------------------------------------+
| oceanbase-ce | 3.1.4   | 10000092022071511.el7 | c5cd94f4f190317b6a883c58a26460a506205ce6 |
| obproxy-ce   | 3.2.3.5 | 2.el7                 | 27f0f362028a61678fcd9be5699c7681e04c1970 |
+--------------+---------+-----------------------+------------------------------------------+
Repository integrity check ok
Parameter check ok
Open ssh connection ok
Initializes observer work home ok
Initializes obproxy work home ok
Remote oceanbase-ce-3.1.4-10000092022071511.el7-c5cd94f4f190317b6a883c58a26460a506205ce6 repository install x
[ERROR] general-py_script_install_repo-0.1 RuntimeError: EOF during negotiation
See OceanBase community .

$ obd --version
OceanBase Deploy: 1.5.0
REVISION: 50a1ff04e540340318b4ea0483b842665aa058d0
BUILD_BRANCH: HEAD
BUILD_TIME: Aug 18 2022 14:29:01
Copyright (C) 2021 OceanBase
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

obd.log (47.6 KB)

Please provide the complete contents of /etc/ssh/sshd_config, or attach the file.

Port 22
Protocol 2
HostKey /etc/ssh/ssh_host_rsa_key
HostKey /etc/ssh/ssh_host_dsa_key
UsePrivilegeSeparation yes
KeyRegenerationInterval 3600
ServerKeyBits 768
SyslogFacility AUTH
LogLevel INFO
LoginGraceTime 120
PermitRootLogin no
StrictModes yes
RSAAuthentication yes
PubkeyAuthentication yes
IgnoreRhosts yes
RhostsRSAAuthentication no
HostbasedAuthentication no
PermitEmptyPasswords no
ChallengeResponseAuthentication no
PasswordAuthentication no
X11Forwarding yes
X11DisplayOffset 10
PrintMotd yes
PrintLastLog yes
TCPKeepAlive yes
AcceptEnv LANG LC_*
Subsystem sftp /usr/lib/openssh/sftp-server
UsePAM yes
UseDNS no
DenyUsers xx_00

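1. On machine 10.11.160.166, please check (a screenshot is fine) whether the sftp binary named by this line in /etc/ssh/sshd_config exists:
Subsystem sftp /usr/lib/openssh/sftp-server
It should be either /usr/lib/openssh/sftp-server or /usr/libexec/openssh/sftp-server.
2. Then restart the sshd service and run the obd deployment again.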
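The check in step 1 can be scripted. "EOF during negotiation" typically appears when an SFTP session closes before its handshake completes, which is what happens if sshd cannot launch the configured sftp-server. The sketch below is illustrative only and is not part of obd: the function names sftp_subsystem_path and check_sftp_subsystem are hypothetical, and it assumes the usual "Subsystem sftp <path>" form in sshd_config.

```shell
#!/bin/sh
# Hypothetical helpers (not part of obd or OpenSSH).

# Print the command configured for the sftp subsystem in an sshd_config file.
# Commented-out lines are skipped because their first field is "#Subsystem".
sftp_subsystem_path() {
    awk 'tolower($1) == "subsystem" && $2 == "sftp" { print $3; exit }' "$1"
}

# Report whether the configured sftp-server binary exists and is executable.
# Returns 0 if usable, 1 if the path is missing, 2 if nothing is configured.
check_sftp_subsystem() {
    path=$(sftp_subsystem_path "$1")
    if [ -z "$path" ]; then
        echo "no sftp subsystem configured in $1"
        return 2
    fi
    # "internal-sftp" is served by sshd itself, so there is no file to check.
    if [ "$path" = "internal-sftp" ] || [ -x "$path" ]; then
        echo "ok: $path"
        return 0
    fi
    echo "missing: $path"
    return 1
}
```

After sourcing these functions on each target host, running check_sftp_subsystem /etc/ssh/sshd_config reports a "missing:" line when the configured binary is absent.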

It works now, thank you very much!

One more small question, if I may: how did you pin it down to this cause?

What adjustment did you make in the end? Please also post it in the Q&A section so that others who hit the same problem can refer to it.

# cat /etc/ssh/sshd_config | grep openssh
Subsystem sftp /usr/lib/openssh/sftp-server
# ls -ltr /usr/lib/openssh/sftp-server
ls: cannot access /usr/lib/openssh/sftp-server: No such file or directory

# ll  /usr/libexec/openssh/sftp-server
-rwxr-xr-x 1 root root 84016 Nov 12  2016 /usr/libexec/openssh/sftp-server

Then I replaced /usr/lib/openssh/sftp-server in /etc/ssh/sshd_config with /usr/libexec/openssh/sftp-server.

After restarting sshd, it worked.
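For anyone applying the same fix on several hosts, the edit can be done non-interactively. This is a minimal sketch rather than the exact commands used above: fix_sftp_subsystem is a hypothetical helper name, and it assumes a single uncommented "Subsystem sftp" line and a replacement path that contains no "|" character.

```shell
#!/bin/sh
# Hypothetical helper (not part of obd or OpenSSH).
# Point the sftp subsystem in an sshd_config-style file at a new binary,
# keeping the original file next to it as <file>.bak.
fix_sftp_subsystem() {
    conf=$1
    new_path=$2
    cp -p "$conf" "$conf.bak" || return 1
    # Keep the "Subsystem sftp " prefix and replace the rest of that line.
    sed "s|^\([[:space:]]*Subsystem[[:space:]][[:space:]]*sftp[[:space:]][[:space:]]*\).*|\1$new_path|" \
        "$conf.bak" > "$conf"
}
```

A typical use, run as root, would be fix_sftp_subsystem /etc/ssh/sshd_config /usr/libexec/openssh/sftp-server, followed by sshd -t to validate the file before restarting the service. The restart command itself depends on the distribution (on systemd hosts it is usually systemctl restart sshd).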
