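【 Environment 】Production
【 OB or other components 】OceanBase, OBProxy, oceanbase-diagnostic-tool
【 Versions 】OceanBase-4.2.1-CE, OBProxy-4.2.1/4.3.1, obdiag-2.3.0
【Problem description】I am using oceanbase-diagnostic-tool (obdiag) to diagnose a cluster. After filling in the cluster information as described in the obdiag product documentation, I ran obdiag rca run --scene=disconnection -v on the machine where obdiag is installed.
The run failed with an authentication error against a remote node: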
[root@Z22301 .obdiag]# obdiag rca run --scene=disconnection -v
- cmd: []
- opts: {'scene': 'disconnection', 'store_dir': './rca/', 'input_parameters': None, 'c': '/root/.obdiag/config.yml'}
- mkdir /usr/local/oceanbase-diagnostic-tool/conf/inner_config.yml
- mkdir /root/.obdiag/config.yml
[ERROR] rca run Exception: init ssh client error: Bad authentication type; allowed types: ['publickey', 'keyboard-interactive']
- Traceback (most recent call last):
File "paramiko/transport.py", line 1622, in auth_password
File "paramiko/transport.py", line 1727, in auth_interactive
File "paramiko/auth_handler.py", line 263, in wait_for_response
paramiko.ssh_exception.AuthenticationException: Authentication failed.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "common/ssh_client/ssh.py", line 65, in init
File "stdio.py", line 891, in wrapper
File "common/ssh_client/remote_client.py", line 74, in __init__
File "paramiko/client.py", line 485, in connect
File "paramiko/client.py", line 818, in _auth
File "paramiko/client.py", line 805, in _auth
File "paramiko/transport.py", line 1625, in auth_password
File "paramiko/transport.py", line 1603, in auth_password
File "paramiko/auth_handler.py", line 263, in wait_for_response
paramiko.ssh_exception.BadAuthenticationType: Bad authentication type; allowed types: ['publickey', 'keyboard-interactive']
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "core.py", line 356, in rca_run
File "handler/rca/rca_handler.py", line 48, in __init__
File "stdio.py", line 891, in wrapper
File "common/ssh_client/ssh.py", line 37, in __init__
File "common/ssh_client/ssh.py", line 73, in init
Exception: init ssh client error: Bad authentication type; allowed types: ['publickey', 'keyboard-interactive']
Trace ID: 6d5e8a78-6030-11ef-b02d-04e8928244f4
If you want to view detailed obdiag logs, please run: obdiag display-trace 6d5e8a78-6030-11ef-b02d-04e8928244f4
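As a reading aid for the error above, here is a minimal Python sketch that pulls the advertised authentication methods out of the error line and checks whether plain password is among them. It is not part of obdiag; the function name allowed_auth_types is made up for illustration, and it assumes the message keeps the "allowed types: [...]" shape shown in the log.

```python
import ast
import re


def allowed_auth_types(error_line):
    """Extract the list after 'allowed types:' from an SSH auth error line."""
    match = re.search(r"allowed types: (\[.*?\])", error_line)
    if match is None:
        return []
    # The list is printed as a Python literal, so literal_eval can read it safely
    return list(ast.literal_eval(match.group(1)))


# Error line copied from the log above
line = ("Exception: init ssh client error: Bad authentication type; "
        "allowed types: ['publickey', 'keyboard-interactive']")

types = allowed_auth_types(line)
print(types)                # ['publickey', 'keyboard-interactive']
print("password" in types)  # False
```

In this log the server offers only publickey and keyboard-interactive, so a plain password request is rejected before the password itself is ever checked.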
The obdiag configuration file is as follows:
obcluster:
  ob_cluster_name: obcluster
  db_host: xx.xxx.xx.xx
  db_port: '2883'
  tenant_sys:
    user: root@sys#xxxx
    password: xxxxx
  servers:
    nodes:
      - ip: 19.xxx.xx.11
      - ip: 19.xxx.xx.12
      - ip: 19.xxx.xx.13
      ## remaining nodes omitted (too many to list)
    global:
      ssh_username: root
      ssh_password: 'xxxxxxx'
      ssh_port: '22'
      #ssh_key_file: ''
      home_path: /home/admin/oceanbase
      data_dir: /data/1
      redo_dir: /data/log1
obproxy:
  obproxy_cluster_name: obproxy
  servers:
    nodes:
      - ip: 19.xxx.xx.31
    global:
      ssh_username: root
      ssh_password: 'xxxxx'
      ssh_type: remote
      ssh_port: 22
      #ssh_key_file: ''
      home_path: /home/admin/obproxy
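【Troubleshooting】With the support of “渠磊” and “靖顺” from the community, I went through the following checks and debugging steps:
- On the machine where obdiag is installed, log in over ssh manually to verify that the ip, port, username and password in the configuration file are correct and that the machines can reach each other (confirmed fine; obdiag is installed on the OCP node).
- After the first check, we were fairly sure the cause is an operating-system setting on the target machines: "the server only supports public key or keyboard-interactive authentication (run the ssh command, press Enter, then type the password by hand), whereas obdiag submits the password in a single non-interactive step".
- Inspect the sshd_config file on the target node and try changing some parameters:
  - PasswordAuthentication changed from no to yes
  - UsePAM changed from yes to no
  sshd was restarted after each change; the error on retry was unchanged.
- Modify the obproxy section of obdiag's config.yml: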
obproxy:
  obproxy_cluster_name: obproxy
  servers:
    nodes:
      - ip: 19.xxx.xx.31
    global:
      ssh_username: root
      ssh_password: 'xxxxx'
      ssh_type: remote
      ssh_port: 22
      ## ssh_key_file uncommented
      ssh_key_file: ''
      home_path: /home/admin/obproxy
- Logging in with sshpass -p '<password>' ssh root@<target machine ip> works normally.
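To reproduce the authentication step outside obdiag, the following is a rough Python sketch using paramiko, the library that appears in the traceback. It asks the server which methods it allows and then tries keyboard-interactive, answering every prompt with the password. The helper names make_password_handler and probe_auth are hypothetical, and this is not how obdiag itself is implemented; treat it as a diagnostic aid under those assumptions.

```python
def make_password_handler(password):
    """Build a keyboard-interactive handler that answers each prompt with the password."""
    def handler(title, instructions, prompt_list):
        # prompt_list is a list of (prompt_text, echo) tuples sent by the server
        return [password for _prompt, _echo in prompt_list]
    return handler


def probe_auth(host, username, password, port=22, timeout=10.0):
    """Return (allowed_types, interactive_ok) for one target host."""
    # Third-party dependency; imported here so the handler helper works without it
    import paramiko

    transport = paramiko.Transport((host, port))
    try:
        transport.start_client(timeout=timeout)
        allowed = []
        try:
            # "none" auth is normally rejected, and the rejection lists allowed methods
            transport.auth_none(username)
        except paramiko.BadAuthenticationType as exc:
            allowed = list(exc.allowed_types)

        interactive_ok = False
        if "keyboard-interactive" in allowed and not transport.is_authenticated():
            try:
                transport.auth_interactive(username, make_password_handler(password))
                interactive_ok = transport.is_authenticated()
            except paramiko.SSHException:
                # Covers AuthenticationException, which is a subclass
                interactive_ok = False
        return allowed, interactive_ok
    finally:
        transport.close()
```

Calling probe_auth("<target machine ip>", "root", "<password>") from the obdiag machine would show whether keyboard-interactive login succeeds through paramiko, which helps separate a server-side policy from an obdiag-side issue.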
【Temporary workaround】
In the obdiag configuration file, change the nodes of the obcluster section to an empty list:
obcluster:
  ob_cluster_name: obcluster
  db_host: xx.xxx.xx.xx
  db_port: '2883'
  tenant_sys:
    user: root@sys#xxxx
    password: xxxxx
  servers:
    nodes:
      #- ip: 19.xxx.xx.11
      #- ip: 19.xxx.xx.12
      #- ip: 19.xxx.xx.13
      ## remaining nodes omitted (too many to list)
    global:
      ssh_username: root
      ssh_password: 'xxxxxxx'
      ssh_port: '22'
      #ssh_key_file: ''
      home_path: /home/admin/oceanbase
      data_dir: /data/1
      redo_dir: /data/log1
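To illustrate what the workaround changes, here is a small Python sketch that lists the node IPs remaining in each section of an already-parsed config. It assumes config.yml has been loaded into a plain dict (for example with a YAML parser); the function name ssh_targets and the sample dict are illustrative and not part of obdiag.

```python
def ssh_targets(config):
    """Map each top-level section to the node IPs listed under servers.nodes."""
    targets = {}
    for section, body in config.items():
        servers = (body or {}).get("servers") or {}
        # A `nodes:` key whose entries are all commented out parses as None, not []
        nodes = servers.get("nodes") or []
        targets[section] = [node["ip"] for node in nodes if "ip" in node]
    return targets


# Shape of the config after the workaround (only the relevant keys are shown)
workaround_config = {
    "obcluster": {"servers": {"nodes": None, "global": {"ssh_username": "root"}}},
    "obproxy": {"servers": {"nodes": [{"ip": "19.xxx.xx.31"}],
                            "global": {"ssh_username": "root"}}},
}

print(ssh_targets(workaround_config))
# {'obcluster': [], 'obproxy': ['19.xxx.xx.31']}
```

With the observer nodes removed, the obcluster section no longer lists any host, so only the obproxy node remains as a remote target in the file.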
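The problem is worked around for now, but the root cause is still being investigated. Many thanks to “渠磊” and “靖顺” for their patient support; I hope the official team sees this heartfelt problem record and thank-you post. I will keep working with them to track down the root cause and attempt a fix, and will follow up in this thread once we know more.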