obdiag self-service diagnostic tool reports a node login failure: "paramiko.ssh_exception.AuthenticationException: Authentication failed."

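【 Environment 】Production
【 OB or other components 】OceanBase, OBProxy, oceanbase-diagnostic-tool
【 Versions 】OceanBase-4.2.1-CE, OBProxy-4.2.1/4.3.1, obdiag-2.3.0
【 Problem description 】I am using oceanbase-diagnostic-tool (obdiag) to diagnose a cluster. After configuring the cluster information as described in the obdiag product documentation, I ran obdiag rca run --scene=disconnection -v on the machine where obdiag is installed.
The command failed with an authentication error against the remote node: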

[root@Z22301 .obdiag]# obdiag rca run --scene=disconnection -v
- cmd: []
- opts: {'scene': 'disconnection', 'store_dir': './rca/', 'input_parameters': None, 'c': '/root/.obdiag/config.yml'}
- mkdir /usr/local/oceanbase-diagnostic-tool/conf/inner_config.yml
- mkdir /root/.obdiag/config.yml
[ERROR] rca run Exception: init ssh client error: Bad authentication type; allowed types: ['publickey', 'keyboard-interactive']
- Traceback (most recent call last):
  File "paramiko/transport.py", line 1622, in auth_password
  File "paramiko/transport.py", line 1727, in auth_interactive
  File "paramiko/auth_handler.py", line 263, in wait_for_response
paramiko.ssh_exception.AuthenticationException: Authentication failed.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "common/ssh_client/ssh.py", line 65, in init
  File "stdio.py", line 891, in wrapper
  File "common/ssh_client/remote_client.py", line 74, in __init__
  File "paramiko/client.py", line 485, in connect
  File "paramiko/client.py", line 818, in _auth
  File "paramiko/client.py", line 805, in _auth
  File "paramiko/transport.py", line 1625, in auth_password
  File "paramiko/transport.py", line 1603, in auth_password
  File "paramiko/auth_handler.py", line 263, in wait_for_response
paramiko.ssh_exception.BadAuthenticationType: Bad authentication type; allowed types: ['publickey', 'keyboard-interactive']

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "core.py", line 356, in rca_run
  File "handler/rca/rca_handler.py", line 48, in __init__
  File "stdio.py", line 891, in wrapper
  File "common/ssh_client/ssh.py", line 37, in __init__
  File "common/ssh_client/ssh.py", line 73, in init
Exception: init ssh client error: Bad authentication type; allowed types: ['publickey', 'keyboard-interactive']

Trace ID: 6d5e8a78-6030-11ef-b02d-04e8928244f4
If you want to view detailed obdiag logs, please run: obdiag display-trace 6d5e8a78-6030-11ef-b02d-04e8928244f4
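
The first [ERROR] line already lists the authentication methods the server is willing to accept, and password is not among them. As a minimal sketch (the helper name allowed_auth_types is hypothetical and is not part of obdiag or paramiko), the list can be pulled out of such a message with the Python standard library:

```python
import ast
import re


def allowed_auth_types(message):
    """Extract the list that follows 'allowed types:' in an error message."""
    match = re.search(r"allowed types: (\[[^\]]*\])", message)
    if match is None:
        return []
    # The message embeds a Python-style list literal, so literal_eval can read it.
    return ast.literal_eval(match.group(1))


error = ("init ssh client error: Bad authentication type; "
         "allowed types: ['publickey', 'keyboard-interactive']")
types = allowed_auth_types(error)
print(types)
print("password" in types)
```

If "password" is missing from the list, the server did not offer plain password authentication for this login attempt, whatever password the client holds.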

The obdiag configuration file is as follows:

obcluster:
  ob_cluster_name: obcluster
  db_host: xx.xxx.xx.xx
  db_port: '2883'
  tenant_sys:
    user: root@sys#xxxx
    password: xxxxx
  servers:
    nodes:
    - ip: 19.xxx.xx.11
    - ip: 19.xxx.xx.12
    - ip: 19.xxx.xx.13
    ## remaining nodes omitted (too many to list)
    global:
      ssh_username: root
      ssh_password: 'xxxxxxx'
      ssh_port: '22'
      #ssh_key_file: ''
      home_path: /home/admin/oceanbase
      data_dir: /data/1
      redo_dir: /data/log1
obproxy:
  obproxy_cluster_name: obproxy
  servers:
    nodes:
    - ip: 19.xxx.xx.31
    global:
      ssh_username: root
      ssh_password: 'xxxxx'
      ssh_type: remote
      ssh_port: 22
      #ssh_key_file: ''
      home_path: /home/admin/obproxy

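【 Troubleshooting 】With support from community members "渠磊" and "靖顺", I went through the following checks and debugging steps:

  1. Logged in manually with ssh from the machine where obdiag is installed, to verify the ip, port, username and password in the configuration file and the connectivity between the machines (no problem found; obdiag is installed on the OCP node).
  2. After step 1, it was fairly clear that the cause lay in the operating system settings of the target machine: "the server only supports public key or keyboard-interactive authentication (run the ssh command, press Enter, then type the password by hand), whereas obdiag submits the password in a single non-interactive request".
  3. Inspected the sshd_config file on the target node and tried changing some parameters:
    • PasswordAuthentication changed from no to yes
    • UsePAM changed from yes to no
      The sshd service was restarted after each change; on retry the error was unchanged.
  4. Modified the obproxy section of the obdiag configuration file config.yml: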
obproxy:
  obproxy_cluster_name: obproxy
  servers:
    nodes:
    - ip: 19.xxx.xx.31
    global:
      ssh_username: root
      ssh_password: 'xxxxx'
      ssh_type: remote
      ssh_port: 22
## ssh_key_file uncommented
      ssh_key_file: ''
      home_path: /home/admin/obproxy
  5. Running sshpass -p 'password' ssh root@<target machine IP> logged in successfully.
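
When comparing sshd_config across nodes, it can help to list only the directives that affect authentication. The following is a rough sketch, not an obdiag feature: the function name auth_directives and the chosen keyword set are assumptions, and Include files and Match blocks are deliberately ignored.

```python
# Keywords (lowercased) that influence which login methods sshd offers.
AUTH_KEYWORDS = {
    "passwordauthentication",
    "kbdinteractiveauthentication",
    "challengeresponseauthentication",
    "pubkeyauthentication",
    "authenticationmethods",
    "usepam",
}


def auth_directives(config_text):
    """Return {keyword: value} for auth-related directives in the global section.

    sshd keeps the first value it reads for each keyword, so later
    duplicates are ignored here too. Parsing stops at the first Match block.
    """
    found = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        keyword = parts[0].lower()
        if keyword == "match":
            break  # conditional blocks are out of scope for this sketch
        if keyword in AUTH_KEYWORDS and keyword not in found:
            found[keyword] = parts[1].strip() if len(parts) > 1 else ""
    return found


if __name__ == "__main__":
    # Reading the real file usually requires root privileges.
    with open("/etc/ssh/sshd_config") as handle:
        for key, value in auth_directives(handle.read()).items():
            print(key, value)
```

Because a Match block or an included file can override the global value, a setting that looks correct here may still not be the one sshd applies to a given login.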

【 Temporary workaround 】

In the obdiag configuration file, change nodes in the obcluster section to an empty list:

obcluster:
  ob_cluster_name: obcluster
  db_host: xx.xxx.xx.xx
  db_port: '2883'
  tenant_sys:
    user: root@sys#xxxx
    password: xxxxx
  servers:
    nodes:
    #- ip: 19.xxx.xx.11
    #- ip: 19.xxx.xx.12
    #- ip: 19.xxx.xx.13
    ## remaining nodes omitted (too many to list)
    global:
      ssh_username: root
      ssh_password: 'xxxxxxx'
      ssh_port: '22'
      #ssh_key_file: ''
      home_path: /home/admin/oceanbase
      data_dir: /data/1
      redo_dir: /data/log1

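The problem is worked around for now, but the actual cause is still under investigation. Many thanks to "渠磊" and "靖顺" for their patient support; I hope the official team sees this heartfelt problem record and thank-you post. I will keep working with both of them to find the root cause and try to fix it, and will follow up in this thread when that is done.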

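You could write this up as a blog post; it would be easier to read that way :face_with_peeking_eye:

Thank you for sharing. Issues like this can be reported to the community as soon as they come up.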

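Under 渠磊's guidance, the cause was finally found by watching the security log on the target machine with tail -f /var/log/secure

while running obdiag gather log on the node where obdiag is installed to trigger a new connection.

The security log on the target host showed the key line: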

pam_succeed_if(sshd:auth): requirement "uid >= 1000" not met by user "root".

So the cause is the security policy of the production environment: a PAM module rule prevents users with a UID below 1000 from logging in.
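
To spot this kind of rejection without reading the log by eye, the lines can be filtered with a regular expression. This is only a sketch built around the single log line quoted above; the function name find_pam_rejections is made up for illustration, and other PAM modules or distributions may log in a different format.

```python
import re

# Matches lines such as:
# pam_succeed_if(sshd:auth): requirement "uid >= 1000" not met by user "root".
PAM_REJECTION = re.compile(
    r'pam_succeed_if\((?P<service>[^:)]+):(?P<stage>[^)]+)\): '
    r'requirement "(?P<requirement>[^"]+)" not met by user "(?P<user>[^"]+)"'
)


def find_pam_rejections(log_text):
    """Return one dict per pam_succeed_if rejection found in the log text."""
    return [match.groupdict() for match in PAM_REJECTION.finditer(log_text)]


if __name__ == "__main__":
    # /var/log/secure is normally readable by root only.
    with open("/var/log/secure", errors="replace") as handle:
        for hit in find_pam_rejections(handle.read()):
            print(hit["user"], hit["service"], hit["requirement"])
```

Each hit names the service (sshd here), the rejected user and the requirement that failed, which points to the PAM rule to look for.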

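PAM-related configuration files:

/etc/pam.d/login configuration for console (VNC) login
/etc/pam.d/sshd configuration for SSH login
/etc/pam.d/system-auth system-wide configuration

Check whether any of the files above contains a line like the following: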

auth        required      pam_succeed_if.so uid >= 1000

After commenting out that line and retrying the obdiag operation, it ran normally.
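
The check across the three PAM files can be scripted. The sketch below assumes the rule is written as whitespace-separated tokens, as in the line shown above; find_uid_rules is a hypothetical helper, and it only reports rules without changing anything.

```python
PAM_FILES = ["/etc/pam.d/login", "/etc/pam.d/sshd", "/etc/pam.d/system-auth"]


def find_uid_rules(pam_text):
    """Return (line_number, pam_type, operator, value) for active uid rules.

    Only uncommented lines that load pam_succeed_if.so and contain a
    'uid <op> <value>' condition are reported.
    """
    rules = []
    for lineno, raw in enumerate(pam_text.splitlines(), start=1):
        tokens = raw.split("#", 1)[0].split()  # drop comments, then tokenize
        if not any(token.endswith("pam_succeed_if.so") for token in tokens):
            continue
        if "uid" in tokens:
            index = tokens.index("uid")
            if index + 2 < len(tokens):
                rules.append((lineno, tokens[0], tokens[index + 1], tokens[index + 2]))
    return rules


if __name__ == "__main__":
    for path in PAM_FILES:
        try:
            with open(path) as handle:
                for rule in find_uid_rules(handle.read()):
                    print(path, rule)
        except FileNotFoundError:
            print(path, "not present on this system")
```

A rule reported for the auth type in /etc/pam.d/sshd, or in a file it includes, is the kind of line that produces the log entry shown earlier.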

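Changing this policy in a real environment is not recommended. Instead, create a regular user dedicated to obdiag inspections and troubleshooting (the admin user is recommended). obdiag only reads data and does not modify the cluster's directories or files, so in theory it does not affect normal cluster operation.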