Deploying an OceanBase 4.3.2.1 cluster with the obd 2.9.2 command line fails

[Environment] Production

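[Version]
obd 2.9.2
[Problem description]
Deploying an OceanBase 4.3.2.1 cluster with the obd 2.9.2 command line fails. The same configuration file deploys successfully in the test environment but fails in the production environment.
Error: [ERROR] Failed to install lib package for local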

Deployment file

user:
   username: obadmin
   password: 12345678!
   port: 22
oceanbase-ce:
  depends:
    - ob-configserver
  version: 4.3.2.1
  servers:
    - name: server1
      ip: 10.1.250.161
    - name: server2
      ip: 10.1.250.160
    - name: server3
      ip: 10.1.250.157
  global:
    memory_limit: 24G # The maximum running memory for an observer
    system_memory: 6G
    datafile_size: 50G # Size of the data file. 
    log_disk_size: 50G # The size of disk space used by the clog files.
    enable_syslog_wf: false # Print system logs whose levels are higher than WARNING to a separate log file. The default value is true.
    enable_syslog_recycle: true # Enable auto system log recycling or not. The default value is false.
    max_syslog_file_count: 4 # The maximum number of reserved log files before enabling auto recycling. The default value is 0.
    # Cluster name for OceanBase Database. The default value is obcluster. When you deploy OceanBase Database and obproxy, this value must be the same as the cluster_name for obproxy.
    appname: stagoceanbase
    root_password: DevOps00!@#
    ocp_meta_db: ocp_express # The database name of ocp express meta
    ocp_meta_username: meta # The username of ocp express meta
    ocp_meta_password: 'DevOps00!@#' # The password of ocp express meta
    ocp_agent_monitor_password: 'DevOps00!@#' # The password for obagent monitor user
    ocp_meta_tenant: # The config for ocp express meta tenant
      tenant_name: ocp
      max_cpu: 1
      memory_size: 2G
      log_disk_size: 7680M # The recommend value is (4608 + (expect node num + expect tenant num) * 512) M.
  server1:
    mysql_port: 2881 # External port for OceanBase Database. The default value is 2881. DO NOT change this value after the cluster is started.
    rpc_port: 2882 # Internal port for OceanBase Database. The default value is 2882. DO NOT change this value after the cluster is started.
    obshell_port: 2886 # Operation and maintenance port for Oceanbase Database. The default value is 2886. This parameter is valid only when the version of oceanbase-ce is 4.2.2.0 or later.
    # The working directory for OceanBase Database. OceanBase Database is started under this directory. This is a required field.
    home_path: /data1/stag_oceanbase/observer
    zone: zone1
    local_ip: 10.1.250.161
  server2:
    mysql_port: 2881 # External port for OceanBase Database. The default value is 2881. DO NOT change this value after the cluster is started.
    rpc_port: 2882 # Internal port for OceanBase Database. The default value is 2882. DO NOT change this value after the cluster is started.
    obshell_port: 2886 # Operation and maintenance port for Oceanbase Database. The default value is 2886. This parameter is valid only when the version of oceanbase-ce is 4.2.2.0 or later.
    #  The working directory for OceanBase Database. OceanBase Database is started under this directory. This is a required field.
    home_path: /data1/stag_oceanbase/observer
    zone: zone2
    local_ip: 10.1.250.160
  server3:
    mysql_port: 2881 # External port for OceanBase Database. The default value is 2881. DO NOT change this value after the cluster is started.
    rpc_port: 2882 # Internal port for OceanBase Database. The default value is 2882. DO NOT change this value after the cluster is started.
    obshell_port: 2886 # Operation and maintenance port for Oceanbase Database. The default value is 2886. This parameter is valid only when the version of oceanbase-ce is 4.2.2.0 or later.
    #  The working directory for OceanBase Database. OceanBase Database is started under this directory. This is a required field.
    home_path: /data1/stag_oceanbase/observer
    zone: zone3
    local_ip: 10.1.250.157
obproxy-ce:
  # Set dependent components for the component.
  # When the associated configurations are not done, OBD will automatically get these configurations from the dependent components.
  depends:
    - oceanbase-ce
    - ob-configserver
  version: 4.2.3.0
  servers:
    - 10.1.250.161
  global:
    listen_port: 2883 # External port. The default value is 2883.
    prometheus_listen_port: 2884 # The Prometheus port. The default value is 2884.
    home_path: /data1/stag_oceanbase/obproxy
    enable_cluster_checkout: false
    # observer cluster name, consistent with oceanbase-ce's appname. When a depends exists, OBD gets this value from the oceanbase-ce of the depends.
    cluster_name: devoceanbase
    skip_proxy_sys_private_check: true
    enable_strict_kernel_release: false
obagent:
  depends:
    - oceanbase-ce
  servers:
    - name: server1
      # Please don't use hostname, only IP can be supported
      ip: 10.1.250.161
    - name: server2
      ip: 10.1.250.160
    - name: server3
      ip: 10.1.250.157
  global:
    home_path: /data1/stag_oceanbase/obagent
prometheus:
  servers:
    - 10.1.250.161
  depends:
    - obagent
  global:
    # The working directory for prometheus. prometheus is started under this directory. This is a required field.
    home_path: /data1/stag_oceanbase/prometheus
    address: 0.0.0.0  # The ip address to bind to. Along with port, corresponds to the `web.listen-address` parameter.
    port: 9091 # The http port to use. Along with address, corresponds to the `web.listen-address` parameter.
grafana:
  servers:
    - 10.1.250.161
  depends:
    - prometheus
  global:
    home_path: /data1/stag_oceanbase/grafana
    address: 0.0.0.0
    login_password: ch999 # Grafana login password.
    port: 3003 # The http port to use, can be empty. The default value is 3000.

ocp-express:
  depends:
    - oceanbase-ce
    - obproxy-ce
    - obagent
  servers:
    - 10.1.250.161
  global:
    # The working directory for ocp-express. ocp-express is started under this directory. This is a required field.
    home_path: /data1/stag_oceanbase/ocp-express
    # log_dir: /home/oceanbase/ocp-express/log # The log directory of ocp express server. The default value is {home_path}/log.
    memory_size: 6G # The memory size of ocp-express server. The recommended value is 512MB + (expect node num + expect tenant num) * 60MB.
ob-configserver:
  servers:
    - 10.1.250.161
  global:
    listen_port: 8080 # The port of ob-configserver web
    # server_ip: 0.0.0.0 # Listen to the ob-configserver server IP. When you want to listen to the specified IP address, use it.
    home_path: /data1/stag_oceanbase/ob-configserver  # The working directory for ob-configserver. ob-configserver is started under this directory. This is a required field.
    ## log config
    # log_level: info # Log printing level of ob-configserver. The default value is `info`
    # log_maxsize: 30 # The total size of manager ob-configserver.Log size is measured in Megabytes.The default value is 30
    # log_maxage: 7 # The days of manager expired ob-configserver.Log retention days. The default value is 7
    # log_maxbackups: 10  #The number of manager expired ob-configserver.Log. The default value is 10
    # log_localtime: true #  Switch of ob-configserver.Log naming with localtime. The default value is true
    # log_compress: true # Compress ob-configserver.Log switch. The default value is true

    ## vip config, configserver will generate url with vip_address and port and return it to the client
    ## do not use some random value that can't be connected
    # vip_address: "10.10.10.1"
    # vip_port: 8080
    ## storage config
    # storage:
    #   database_type: sqlite3 # sqlite3 or mysql. Default sqlite3
    #   connection_url: "" # When database_type is set to sqlite3, the connection_url parameter can be left empty. If it is empty, the default value $home_path/.data.db?cache=shared&_fk=1 will be used. When database_type is set to mysql, the connection_url parameter must be configured, with a sample value of user:password@tcp(10.10.10.1:2883)/test?parseTime=true.

    #oblogproxy:
    #depends:
    #- oceanbase-ce
    #- obproxy-ce
    #servers:
    #- 10.1.250.157
    #version: 2.0.1
    #global:
    #home_path: /data1/stag_oceanbase/oblogproxy
    #service_port: 2983
    #binlog_dir: /data1/stag_oceanbase/oblogproxy/run   # The directory for binlog file. The default value is $home_path/run.
    #binlog_mode: true   # enable binlog mode, default true
#


#oblogproxy:
#  depends:
#    - oceanbase-ce
#    - obproxy-ce
#  servers:
#    - 10.1.250.161
#  version: 2.0.2
#  global:
#    home_path: /data1/stag_oceanbase/oblogproxy
#    service_port: 2983



#    binlog_dir: /root/oblogproxy/run   # The directory for binlog file. The default value is $home_path/run.
#    binlog_mode: true   # enable binlog mode, default true

[Steps to reproduce]
obd cluster deploy stagoceanbase -c ./all_comp.yaml -v -f

[Attachments and logs]
deploy.log (87.0 KB)


obd mirror info:
obd_mirror_info.log (90.1 KB)

Please provide the obd logs. By default they are in ~/.obd/log.

obd log
obd.log (3.2 MB)

The error indicates that the package cannot be found in the local repository.


Judging from the log, isn't it using the remote mirror?

I disabled remote and added the local RPM packages, but I get the same error. It looks like a problem in the plugin code.
local_deploy.log (95.5 KB)

Deploying through obd web fails as well; even the observer cannot be installed. The log is below:
web_deploy.log (95.3 KB)

For the mirrors, just keep all three repositories set to true. Have you set up passwordless SSH for the three nodes?

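Is arch: x86_64 the same on all nodes?
Is os: el8 the same on all nodes?
Is a yum repository configured? It looks like it is searching for dependencies.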

Passwordless SSH is set up on all of them.

[obadmin@dev4 ~]$ cat /etc/redhat-release
Rocky Linux release 8.8 (Green Obsidian)
[obadmin@dev5 ~]$ cat /etc/redhat-release
Anolis OS release 8.4
[obadmin@dev6 ~]$ cat /etc/redhat-release
Rocky Linux release 8.8 (Green Obsidian)
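
The three release strings above can be compared mechanically. The following is a minimal Python sketch, not part of obd: the helper name `el_major` and the regular expression are assumptions made for illustration, and it only compares the major release number, which is what the "el8" question in the earlier reply refers to.

```python
import re

# Release strings copied from the /etc/redhat-release output above.
RELEASES = {
    "dev4": "Rocky Linux release 8.8 (Green Obsidian)",
    "dev5": "Anolis OS release 8.4",
    "dev6": "Rocky Linux release 8.8 (Green Obsidian)",
}


def el_major(release_line: str) -> int:
    """Return the major version from a redhat-release style line (hypothetical helper)."""
    match = re.search(r"release\s+(\d+)", release_line)
    if match is None:
        raise ValueError(f"unrecognized release line: {release_line!r}")
    return int(match.group(1))


majors = {host: el_major(line) for host, line in RELEASES.items()}
same_major = len(set(majors.values())) == 1
print(majors, "same major release:", same_major)
```

A matching major release does not rule out differences in installed libraries between Rocky Linux and Anolis OS; it only confirms the coarse check.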

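They are all el8; only one machine runs Anolis OS. A single-node install works without problems on each of the three machines. I have disabled remote, and the local packages are sufficient. The single-node configuration below deploys and runs normally on all three machines.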

user:
   username: obadmin
   password: 12345678!
   port: 22
oceanbase-ce:
  servers:
    # Please don't use hostname, only IP can be supported
    - 10.1.250.157
  global:
    #  The working directory for OceanBase Database. OceanBase Database is started under this directory. This is a required field.
    home_path: /data1/stag_oceanbase/singleobserver
    # The directory for data storage. The default value is $home_path/store.
    # data_dir: /data
    # The directory for clog, ilog, and slog. The default value is the same as the data_dir value.
    # redo_dir: /redo
    # Starting from observer version 4.2, the network selection for the observer is based on the 'local_ip' parameter, and the 'devname' parameter is no longer mandatory.
    # If the 'local_ip' parameter is set, the observer will first use this parameter for the configuration, regardless of the 'devname' parameter.
    # If only the 'devname' parameter is set, the observer will use the 'devname' parameter for the configuration.
    # If neither the 'devname' nor the 'local_ip' parameters are set, the 'local_ip' parameter will be automatically assigned the IP address configured above.
    # devname: eth0
    mysql_port: 2881 # External port for OceanBase Database. The default value is 2881. DO NOT change this value after the cluster is started.
    rpc_port: 2882 # Internal port for OceanBase Database. The default value is 2882. DO NOT change this value after the cluster is started.
    obshell_port: 2886 # Operation and maintenance port for Oceanbase Database. The default value is 2886. This parameter is valid only when the version of oceanbase-ce is 4.2.2.0 or later.
    zone: zone1
    # if current hardware's memory capacity is smaller than 50G, please use the setting of "mini-single-example.yaml" and do a small adjustment.
    memory_limit: 64G # The maximum running memory for an observer
    # The reserved system memory. system_memory is reserved for general tenants. The default value is 30G.
    system_memory: 6G
    datafile_size: 192G # Size of the data file. 
    log_disk_size: 192G # The size of disk space used by the clog files.
    enable_syslog_wf: false # Print system logs whose levels are higher than WARNING to a separate log file. The default value is true.
    enable_syslog_recycle: true # Enable auto system log recycling or not. The default value is false.
    max_syslog_file_count: 4 # The maximum number of reserved log files before enabling auto recycling. The default value is 0.
    # observer cluster name, consistent with obproxy's cluster_name
    appname: singleobcluster
    root_password: DevOps00!@#
    # root_password: # root user password, can be empty
    # proxyro_password: # proxyro user password, consistent with obproxy's observer_sys_password, can be empty
obproxy-ce:
  # Set dependent components for the component.
  # When the associated configurations are not done, OBD will automatically get these configurations from the dependent components.
  depends:
    - oceanbase-ce
  servers:
    - 10.1.250.157
  global:
    listen_port: 2883 # External port. The default value is 2883.
    prometheus_listen_port: 2884 # The Prometheus port. The default value is 2884.
    home_path: /data1/stag_oceanbase/singleobproxy
    # oceanbase root server list
    # format: ip:mysql_port;ip:mysql_port. When a depends exists, OBD gets this value from the oceanbase-ce of the depends.
    # rs_list: 192.168.1.2:2881;192.168.1.3:2881;192.168.1.4:2881
    enable_cluster_checkout: false
    # observer cluster name, consistent with oceanbase-ce's appname. When a depends exists, OBD gets this value from the oceanbase-ce of the depends.
    # cluster_name: obcluster
    skip_proxy_sys_private_check: true
    enable_strict_kernel_release: false
    # obproxy_sys_password: # obproxy sys user password, can be empty. When a depends exists, OBD gets this value from the oceanbase-ce of the depends.
    # observer_sys_password: # proxyro user password, consistent with oceanbase-ce's proxyro_password, can be empty. When a depends exists, OBD gets this value from the oceanbase-ce of the depends.

Log of the single-node deployment on the Anolis machine:
anolis_single_node_deploy.log (30.8 KB)

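obd runs on dev6. From dev6 I can deploy a single node to each of dev4, dev5, and dev6, and I can also deploy a two-observer cluster on dev4+dev6 or on dev5+dev6. But as soon as I deploy dev4+dev5+dev6 together, the error appears.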

The configuration is below:

user:
   username: obadmin
   password: 12345678!
   port: 22
oceanbase-ce:
  version: 4.3.2.1
  servers:
    - name: server1
      ip: 10.1.250.161
    - name: server2
      ip: 10.1.250.160
    - name: server3
      ip: 10.1.250.157
  global:
    memory_limit: 64G # The maximum running memory for an observer
    # The reserved system memory. system_memory is reserved for general tenants. The default value is 30G.
    system_memory: 6G
    datafile_size: 192G # Size of the data file. 
    log_disk_size: 192G # The size of disk space used by the clog files.
    enable_syslog_wf: false # Print system logs whose levels are higher than WARNING to a separate log file. The default value is true.
    enable_syslog_recycle: true # Enable auto system log recycling or not. The default value is false.
    max_syslog_file_count: 4 # The maximum number of reserved log files before enabling auto recycling. The default value is 0.
    # Cluster name for OceanBase Database. The default value is obcluster. When you deploy OceanBase Database and obproxy, this value must be the same as the cluster_name for obproxy.
    appname: stagoceanbase
    root_password: DevOps00!@#
    # proxyro_password: # proxyro user password, consistent with obproxy's observer_sys_password, can be empty
    # cdcro_password: # cdcro user password, consistent with oblogproxy's observer_sys_password, can be empty
    ocp_meta_db: ocp_express # The database name of ocp express meta
    ocp_meta_username: meta # The username of ocp express meta
    ocp_meta_password: 'DevOps00!@#' # The password of ocp express meta
    ocp_agent_monitor_password: 'DevOps00!@#' # The password for obagent monitor user
    ocp_meta_tenant: # The config for ocp express meta tenant
      tenant_name: ocp
      max_cpu: 1
      memory_size: 2G
      log_disk_size: 7680M # The recommend value is (4608 + (expect node num + expect tenant num) * 512) M.
  # In this example , support multiple ob process in single node, so different process use different ports.
  # If deploy ob cluster in multiple nodes, the port and path setting can be same. 
  server1:
    mysql_port: 2881 # External port for OceanBase Database. The default value is 2881. DO NOT change this value after the cluster is started.
    rpc_port: 2882 # Internal port for OceanBase Database. The default value is 2882. DO NOT change this value after the cluster is started.
    obshell_port: 2886 # Operation and maintenance port for Oceanbase Database. The default value is 2886. This parameter is valid only when the version of oceanbase-ce is 4.2.2.0 or later.
    # The working directory for OceanBase Database. OceanBase Database is started under this directory. This is a required field.
    home_path: /data1/stag_oceanbase/observer6
    zone: zone1
  server2:
    mysql_port: 2881 # External port for OceanBase Database. The default value is 2881. DO NOT change this value after the cluster is started.
    rpc_port: 2882 # Internal port for OceanBase Database. The default value is 2882. DO NOT change this value after the cluster is started.
    obshell_port: 2886 # Operation and maintenance port for Oceanbase Database. The default value is 2886. This parameter is valid only when the version of oceanbase-ce is 4.2.2.0 or later.
    #  The working directory for OceanBase Database. OceanBase Database is started under this directory. This is a required field.
    home_path: /data1/stag_oceanbase/observer6
    zone: zone2
  server3:
    mysql_port: 2881 # External port for OceanBase Database. The default value is 2881. DO NOT change this value after the cluster is started.
    rpc_port: 2882 # Internal port for OceanBase Database. The default value is 2882. DO NOT change this value after the cluster is started.
    obshell_port: 2886 # Operation and maintenance port for Oceanbase Database. The default value is 2886. This parameter is valid only when the version of oceanbase-ce is 4.2.2.0 or later.
    #  The working directory for OceanBase Database. OceanBase Database is started under this directory. This is a required field.
    home_path: /data1/stag_oceanbase/observer6
    zone: zone3
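
The `log_disk_size` comment for the ocp meta tenant in the configuration above quotes a sizing rule. As a quick arithmetic check, here is a small Python sketch of that rule; the function name and the example counts are assumptions, since the thread does not say how many tenants are expected.

```python
def recommended_meta_log_disk_mb(node_num: int, tenant_num: int) -> int:
    """Formula quoted in the config comment: (4608 + (nodes + tenants) * 512) M."""
    return 4608 + (node_num + tenant_num) * 512


# 7680M, the value used in the config, corresponds to nodes + tenants == 6,
# for example 3 nodes and 3 expected tenants (the tenant count is an assumption).
print(recommended_meta_log_disk_mb(3, 3))
```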

With these three nodes, removing any one of them lets the deployment succeed; deploying all three together fails.

Log:
observer_only_deploy.log (25.3 KB)

That is indeed strange. Please wait a moment while I check with the relevant colleagues.

In that case, could I first deploy a two-node cluster and then manually start an observer and add it in?

You can try that. The three machines have the same specifications, right?

Basically the same. I have now built a two-node cluster and then used cluster scale_out to add one more zone with an observer and obagent. The cluster is working now.

Then this is very strange :joy:, and your problem is not easy to reproduce either.

Specifications of the three cluster machines:
dev4 256GB 48 cores
dev5 160GB 40 cores
dev6 256GB 72 cores

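The three machines differ quite a lot. At present, OceanBase assumes by default that the machines in a cluster have the same specifications, for example in load-balancing tasks.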
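
To put a number on how uneven the three machines are, the specifications reported above can be compared with a short Python sketch. This is only an illustration: the `MAX_RATIO` tolerance is a hypothetical value chosen for the example, not a limit documented by OceanBase or obd.

```python
# Specifications as reported in the thread: (memory in GB, CPU cores).
SPECS = {
    "dev4": (256, 48),
    "dev5": (160, 40),
    "dev6": (256, 72),
}

# Hypothetical tolerance: flag a resource whose max/min ratio exceeds this.
MAX_RATIO = 1.25


def spread(values):
    """Return max/min of a sequence of positive numbers."""
    values = list(values)
    return max(values) / min(values)


mem_ratio = spread(mem for mem, _ in SPECS.values())
cpu_ratio = spread(cores for _, cores in SPECS.values())

for name, ratio in (("memory", mem_ratio), ("cpu cores", cpu_ratio)):
    status = "uneven" if ratio > MAX_RATIO else "ok"
    print(f"{name}: max/min = {ratio:.2f} ({status})")
```

With the reported values, memory differs by a factor of 1.6 and core count by a factor of 1.8 between the smallest and largest machine.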