Error installing ocp-3.3.0-ce (Community Edition)

【Environment】Test environment
【OB or other components】
oceanbase-ce-3.1.5
obproxy-ce-3.2.3.5-2
ob-deploy-1.6.2
obclient-2.2.1-4
obagent-1.2.0
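【Version】ocp-3.3.0-ce-bp2-x86_64
【Problem description】The OceanBase cluster has already been created. Installing OCP from the command line on the obproxy machine fails with an error.
【Steps to reproduce】Operations performed before and after the problem appeared
【Symptoms and impact】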

[admin@obclient ocp-3.3.0-ce-bp2-x86_64]$ cat …/config.yaml

[admin@obclient ocp-3.3.0-ce-bp2-x86_64]$ sudo ./ocp_installer.sh install -c …/config.yaml -i ./ocp-installer.tar.gz -o ./ocp.tar.gz

The output of both commands is shown in the screenshots in the replies below.





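  1. In the earlier OB cluster config file, cpu_count: 8 is too low. This is an internal OB parameter and can be set to 16 or higher (even if your server does not have 16 CPUs). The container's CPU count, on the other hand, must not exceed the actual number of CPUs.

The create_metadb.py error may be related to this.

The installer has now exited with an error, so consider whether it has already created some objects. Check with the following SQL: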

select a.zone,concat(a.svr_ip,':',a.svr_port) observer, cpu_total, (cpu_total-cpu_assigned) cpu_free
, round(mem_total/1024/1024/1024) mem_total_gb, round((mem_total-mem_assigned)/1024/1024/1024,2) mem_free_gb
, round(disk_total/1024/1024/1024) disk_total_gb, round((disk_total-disk_assigned)/1024/1024/1024) disk_free_gb
from __all_virtual_server_stat a join __all_server b on (a.svr_ip=b.svr_ip and a.svr_port=b.svr_port)
order by a.zone, a.svr_ip
;

select t1.name resource_pool_name,  t2.`name` unit_config_name, t2.max_cpu, t2.min_cpu
, round(t2.max_memory/1024/1024/1024) max_mem_gb, round(t2.min_memory/1024/1024/1024) min_mem_gb
, round(t2.max_disk_size/1024/1024/1024) max_disk_size , t4.tenant_name
from __all_resource_pool t1 join __all_unit_config t2 on (t1.unit_config_id=t2.unit_config_id)
    join __all_unit t3 on (t1.`resource_pool_id` = t3.`resource_pool_id`)
    left join __all_tenant t4 on (t1.tenant_id=t4.tenant_id)
order by t1.`resource_pool_id`, t2.`unit_config_id`, t3.unit_id
;

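If create_metadb.py is well written and re-entrant, running it again may succeed. If it reports that something already exists, you will have to delete it manually.

Thank you for your reply!

I changed it to 16 cores, but it still reports the same memory error. Is there a way to lower the memory requirement?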


vim ocp_install.sh
Find the memory check code and modify it yourself.

This is only recommended for test environments.

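Setting precheck_ignore: true in config.yaml turns off the configuration precheck, which skips the restriction that the configuration must meet production requirements.
Also, judging from the error message, Get Location Cache Fail probably means the tenant could not be found when connecting. You appear to be using an existing cluster as the metadb; in that case the corresponding tenants must be created in that cluster beforehand. If that is too much trouble, set create_metadb_cluster: true, and the installer will automatically create an OB cluster to use as the metadb according to the ob_cluster section below.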
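
The two settings mentioned above are top-level keys in config.yaml. As a minimal sketch, the helper below rewrites an existing, uncommented top-level `key: value` line in place. The function name `set_yaml_flag` is hypothetical and not part of the OCP installer, and the sketch assumes simple one-line scalar values.

```shell
# Rewrite an existing, uncommented top-level "key: value" line in a YAML file.
# $1 = key, $2 = new value, $3 = file
# Lines starting with "#" are left untouched; a missing key is NOT added.
set_yaml_flag() {
  tmp=$(mktemp) || return 1
  # Write the edited copy to a temp file, then copy it back so the
  # original file keeps its owner and permissions.
  sed "s/^$1:.*/$1: $2/" "$3" > "$tmp" && cat "$tmp" > "$3"
  rc=$?
  rm -f "$tmp"
  return $rc
}

# Usage (path is an example):
#   set_yaml_flag precheck_ignore true ./config.yaml
#   set_yaml_flag create_metadb_cluster true ./config.yaml
```

Editing the file by hand works just as well; the helper is only convenient when the same change has to be repeated across several test installs.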

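I made the change, and now it reports this error. What is the cause?

I could not find this script.

This message is printed when sudo is executed: the user needs sudo privileges, and sudo must work without a password.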
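
A quick way to confirm the passwordless requirement before rerunning the installer is `sudo -n`, which fails instead of prompting when a password would be needed. The sketch below wraps that check; `has_passwordless_sudo` and the `SUDO_BIN` override are hypothetical names introduced here for illustration.

```shell
# Succeeds only if sudo can run a command without prompting for a password.
# "-n" is sudo's non-interactive flag: it errors out instead of asking.
# SUDO_BIN is a hypothetical override (defaults to sudo), handy for dry runs.
has_passwordless_sudo() {
  "${SUDO_BIN:-sudo}" -n true 2>/dev/null
}

# Usage:
#   if has_passwordless_sudo; then echo "sudo ok"; else echo "sudo needs a password"; fi
#
# A typical sudoers entry that grants this to the admin user
# (edit with visudo; adjust to your own security policy):
#   admin ALL=(ALL) NOPASSWD: ALL
```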

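It was indeed a sudo problem, and I have fixed it. Back to the old question: how do I get around the memory size check?

If precheck_ignore is set to true, this warning can be ignored.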

It is already set. The installer exits with this error:
[image]

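Please show your current config file.
The SSH trust setup timed out here. Either the configuration parameters are incorrect or there is a problem with the SSH environment on site.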

[admin@obclient ocp-3.3.0-ce-bp2-x86_64]$ cat …/config.yaml

# OCP deploy config
# Note:
# Do not use 127.0.0.1 or hostname as server address
# When a server has both public ip and private ip, if private ip is connectable, use private ip for faster connection
# If a vip is configured, it should be already created and bonded to the right server and port, the installation script won't do any work with vip maintenance, just use it to connect to the service

# Ignore precheck errors
# It's recommended to not ignore precheck errors
#precheck_ignore: false
precheck_ignore: true

# Create an obcluster as OCP's metadb
create_metadb_cluster: true

# Clean OCP's metadb cluster when uninstall
clean_metadb_cluster: flase

# Metadb cluster deploy config
ob_cluster:
  name: obcluster
  home_path: /home/admin/oceanbase
  root_password: 'root123'
  # The directory for data storage, it's recommended to use an independent path
  data_path: /home/admin/data/1
  # The directory for clog, ilog, and slog, it's recommended to use an independent path.
  redo_path: /home/admin/data/log1
  sql_port: 2881
  rpc_port: 2882
  zones:
    - name: zone1
      servers:
        - 10.30.41.105
        - 10.30.41.106
        - 10.30.41.107
  ## custom obd config for obcluster
  custom_config:
    - key: devname
      value: ens32
    - key: cpu_count
      value: 16
    - key: memory_limit
      value: 10G
    - key: system_memory
      value: 1G
     #   - key: __min_full_resource_pool_memory
     #     value: 5368709120
     #   - key: datafile_maxsize
     #     value: 0
     #   - key: datafile_next
     #     value: 0

  # Meta user info
  meta:
    tenant: meta_tenant
    user: meta_user
    password: meta_password
    database: meta_database
    cpu: 1
    # Memory configs in GB, 4 means 4GB
    memory: 2

  # Monitor user info
  monitor:
    tenant: monitor_tenant
    user: monitor_user
    password: monitor_password
    database: monitor_database
    cpu: 1
    # Memory configs in GB, 8 means 8GB
    memory: 2

# Obproxy to connect metadb cluster
obproxy:
  home_path: /home/admin/obproxy
  port: 2883
  servers:
    - 10.30.41.104

  ## custom config for obproxy
  # custom_config:
  #   - key: clustername
  #     value: obcluster


  ## Vip is optional, if vip is not configured, one of the obproxy servers' addresses will be used
  # vip:
  #   address: 1.1.1.1
  #   port: 2883

# Ssh auth config
ssh:
  port: 22
  user: admin
  # auth method, support password and pubkey
  auth_method: password
  password: admin123

# OCP config
ocp:
  # ocp container's name
  name: 'ocp'

  # OCP process listen port and log dir on host
  process:
    port: 8080
    log_dir: /tmp/ocp/log
  servers:
    - 10.30.41.104
  # OCP container's resource
  resource:
    cpu: 2
    # Memory configs in GB, 8 means 8GB
    memory: 4
  # Vip is optional, if vip is not configured, one of the ocp servers' addresses will be used
  # vip:
  #   address: 1.1.1.1
  #   port: 8080
  # OCP basic auth config, used when upgrade ocp
  auth:
    user: admin
    password: admin123
  # OCP metadb config, for ocp installation, if "create_metadb_cluster" is configured true, this part will be replaced with the configuration of metadb cluster and obproxy
  metadb:
    host: 10.30.41.104
    port: 2883
    meta_user: meta_user@meta_tenant#obcluster
    meta_password: meta_password
    meta_database: meta_database
    monitor_user: monitor_user@monitor_tenant#obcluster
    monitor_password: monitor_password
    monitor_database: monitor_database
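
One detail in the config above is worth a second look: `clean_metadb_cluster: flase` looks like a typo for `false`. Whether the installer tolerates it is not established in this thread. The sketch below flags top-level keys whose value is neither `true` nor `false`; `check_bool_flags` is a hypothetical helper, and it assumes simple one-line scalars.

```shell
# Report top-level flags whose value is not literally "true" or "false".
# $1 = config file; remaining arguments = keys expected to be boolean.
# Returns 1 if any key has a suspicious value, 0 otherwise.
check_bool_flags() {
  file=$1
  shift
  rc=0
  for key in "$@"; do
    # Take the first token after "key:" on an uncommented top-level line.
    value=$(sed -n "s/^$key:[[:space:]]*\([^[:space:]#]*\).*/\1/p" "$file" | head -n 1)
    case "$value" in
      true|false) echo "ok: $key=$value" ;;
      '')         echo "missing: $key" ;;
      *)          echo "suspicious: $key=$value"; rc=1 ;;
    esac
  done
  return $rc
}

# Usage (path is an example):
#   check_bool_flags ./config.yaml precheck_ignore create_metadb_cluster clean_metadb_cluster
```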

[admin@obclient ocp-3.3.0-ce-bp2-x86_64]$ ssh 10.30.41.104 date
Mon Jun 19 18:35:17 CST 2023
[admin@obclient ocp-3.3.0-ce-bp2-x86_64]$ ssh 10.30.41.105 date
Mon Jun 19 18:35:20 CST 2023
[admin@obclient ocp-3.3.0-ce-bp2-x86_64]$ ssh 10.30.41.106 date
Mon Jun 19 18:35:23 CST 2023
[admin@obclient ocp-3.3.0-ce-bp2-x86_64]$ ssh 10.30.41.107 date
Mon Jun 19 18:35:27 CST 2023
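
The manual checks above can be repeated with an explicit timeout so that a slow or hanging login shows up as a failure rather than a long wait. This is a minimal sketch: `check_ssh_hosts` and the `SSH_BIN` override are hypothetical names, and `BatchMode=yes` assumes key-based login for the given user, which the thread does not confirm. With password-only login it reports FAILED instead of prompting.

```shell
# Try a non-interactive SSH login to each host with a 5-second connect timeout.
# $1 = remote user; remaining arguments = host addresses.
# SSH_BIN is a hypothetical override (defaults to ssh), handy for dry runs.
check_ssh_hosts() {
  user=$1
  shift
  for host in "$@"; do
    if "${SSH_BIN:-ssh}" -o BatchMode=yes -o ConnectTimeout=5 "$user@$host" date >/dev/null 2>&1; then
      echo "$host ok"
    else
      echo "$host FAILED"
    fi
  done
}

# Usage:
#   check_ssh_hosts admin 10.30.41.104 10.30.41.105 10.30.41.106 10.30.41.107
```

Note that the config above sets auth_method: password, so a passing key-based check does not by itself prove that the installer's password login works.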

Please provide the obd logs under ~/.obd/log.

obd.zip (27.4 KB)

Wait, that's not right: the file I just uploaded is not from today. The date is wrong.
[admin@obclient log]$ ls -l
-rw-rw-r-- 1 admin admin 455761 Jun 16 19:24 obd
[admin@obclient log]$ date
Mon Jun 19 19:39:08 CST 2023
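
The listing shows the obd log last modified on Jun 16 while the current date is Jun 19, so the latest run may have logged somewhere else. As a small sketch, the helper below lists files in a directory that were modified within the last 24 hours; `recent_files` is a hypothetical name, and the directory to search is up to the reader.

```shell
# List regular files under a directory modified within the last 24 hours.
# $1 = directory to search (searched recursively).
recent_files() {
  find "$1" -type f -mtime -1
}

# Usage:
#   recent_files ~/.obd/log
# An empty result means nothing in that directory was written in the last day.
```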

According to the latest log, OB was deployed successfully. Has the earlier SSH problem been resolved?