Adding nodes to an OceanBase cluster

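I want to install an OceanBase Community Edition 4.2.1.3 cluster.
From the documentation, it looks like new nodes are initialized and added to an OceanBase cluster through OAT, but the Community Edition does not include OAT. I want to add nodes and then put them into a zone. How is that done? Are there detailed steps I can follow?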

OceanBase Distributed Database - Massive Data, Every Transaction Counts

Doesn't this require initializing the nodes with OAT first? The Community Edition doesn't have OAT.

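Right, the Community Edition does not have OAT.
If you are using the Enterprise Edition of OB, we suggest getting help in one of the following ways:
1. If your company has signed an OceanBase Enterprise Edition sales contract, contact your account manager;
2. If your company has not yet signed an OceanBase Enterprise Edition sales contract, leave your contact details on the business inquiry page of the OceanBase website, and an OceanBase Enterprise Edition business consultant will contact you within one business day.
OceanBase website business inquiry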

I'm on the Community Edition. I've added the machines, but adding one of them to a zone fails:
ALTER SYSTEM ADD SERVER '10.2xxxxx:2882' ZONE 'zone2';
ERROR 4179 (HY000): add non-empty server "10.2xxxxxxx:2882" not allowed
What is causing this? The detailed rootservice error is as follows:
[RS] add_server (ob_root_service.cpp:7213) [53315][T1_L0_G0][T1][YB420AC9A413-0006139777B72403-0-0] [lt=36] sys tenant data version >= 4.2, add_server(arg={servers:[“10.201.171.35:2882”], zone:“zone2”, force_stop:false, op:0}, timeout_ts=9999504)
[2024-03-14 15:59:22.373923] INFO [STORAGE] ~ObStorageTableGuard (ob_storage_table_guard.cpp:153) [53315][T1_L0_G0][T1][YB420AC9A413-0006139777B72403-0-0] [lt=40] throttle statics(need_speed_limit=false, last_throttle_status=false, last_print_log_time=1710402590154278, stat={total_throttle_time_us:0, total_skip_throttle_time_us:0, last_log_timestamp:1710403162373595, last_throttle_status:false, 0=0, 1=0, 2=0, 3=0})
[2024-03-14 15:59:22.374053] INFO [SHARE] fetch_new_max_id (ob_max_id_fetcher.cpp:274) [53315][T1_L0_G0][T1][YB420AC9A413-0006139777B72403-0-0] [lt=43] fetch_new_max_id(ret=0, ret=“OB_SUCCESS”, size=1, tenant_id=1, fetch_id=7, max_id_type=4, fetch_max_id_type=4, id=18446744073709551615, initial=18446744073709551615)
[2024-03-14 15:59:22.374551] INFO [SERVER] execute_write_inner (ob_inner_sql_connection.cpp:1546) [53315][T1_L0_G0][T1][YB420AC9A413-0006139777B72403-0-0] [lt=30] execute write sql(ret=0, tenant_id=1, affected_rows=1, sql=UPDATE _all_sys_stat SET VALUE = ‘8’, gmt_modified = now(6) WHERE ZONE = ‘’ AND NAME = ‘ob_max_used_server_id’ AND TENANT_ID = 0)
[2024-03-14 15:59:22.374676] INFO [STORAGE.TRANS] get_number (ob_id_service.cpp:389) [53315][T1_L0_G0][T1][YB420AC9A413-0006139777B72403-0-0] [lt=28] get number(ret=0, service_type=0, range=1, base_id=1710403162374673570, start_id=1710403162374673570, end_id=1710403162374673571)
[2024-03-14 15:59:22.376863] WDIAG [RS] add_servers (ob_server_zone_op_service.cpp:151) [53315][T1_L0_G0][T1][YB420AC9A413-0006139777B72403-0-0] [lt=25][errcode=-4179] adding non-empty server is not allowed(ret=-4179, ret=“OB_OP_NOT_ALLOW”)
[2024-03-14 15:59:22.376891] WDIAG add_servers (ob_server_zone_op_service.cpp:152) [53315][T1_L0_G0][T1][YB420AC9A413-0006139777B72403-0-0] [lt=23][errcode=-4179] add non-empty server “10.201.171.35:2882” not allowed
[2024-03-14 15:59:22.376909] WDIAG [RS] add_server (ob_root_service.cpp:7215) [53315][T1_L0_G0][T1][YB420AC9A413-0006139777B72403-0-0] [lt=15][errcode=-4179] fail to add servers(ret=-4179, ret=“OB_OP_NOT_ALLOW”, arg={servers:[“10.201.171.35:2882”], zone:“zone2”, force_stop:false, op:0})
[2024-03-14 15:59:22.378345] INFO [RS] load_server_statuses (ob_server_manager.cpp:1463) [53315][T1_L0_G0][T1][YB420AC9A413-0006139777B72403-0-0] [lt=26] update server admin status, before update(server=“10.201.164.19:2882”, status={server:“10.201.164.19:2882”, id:2, zone:“zone2”, build_version:“4.2.1.3_103020042024020317-73d0496c8c63179a37214ed26dee718280569ac9(Feb 3 2024 17:21:33)”, sql_port:2881, register_time:0, last_hb_time:1710403160576607, block_migrate_in_time:0, stop_time:0, start_service_time:1710390894824592, last_offline_time:0, last_server_behind_time:0, last_round_trip_time:0, admin_status:“NORMAL”, hb_status:“lease_expired”, with_rootserver:false, with_partition:true, resource_info:{cpu_capacity:80, cpu_assigned:12, cpu_assigned_max:12, mem_capacity:“224GB”, mem_assigned:“34GB”, mem_in_use:0GB, log_disk_capacity:5120GB, log_disk_assigned:110GB, data_disk_capacity:5120GB, data_disk_in_use:0.759765625GB}, leader_cnt:-1, server_report_status:0, lease_expire_time:1710390929141315, ssl_key_expired_time:0, in_recovery_for_takenover_by_rs:false})
[2024-03-14 15:59:22.378439] INFO [RS] load_server_statuses (ob_server_manager.cpp:1474) [53315][T1_L0_G0][T1][YB420AC9A413-0006139777B72403-0-0] [lt=92] update server admin status, after update(server=“10.201.164.19:2882”, status={server:“10.201.164.19:2882”, id:2, zone:“zone2”, build_version:“4.2.1.3_103020042024020317-73d0496c8c63179a37214ed26dee718280569ac9(Feb 3 2024 17:21:33)”, sql_port:2881, register_time:0, last_hb_time:1710403160576607, block_migrate_in_time:0, stop_time:0, start_service_time:1710390894824592, last_offline_time:0, last_server_behind_time:0, last_round_trip_time:0, admin_status:“NORMAL”, hb_status:“lease_expired”, with_rootserver:false, with_partition:true, resource_info:{cpu_capacity:80, cpu_assigned:12, cpu_assigned_max:12, mem_capacity:“224GB”, mem_assigned:“34GB”, mem_in_use:0GB, log_disk_capacity:5120GB, log_disk_assigned:110GB, data_disk_capacity:5120GB, data_disk_in_use:0.759765625GB}, leader_cnt:-1, server_report_status:0, lease_expire_time:1710390929141315, ssl_key_expired_time:0, in_recovery_for_takenover_by_rs:false})
[2024-03-14 15:59:22.378484] INFO [RS] submit_update_all_server_task (ob_root_service.cpp:1368) [53315][T1_L0_G0][T1][YB420AC9A413-0006139777B72403-0-0] [lt=41] self is RS and self status change, submit update rslist task(server=“10.201.164.19:2882”)
[2024-03-14 15:59:22.378511] INFO [RS] submit_update_rslist_task (ob_root_service.cpp:1665) [53315][T1_L0_G0][T1][YB420AC9A413-0006139777B72403-0-0] [lt=13] added async task to update rslist(force_update=false)
[2024-03-14 15:59:22.378528] INFO [RS] on_server_status_change (ob_root_service.cpp:181) [53315][T1_L0_G0][T1][YB420AC9A413-0006139777B72403-0-0] [lt=16] on_server_status_change finish(ret=0, ret=“OB_SUCCESS”, server=“10.201.164.19:2882”)
[2024-03-14 15:59:22.378544] INFO [RS] load_server_statuses (ob_server_manager.cpp:1463) [53315][T1_L0_G0][T1][YB420AC9A413-0006139777B72403-0-0] [lt=14] update server admin status, before update(server=“10.201.172.14:2882”, status={server:“10.201.172.14:2882”, id:1, zone:“zone1”, build_version:“4.2.1.3_103020042024020317-73d0496c8c63179a37214ed26dee718280569ac9(Feb 3 2024 17:21:33)”, sql_port:2881, register_time:0, last_hb_time:1710403160581750, block_migrate_in_time:0, stop_time:0, start_service_time:1710390894427915, last_offline_time:0, last_server_behind_time:0, last_round_trip_time:0, admin_status:“NORMAL”, hb_status:“lease_expired”, with_rootserver:false, with_partition:true, resource_info:{cpu_capacity:80, cpu_assigned:12, cpu_assigned_max:12, mem_capacity:“224GB”, mem_assigned:“34GB”, mem_in_use:0GB, log_disk_capacity:5120GB, log_disk_assigned:110GB, data_disk_capacity:5120GB, data_disk_in_use:0.76171875GB}, leader_cnt:-1, server_report_status:0, lease_expire_time:1710390929141320, ssl_key_expired_time:0, in_recovery_for_takenover_by_rs:false})
[2024-03-14 15:59:22.378588] INFO [RS] load_server_statuses (ob_server_manager.cpp:1474) [53315][T1_L0_G0][T1][YB420AC9A413-0006139777B72403-0-0] [lt=43] update server admin status, after update(server=“10.201.172.14:2882”, status={server:“10.201.172.14:2882”, id:1, zone:“zone1”, build_version:“4.2.1.3_103020042024020317-73d0496c8c63179a37214ed26dee718280569ac9(Feb 3 2024 17:21:33)”, sql_port:2881, register_time:0, last_hb_time:1710403160581750, block_migrate_in_time:0, stop_time:0, start_service_time:1710390894427915, last_offline_time:0, last_server_behind_time:0, last_round_trip_time:0, admin_status:“NORMAL”, hb_status:“lease_expired”, with_rootserver:false, with_partition:true, resource_info:{cpu_capacity:80, cpu_assigned:12, cpu_assigned_max:12, mem_capacity:“224GB”, mem_assigned:“34GB”, mem_in_use:0GB, log_disk_capacity:5120GB, log_disk_assigned:110GB, data_disk_capacity:5120GB, data_disk_in_use:0.76171875GB}, leader_cnt:-1, server_report_status:0, lease_expire_time:1710390929141320, ssl_key_expired_time:0, in_recovery_for_takenover_by_rs:false})
[2024-03-14 15:59:22.378625] INFO [RS] submit_update_all_server_task (ob_root_service.cpp:1368) [53315][T1_L0_G0][T1][YB420AC9A413-0006139777B72403-0-0] [lt=36] self is RS and self status change, submit update rslist task(server=“10.201.172.14:2882”)
[2024-03-14 15:59:22.378636] INFO [RS] try_lock (ob_update_rs_list_task.cpp:54) [53315][T1_L0_G0][T1][YB420AC9A413-0006139777B72403-0-0] [lt=10] update rslist task exist, do not submit again(cnt=2)
[2024-03-14 15:59:22.378650] WDIAG [RS] submit_update_rslist_task (ob_root_service.cpp:1671) [53315][T1_L0_G0][T1][YB420AC9A413-0006139777B72403-0-0] [lt=14][errcode=0] fail to submit update rslist task, need retry(force_update=false)
[2024-03-14 15:59:22.378667] INFO [RS] on_server_status_change (ob_root_service.cpp:181) [53315][T1_L0_G0][T1][YB420AC9A413-0006139777B72403-0-0] [lt=16] on_server_status_change finish(ret=0, ret=“OB_SUCCESS”, server=“10.201.172.14:2882”)
[2024-03-14 15:59:22.378738] INFO [SHARE] add_event (ob_event_history_table_operator.h:290) [53315][T1_L0_G0][T1][YB420AC9A413-0006139777B72403-0-0] [lt=11] event table add task(ret=0, event_table_name="__all_rootservice_event_history", sql=INSERT INTO all_rootservice_event_history (gmt_create, module, event, name1, value1, name2, value2, rs_svr_ip, rs_svr_port) VALUES (usec_to_time(1710403162378679), ‘server’, ‘load_servers’, ‘ret’, 0, ‘has_build’, 1, ‘10.201.164.19’, 2882))
[2024-03-14 15:59:22.378760] INFO [RS] add_server (ob_root_service.cpp:7226) [53315][T1_L0_G0][T1][YB420AC9A413-0006139777B72403-0-0] [lt=18] add server(ret=-4179, ret=“OB_OP_NOT_ALLOW”, arg={servers:[“10.201.171.35:2882”], zone:“zone2”, force_stop:false, op:0})
[2024-03-14 15:59:22.378782] WDIAG [RS] process (ob_rs_rpc_processor.h:212) [53315][T1_L0_G0][T1][YB420AC9A413-0006139777B72403-0-0] [lt=21][errcode=-4179] process failed(ret=-4179)
[2024-03-14 15:59:22.378796] INFO [RS] process (ob_rs_rpc_processor.h:232) [53315][T1_L0_G0][T1][YB420AC9A413-0006139777B72403-0-0] [lt=12] [DDL] execute ddl like stmt(ret=-4179, cost=6851, ddl_arg=NULL)

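This error most likely means the environment on the node you are adding is not clean (was OB software installed there before and never fully cleaned up?).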
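Before retrying ALTER SYSTEM ADD SERVER, you can check the new node for leftovers from the old installation. The sketch below is a hypothetical helper, not an OceanBase or obd tool; the function names and the example paths are assumptions, so substitute the home, data, and log directories your earlier installation actually used. An empty result only means those directories are empty; it does not prove the node is otherwise clean.

```shell
#!/usr/bin/env bash
# Hypothetical pre-check for a node that is about to be added to a zone.

# dir_is_empty DIR -> returns 0 if DIR does not exist or contains no entries.
dir_is_empty() {
  local dir="$1"
  [ -d "$dir" ] || return 0
  # find prints the first entry below DIR (if any) and stops immediately.
  [ -z "$(find "$dir" -mindepth 1 -print -quit 2>/dev/null)" ]
}

# check_node_clean DIR... -> prints each non-empty DIR, returns 1 if any.
check_node_clean() {
  local dir rc=0
  for dir in "$@"; do
    if ! dir_is_empty "$dir"; then
      echo "not empty: $dir"
      rc=1
    fi
  done
  return "$rc"
}

# Example (paths are illustrative assumptions):
#   check_node_clean /home/admin/oceanbase /data/1/obdemo /data/log1/obdemo
```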

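Deploying a cluster with OCP does not depend on OAT. In OCP you can add hosts and deploy a new cluster, and you can also add nodes to an existing cluster. The only requirement is that the nodes have some initial configuration done (see the node initialization requirements on the official website, which cover kernel parameters, directories, users, memory settings, and so on).
OAT only automates part of this setup; it is not strictly required.
Every database has detailed requirements for its deployment environment, and they are the basis for stable, high-performance operation. Whether or not you have OAT, the documentation explains these requirements in detail. If they are not met, later deployment steps will report errors.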
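As a rough illustration of checking node initialization by hand, here is a hedged sketch that compares two kernel parameters against minimum values and checks that the deployment user exists. The thresholds (vm.max_map_count 655360, fs.aio-max-nr 1048576) are values recalled from OceanBase's environment recommendations and are assumptions here; confirm them, and the full list of requirements, against the official node initialization documentation for your version. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical node pre-check; the minimum values are assumptions to verify.

# at_least NAME ACTUAL MINIMUM -> prints a status line, returns 1 if ACTUAL is
# missing, non-numeric, or below MINIMUM.
at_least() {
  local name="$1" actual="$2" minimum="$3"
  case "$actual" in
    ''|*[!0-9]*) echo "unknown: $name='$actual'"; return 1 ;;
  esac
  if [ "$actual" -ge "$minimum" ]; then
    echo "ok: $name=$actual"
  else
    echo "too low: $name=$actual (want >= $minimum)"
    return 1
  fi
}

# read_sysctl KEY -> current value of a kernel parameter, read from /proc/sys
# (dots in the key become path separators, e.g. vm/max_map_count).
read_sysctl() {
  cat "/proc/sys/$(printf '%s' "$1" | tr . /)" 2>/dev/null
}

# check_prereqs [USER] -> runs all checks, returns 1 if any of them fails.
check_prereqs() {
  local user="${1:-admin}" rc=0
  at_least vm.max_map_count "$(read_sysctl vm.max_map_count)" 655360 || rc=1
  at_least fs.aio-max-nr "$(read_sysctl fs.aio-max-nr)" 1048576 || rc=1
  if id -u "$user" >/dev/null 2>&1; then
    echo "ok: user $user exists"
  else
    echo "missing: user $user"
    rc=1
  fi
  return "$rc"
}

# Example: check_prereqs admin
```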

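I want to do this directly from the obclient client. It's true that the previous installation may not have been fully cleaned up. How do I remove everything from it? Is there a reference for this?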

Assuming your previous installation used the default directories:

su - admin
cd oceanbase
/bin/rm -rf etc/*config*  log/*  run/* 

Also delete the contents of the data directory and the log directory, wherever you put them. For example:

/bin/rm -rf /data/1/obdemo/*/*
/bin/rm -rf /data/log1/obdemo/*/*
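The commands above can be wrapped in a slightly safer helper that refuses to run while observer is still alive and refuses an empty path or "/". This is a hedged sketch, not part of any OceanBase tooling; clean_contents is a hypothetical name, it assumes GNU find, and unlike rm -rf DIR/* it also removes hidden files.

```shell
#!/usr/bin/env bash
# clean_contents DIR... -> delete everything inside each DIR but keep DIR itself.
clean_contents() {
  local dir
  # Deleting files under a running observer is unsafe, so stop early.
  if pgrep -x observer >/dev/null 2>&1; then
    echo "observer is still running; stop it first" >&2
    return 1
  fi
  for dir in "$@"; do
    case "$dir" in
      ''|/) echo "refusing to clean '$dir'" >&2; return 1 ;;
    esac
    [ -d "$dir" ] || continue
    # -delete implies depth-first order, so files go before their directories.
    find "$dir" -mindepth 1 -delete
  done
}

# To mirror the */* commands above (keep first-level subdirectories), pass the
# subdirectories themselves, for example:
#   clean_contents /data/1/obdemo/* /data/log1/obdemo/* ~/oceanbase/log ~/oceanbase/run
# The selective etc/*config* deletion is left to the original command.
```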

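If you're interested in manual deployment, see: Hands-on Tutorial Chapter 2: Introduction to "How to Deploy OceanBase Community Edition" - Database Technology Blog - OceanBase Distributed Database
A few directories changed between 4.x and 3.x, but the general approach is the same. According to the official WeChat account, a live-streamed 4.x DBA tutorial is also coming soon and is worth following.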

I deployed everything as root. Why does it have to be deployed as admin?

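Running database software as root is not recommended because it is a security risk.
Even MySQL only runs the guardian process mysqld_safe as root; the actual worker process, mysqld, still runs as the mysql user.
Oracle usually runs as the oracle user, DB2's default user is db2inst1, and PostgreSQL's default user is postgres.
OceanBase production environments are deployed under the admin user by default.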

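I recommend getting used to deploying as admin from the start (it is only named admin; that does not make it an administrator account. It is a user name Alibaba's hosts commonly used in the past, and you can pick a different name).
But if you have already deployed as root and are not familiar with OB's data file directories, just stick with root. Switching to another user is possible, though.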
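If you do switch an existing root deployment to another user, one thing that has to line up is file ownership: every file under the OB software, data, and log directories must belong to the new user. The hedged helper below only reports files that do not; it is not an official or complete migration procedure, and the function name and paths are hypothetical. Fixing what it reports would typically mean stopping observer and running chown -R admin:admin on those directories as root before starting observer again as admin.

```shell
#!/usr/bin/env bash
# files_not_owned_by USER DIR... -> print every path under the given DIRs whose
# owner is not USER. Empty output means ownership is already consistent.
files_not_owned_by() {
  local user="$1"; shift
  # find would silently match nothing for an unknown user, so check first.
  if ! id -u "$user" >/dev/null 2>&1; then
    echo "no such user: $user" >&2
    return 2
  fi
  find "$@" ! -user "$user" -print
}

# Example (paths are illustrative assumptions):
#   files_not_owned_by admin /home/admin/oceanbase /data/1/obdemo /data/log1/obdemo
```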

Then how can I migrate to the admin user without losing anything? Is there a tutorial for this?

I used obd as admin to deploy OceanBase and OCP, and it failed with: failed to start 10.201.172.14 ocp-server, remaining retries: 37
[2024-03-14 17:50:34.046] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] ---- 10.201.172.14 program health check
[2024-03-14 17:50:34.046] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] ---- admin@10.201.172.14 execute: ls /proc/66794
[2024-03-14 17:50:34.097] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] ---- exited code 2, error output:
[2024-03-14 17:50:34.097] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] ls: cannot access /proc/66794: No such file or directory
[2024-03-14 17:50:34.097] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG]
[2024-03-14 17:50:34.185] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [ERROR] failed to start 10.201.172.14 ocp-server
[2024-03-14 17:50:34.186] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [ERROR] start ocp-server failed
[2024-03-14 17:50:34.186] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] — sub start ref count to 0
[2024-03-14 17:50:34.186] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] — export start
[2024-03-14 17:50:34.187] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] – Call ocp-server-ce-py_script_stop-4.2.1 for ocp-server-ce-4.2.1-20231208144448.el7-58cf72891d75a2fa7c754bafc42d336525baf0b5
[2024-03-14 17:50:34.187] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [INFO] Stop ocp-server
[2024-03-14 17:50:34.188] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] ---- admin@10.201.172.14 execute: cat /disk/nvme1n1/ocp/run/ocp-server.pid
[2024-03-14 17:50:34.236] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] ---- exited code 0
[2024-03-14 17:50:34.237] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] ---- admin@10.201.172.14 execute: ls /proc/66794
[2024-03-14 17:50:34.323] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] ---- exited code 2, error output:
[2024-03-14 17:50:34.324] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] ls: cannot access /proc/66794: No such file or directory
[2024-03-14 17:50:34.324] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG]
[2024-03-14 17:50:34.324] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] ---- 10.201.172.14 ocp-server is not running
[2024-03-14 17:50:34.324] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] ---- admin@10.201.164.19 execute: cat /disk/nvme1n1/ocp/run/ocp-server.pid
[2024-03-14 17:50:34.373] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] ---- exited code 0
[2024-03-14 17:50:34.373] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] ---- admin@10.201.164.19 execute: ls /proc/75980
[2024-03-14 17:50:34.458] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] ---- exited code 2, error output:
[2024-03-14 17:50:34.458] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] ls: cannot access /proc/75980: No such file or directory
[2024-03-14 17:50:34.458] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG]
[2024-03-14 17:50:34.459] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] ---- 10.201.164.19 ocp-server is not running
[2024-03-14 17:50:34.459] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] ---- admin@10.201.171.35 execute: cat /disk/nvme1n1/ocp/run/ocp-server.pid
[2024-03-14 17:50:34.505] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] ---- exited code 0
[2024-03-14 17:50:34.506] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] ---- admin@10.201.171.35 execute: ls /proc/10239
[2024-03-14 17:50:34.590] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] ---- exited code 2, error output:
[2024-03-14 17:50:34.590] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] ls: cannot access /proc/10239: No such file or directory
[2024-03-14 17:50:34.590] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG]
[2024-03-14 17:50:34.591] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] ---- 10.201.171.35 ocp-server is not running
[2024-03-14 17:50:35.625] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] - admin@10.201.172.14 execute: sudo chown -R root: /disk/nvme1n1/ocp
[2024-03-14 17:50:36.712] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] - exited code 0
[2024-03-14 17:50:36.713] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] - admin@10.201.164.19 execute: sudo chown -R root: /disk/nvme1n1/ocp
[2024-03-14 17:50:36.822] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] - exited code 0
[2024-03-14 17:50:36.823] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] - admin@10.201.171.35 execute: sudo chown -R root: /disk/nvme1n1/ocp
[2024-03-14 17:50:36.930] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] - exited code 0
[2024-03-14 17:50:36.931] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] - sub restart ref count to 0
[2024-03-14 17:50:36.932] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] - export restart
How should I deal with this?
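The log shows what obd's health check does here: it reads the PID from /disk/nvme1n1/ocp/run/ocp-server.pid and then runs ls /proc/PID. "exited code 2" means that /proc entry does not exist, i.e. ocp-server had already exited on all three hosts by the time obd checked. The hypothetical function below reproduces that check so it can be run by hand; it only tells you whether the process is alive, not why it exited, for which ocp-server's own logs are needed.

```shell
#!/usr/bin/env bash
# pid_file_alive PIDFILE -> returns 0 if the PID stored in PIDFILE still has a
# /proc entry, mirroring the "cat pidfile; ls /proc/PID" check in the obd log.
pid_file_alive() {
  local pidfile="$1" pid
  [ -r "$pidfile" ] || return 1
  # Strip the trailing newline and any stray whitespace around the PID.
  pid=$(tr -d '[:space:]' < "$pidfile")
  [ -n "$pid" ] && [ -d "/proc/$pid" ]
}

# Example:
#   pid_file_alive /disk/nvme1n1/ocp/run/ocp-server.pid || echo "ocp-server is not running"
```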

Your environment is a bit messy. If possible, I suggest wiping it and starting over.
Cleaning up the environment includes:

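  1. Kill the observer and obproxy processes.
  2. Remove the installed software directories.
  3. Remove the data file and log file directories.

There are now many ways to deploy OB Community Edition, so people can choose whichever they prefer.
When you redeploy, please state which documentation or method you followed, and share the key configuration files, screenshots of the relevant pages, and the machine resources used (memory and disk space). Only when all the information is there can others give you useful advice.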
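Step 1 can be done more gently than with kill -9: send SIGTERM, give the process time to exit, and only then force it. Below is a hedged sketch of that pattern; stop_pid and stop_by_name are hypothetical helpers, the 30-second default timeout is an arbitrary assumption, and pgrep -x matches the exact process name. Run it as the user that owns the processes, or as root.

```shell
#!/usr/bin/env bash
# stop_pid PID [TIMEOUT] -> SIGTERM, wait up to TIMEOUT seconds, then SIGKILL.
stop_pid() {
  local pid="$1" timeout="${2:-30}" i=0
  kill -0 "$pid" 2>/dev/null || return 0   # already gone
  kill -TERM "$pid" 2>/dev/null
  while kill -0 "$pid" 2>/dev/null && [ "$i" -lt "$timeout" ]; do
    sleep 1
    i=$((i + 1))
  done
  # Still alive after the grace period: force it.
  if kill -0 "$pid" 2>/dev/null; then
    kill -KILL "$pid" 2>/dev/null
  fi
}

# stop_by_name NAME... -> stop every process whose name is exactly NAME.
stop_by_name() {
  local name pid
  for name in "$@"; do
    for pid in $(pgrep -x "$name"); do
      stop_pid "$pid"
    done
  done
}

# Example: stop_by_name observer obproxy
```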

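Also, if you deploy with obd, when I said to install under the admin user, I meant that the configuration file has a user field, and that user should be admin. It does not mean you must also run obd itself as admin, although running obd as admin is fine too. The problem is that some people have run obd deployments under both root and admin and end up confusing themselves, because obd keeps separate configuration files for each user it runs as.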
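To see whether deployments are registered under more than one user, you can list obd's deployment directories in each home directory. This sketch assumes obd keeps each deployment under ~/.obd/cluster/DEPLOY_NAME/ (its usual layout, but an assumption here); list_obd_deploys is a hypothetical helper, and reading another user's home directory typically requires root.

```shell
#!/usr/bin/env bash
# list_obd_deploys HOME_DIR... -> print "HOME_DIR: deploy_name" for each
# deployment directory found under HOME_DIR/.obd/cluster (assumed layout).
list_obd_deploys() {
  local home d
  for home in "$@"; do
    [ -d "$home/.obd/cluster" ] || continue
    for d in "$home/.obd/cluster"/*/; do
      # An unmatched glob stays literal, so skip anything that is not a dir.
      [ -d "$d" ] || continue
      echo "$home: $(basename "$d")"
    done
  done
}

# Example: list_obd_deploys /root /home/admin
```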

I know that. OK, thanks.