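I want to install a Community Edition OceanBase cluster, version 4.2.1.3.
The documentation says that nodes are initialized with OAT before being added to an OceanBase cluster, but the Community Edition has no OAT. I want to add a node and then join it to a zone. How is this done? Are there detailed steps I can follow?
Doesn't the node have to be initialized with OAT first? The Community Edition has no OAT.
Right. The Community Edition has no OAT.
If you are using the Enterprise Edition of OB, we suggest seeking help in one of the following ways:
1. If your company has already signed an OceanBase Enterprise Edition sales contract, please contact your account manager;
2. If your company has not yet signed an OceanBase Enterprise Edition sales contract, you can leave your contact details on the business inquiry page of the OceanBase official website, and an OceanBase Enterprise Edition business consultant will contact you within one working day.
OceanBase official website business inquiry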
I am on the Community Edition. I have prepared the machine, but adding it to a zone fails:
ALTER SYSTEM ADD SERVER '10.2xxxxx:2882' ZONE 'zone2';
ERROR 4179 (HY000): add non-empty server "10.2xxxxxxx:2882" not allowed
What is the problem? The rootservice log shows the following:
[RS] add_server (ob_root_service.cpp:7213) [53315][T1_L0_G0][T1][YB420AC9A413-0006139777B72403-0-0] [lt=36] sys tenant data version >= 4.2, add_server(arg={servers:[“10.201.171.35:2882”], zone:“zone2”, force_stop:false, op:0}, timeout_ts=9999504)
[2024-03-14 15:59:22.373923] INFO [STORAGE] ~ObStorageTableGuard (ob_storage_table_guard.cpp:153) [53315][T1_L0_G0][T1][YB420AC9A413-0006139777B72403-0-0] [lt=40] throttle statics(need_speed_limit=false, last_throttle_status=false, last_print_log_time=1710402590154278, stat={total_throttle_time_us:0, total_skip_throttle_time_us:0, last_log_timestamp:1710403162373595, last_throttle_status:false, 0=0, 1=0, 2=0, 3=0})
[2024-03-14 15:59:22.374053] INFO [SHARE] fetch_new_max_id (ob_max_id_fetcher.cpp:274) [53315][T1_L0_G0][T1][YB420AC9A413-0006139777B72403-0-0] [lt=43] fetch_new_max_id(ret=0, ret=“OB_SUCCESS”, size=1, tenant_id=1, fetch_id=7, max_id_type=4, fetch_max_id_type=4, id=18446744073709551615, initial=18446744073709551615)
[2024-03-14 15:59:22.374551] INFO [SERVER] execute_write_inner (ob_inner_sql_connection.cpp:1546) [53315][T1_L0_G0][T1][YB420AC9A413-0006139777B72403-0-0] [lt=30] execute write sql(ret=0, tenant_id=1, affected_rows=1, sql=UPDATE _all_sys_stat SET VALUE = ‘8’, gmt_modified = now(6) WHERE ZONE = ‘’ AND NAME = ‘ob_max_used_server_id’ AND TENANT_ID = 0)
[2024-03-14 15:59:22.374676] INFO [STORAGE.TRANS] get_number (ob_id_service.cpp:389) [53315][T1_L0_G0][T1][YB420AC9A413-0006139777B72403-0-0] [lt=28] get number(ret=0, service_type=0, range=1, base_id=1710403162374673570, start_id=1710403162374673570, end_id=1710403162374673571)
[2024-03-14 15:59:22.376863] WDIAG [RS] add_servers (ob_server_zone_op_service.cpp:151) [53315][T1_L0_G0][T1][YB420AC9A413-0006139777B72403-0-0] [lt=25][errcode=-4179] adding non-empty server is not allowed(ret=-4179, ret=“OB_OP_NOT_ALLOW”)
[2024-03-14 15:59:22.376891] WDIAG add_servers (ob_server_zone_op_service.cpp:152) [53315][T1_L0_G0][T1][YB420AC9A413-0006139777B72403-0-0] [lt=23][errcode=-4179] add non-empty server “10.201.171.35:2882” not allowed
[2024-03-14 15:59:22.376909] WDIAG [RS] add_server (ob_root_service.cpp:7215) [53315][T1_L0_G0][T1][YB420AC9A413-0006139777B72403-0-0] [lt=15][errcode=-4179] fail to add servers(ret=-4179, ret=“OB_OP_NOT_ALLOW”, arg={servers:[“10.201.171.35:2882”], zone:“zone2”, force_stop:false, op:0})
[2024-03-14 15:59:22.378345] INFO [RS] load_server_statuses (ob_server_manager.cpp:1463) [53315][T1_L0_G0][T1][YB420AC9A413-0006139777B72403-0-0] [lt=26] update server admin status, before update(server=“10.201.164.19:2882”, status={server:“10.201.164.19:2882”, id:2, zone:“zone2”, build_version:“4.2.1.3_103020042024020317-73d0496c8c63179a37214ed26dee718280569ac9(Feb 3 2024 17:21:33)”, sql_port:2881, register_time:0, last_hb_time:1710403160576607, block_migrate_in_time:0, stop_time:0, start_service_time:1710390894824592, last_offline_time:0, last_server_behind_time:0, last_round_trip_time:0, admin_status:“NORMAL”, hb_status:“lease_expired”, with_rootserver:false, with_partition:true, resource_info:{cpu_capacity:80, cpu_assigned:12, cpu_assigned_max:12, mem_capacity:“224GB”, mem_assigned:“34GB”, mem_in_use:0GB, log_disk_capacity:5120GB, log_disk_assigned:110GB, data_disk_capacity:5120GB, data_disk_in_use:0.759765625GB}, leader_cnt:-1, server_report_status:0, lease_expire_time:1710390929141315, ssl_key_expired_time:0, in_recovery_for_takenover_by_rs:false})
[2024-03-14 15:59:22.378439] INFO [RS] load_server_statuses (ob_server_manager.cpp:1474) [53315][T1_L0_G0][T1][YB420AC9A413-0006139777B72403-0-0] [lt=92] update server admin status, after update(server=“10.201.164.19:2882”, status={server:“10.201.164.19:2882”, id:2, zone:“zone2”, build_version:“4.2.1.3_103020042024020317-73d0496c8c63179a37214ed26dee718280569ac9(Feb 3 2024 17:21:33)”, sql_port:2881, register_time:0, last_hb_time:1710403160576607, block_migrate_in_time:0, stop_time:0, start_service_time:1710390894824592, last_offline_time:0, last_server_behind_time:0, last_round_trip_time:0, admin_status:“NORMAL”, hb_status:“lease_expired”, with_rootserver:false, with_partition:true, resource_info:{cpu_capacity:80, cpu_assigned:12, cpu_assigned_max:12, mem_capacity:“224GB”, mem_assigned:“34GB”, mem_in_use:0GB, log_disk_capacity:5120GB, log_disk_assigned:110GB, data_disk_capacity:5120GB, data_disk_in_use:0.759765625GB}, leader_cnt:-1, server_report_status:0, lease_expire_time:1710390929141315, ssl_key_expired_time:0, in_recovery_for_takenover_by_rs:false})
[2024-03-14 15:59:22.378484] INFO [RS] submit_update_all_server_task (ob_root_service.cpp:1368) [53315][T1_L0_G0][T1][YB420AC9A413-0006139777B72403-0-0] [lt=41] self is RS and self status change, submit update rslist task(server=“10.201.164.19:2882”)
[2024-03-14 15:59:22.378511] INFO [RS] submit_update_rslist_task (ob_root_service.cpp:1665) [53315][T1_L0_G0][T1][YB420AC9A413-0006139777B72403-0-0] [lt=13] added async task to update rslist(force_update=false)
[2024-03-14 15:59:22.378528] INFO [RS] on_server_status_change (ob_root_service.cpp:181) [53315][T1_L0_G0][T1][YB420AC9A413-0006139777B72403-0-0] [lt=16] on_server_status_change finish(ret=0, ret=“OB_SUCCESS”, server=“10.201.164.19:2882”)
[2024-03-14 15:59:22.378544] INFO [RS] load_server_statuses (ob_server_manager.cpp:1463) [53315][T1_L0_G0][T1][YB420AC9A413-0006139777B72403-0-0] [lt=14] update server admin status, before update(server=“10.201.172.14:2882”, status={server:“10.201.172.14:2882”, id:1, zone:“zone1”, build_version:“4.2.1.3_103020042024020317-73d0496c8c63179a37214ed26dee718280569ac9(Feb 3 2024 17:21:33)”, sql_port:2881, register_time:0, last_hb_time:1710403160581750, block_migrate_in_time:0, stop_time:0, start_service_time:1710390894427915, last_offline_time:0, last_server_behind_time:0, last_round_trip_time:0, admin_status:“NORMAL”, hb_status:“lease_expired”, with_rootserver:false, with_partition:true, resource_info:{cpu_capacity:80, cpu_assigned:12, cpu_assigned_max:12, mem_capacity:“224GB”, mem_assigned:“34GB”, mem_in_use:0GB, log_disk_capacity:5120GB, log_disk_assigned:110GB, data_disk_capacity:5120GB, data_disk_in_use:0.76171875GB}, leader_cnt:-1, server_report_status:0, lease_expire_time:1710390929141320, ssl_key_expired_time:0, in_recovery_for_takenover_by_rs:false})
[2024-03-14 15:59:22.378588] INFO [RS] load_server_statuses (ob_server_manager.cpp:1474) [53315][T1_L0_G0][T1][YB420AC9A413-0006139777B72403-0-0] [lt=43] update server admin status, after update(server=“10.201.172.14:2882”, status={server:“10.201.172.14:2882”, id:1, zone:“zone1”, build_version:“4.2.1.3_103020042024020317-73d0496c8c63179a37214ed26dee718280569ac9(Feb 3 2024 17:21:33)”, sql_port:2881, register_time:0, last_hb_time:1710403160581750, block_migrate_in_time:0, stop_time:0, start_service_time:1710390894427915, last_offline_time:0, last_server_behind_time:0, last_round_trip_time:0, admin_status:“NORMAL”, hb_status:“lease_expired”, with_rootserver:false, with_partition:true, resource_info:{cpu_capacity:80, cpu_assigned:12, cpu_assigned_max:12, mem_capacity:“224GB”, mem_assigned:“34GB”, mem_in_use:0GB, log_disk_capacity:5120GB, log_disk_assigned:110GB, data_disk_capacity:5120GB, data_disk_in_use:0.76171875GB}, leader_cnt:-1, server_report_status:0, lease_expire_time:1710390929141320, ssl_key_expired_time:0, in_recovery_for_takenover_by_rs:false})
[2024-03-14 15:59:22.378625] INFO [RS] submit_update_all_server_task (ob_root_service.cpp:1368) [53315][T1_L0_G0][T1][YB420AC9A413-0006139777B72403-0-0] [lt=36] self is RS and self status change, submit update rslist task(server=“10.201.172.14:2882”)
[2024-03-14 15:59:22.378636] INFO [RS] try_lock (ob_update_rs_list_task.cpp:54) [53315][T1_L0_G0][T1][YB420AC9A413-0006139777B72403-0-0] [lt=10] update rslist task exist, do not submit again(cnt=2)
[2024-03-14 15:59:22.378650] WDIAG [RS] submit_update_rslist_task (ob_root_service.cpp:1671) [53315][T1_L0_G0][T1][YB420AC9A413-0006139777B72403-0-0] [lt=14][errcode=0] fail to submit update rslist task, need retry(force_update=false)
[2024-03-14 15:59:22.378667] INFO [RS] on_server_status_change (ob_root_service.cpp:181) [53315][T1_L0_G0][T1][YB420AC9A413-0006139777B72403-0-0] [lt=16] on_server_status_change finish(ret=0, ret=“OB_SUCCESS”, server=“10.201.172.14:2882”)
[2024-03-14 15:59:22.378738] INFO [SHARE] add_event (ob_event_history_table_operator.h:290) [53315][T1_L0_G0][T1][YB420AC9A413-0006139777B72403-0-0] [lt=11] event table add task(ret=0, event_table_name="__all_rootservice_event_history", sql=INSERT INTO all_rootservice_event_history (gmt_create, module, event, name1, value1, name2, value2, rs_svr_ip, rs_svr_port) VALUES (usec_to_time(1710403162378679), ‘server’, ‘load_servers’, ‘ret’, 0, ‘has_build’, 1, ‘10.201.164.19’, 2882))
[2024-03-14 15:59:22.378760] INFO [RS] add_server (ob_root_service.cpp:7226) [53315][T1_L0_G0][T1][YB420AC9A413-0006139777B72403-0-0] [lt=18] add server(ret=-4179, ret=“OB_OP_NOT_ALLOW”, arg={servers:[“10.201.171.35:2882”], zone:“zone2”, force_stop:false, op:0})
[2024-03-14 15:59:22.378782] WDIAG [RS] process (ob_rs_rpc_processor.h:212) [53315][T1_L0_G0][T1][YB420AC9A413-0006139777B72403-0-0] [lt=21][errcode=-4179] process failed(ret=-4179)
[2024-03-14 15:59:22.378796] INFO [RS] process (ob_rs_rpc_processor.h:232) [53315][T1_L0_G0][T1][YB420AC9A413-0006139777B72403-0-0] [lt=12] [DDL] execute ddl like stmt(ret=-4179, cost=6851, ddl_arg=NULL)
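This error most likely means the node you are adding is not clean (perhaps OB software was installed on it before and was not fully cleaned up).
Deploying a cluster with OCP does not depend on OAT. In OCP you can add hosts and deploy a cluster, or add nodes to an existing cluster. The nodes just need some initialization first (see the node initialization requirements on the official site, covering kernel parameters, directories, users, memory settings, and so on).
OAT merely automates part of that setup; it is not required.
Every database has detailed requirements for its deployment environment; they are the basis for stable, high-performance operation. With or without OAT, the documentation explains these requirements in detail. If they are not met, later deployment steps will fail with errors.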
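As a rough way to spot leftovers from an earlier install, you could check that the directories the new node will use are empty before starting observer on it. This is only a sketch: the helper names dir_is_clean and check_node_clean are made up for illustration, the example paths are assumptions about your layout, and an empty directory is not the exact test rootservice applies.

```shell
# Hypothetical helper: succeed (0) when the directory is missing or empty.
dir_is_clean() {
    local dir="$1"
    [ -d "$dir" ] || return 0
    # find prints the first entry below $dir and stops; no output means empty.
    [ -z "$(find "$dir" -mindepth 1 -print -quit 2>/dev/null)" ]
}

# Hypothetical helper: print every directory that still has content.
# Returns 0 only when all of them are clean.
check_node_clean() {
    local rc=0 dir
    for dir in "$@"; do
        if ! dir_is_clean "$dir"; then
            echo "leftover files in: $dir"
            rc=1
        fi
    done
    return "$rc"
}

# Example call (paths are assumptions, adjust to your own layout):
# check_node_clean /home/admin/oceanbase/log /data/1 /data/log1
```

Run it before the observer process is started on the new machine, since observer creates its own files as soon as it starts.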
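I want to do this directly from the obclient client. The previous install may indeed not have been fully removed. How do I clean it all up? Is there a reference?
Assuming your previous install used the default directories: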
su - admin
cd oceanbase
/bin/rm -rf etc/*config* log/* run/*
Also delete the contents of the data directory and the log directory, wherever you placed them. For example:
/bin/rm -rf /data/1/obdemo/*/*
/bin/rm -rf /data/log1/obdemo/*/*
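Since rm -rf is unforgiving, a dry run first can help. The sketch below wraps the same idea in a hypothetical helper, clean_ob_dirs, which empties the directories you pass it and only prints what it would delete unless DRY_RUN=0 is set. The helper name and the DRY_RUN variable are my own invention, not part of any OB tool. It empties whole directories, so do not point it at etc, where only the *config* files should go.

```shell
# Hypothetical helper: empty the given directories but keep the
# directories themselves. DRY_RUN defaults to 1 (print only).
clean_ob_dirs() {
    local dir
    for dir in "$@"; do
        [ -d "$dir" ] || { echo "skip (not a directory): $dir"; continue; }
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # List the top-level entries that would be deleted.
            find "$dir" -mindepth 1 -maxdepth 1 -exec echo "would remove:" {} \;
        else
            # Delete every top-level entry, hidden files included.
            find "$dir" -mindepth 1 -maxdepth 1 -exec rm -rf -- {} +
        fi
    done
}

# Example (paths are assumptions taken from the commands above):
# clean_ob_dirs ~/oceanbase/log ~/oceanbase/run            # dry run
# DRY_RUN=0 clean_ob_dirs ~/oceanbase/log ~/oceanbase/run  # really delete
```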
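If you are interested in the manual deployment method, see: Hands-on Tutorial, Chapter 2: How to Deploy OceanBase Community Edition (OceanBase database technology blog).
A few directories changed between 3.x and 4.x, but the overall approach is the same. According to the official WeChat account, a live DBA tutorial series for 4.x will be launched soon; it is worth following.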
I have always deployed as root. Why does it have to be admin?
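Running database software as root is not recommended, because it is a security risk.
Even MySQL only runs the watchdog process mysqld_safe as root; the actual worker process mysqld still runs as the mysql user.
Oracle usually runs as the oracle user, the default user for DB2 is db2inst1, and the default user for PostgreSQL is postgres.
In production, OceanBase is deployed under the admin user by default.
I suggest getting used to deploying as admin from the start (it is only a name and does not mean administrator; it was the common host user at Alibaba in the past. Any other user name works too).
That said, once you have already deployed as root, if you are not familiar with the OB data file directories, just keep using root. Switching users is also possible.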
So how do I migrate to the admin user without losing anything? Is there a tutorial?
I used obd as admin to deploy oceanbase and ocp, and it fails with: failed to start 10.201.172.14 ocp-server, remaining retries: 37
[2024-03-14 17:50:34.046] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] ---- 10.201.172.14 program health check
[2024-03-14 17:50:34.046] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] ---- admin@10.201.172.14 execute: ls /proc/66794
[2024-03-14 17:50:34.097] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] ---- exited code 2, error output:
[2024-03-14 17:50:34.097] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] ls: cannot access /proc/66794: No such file or directory
[2024-03-14 17:50:34.097] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG]
[2024-03-14 17:50:34.185] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [ERROR] failed to start 10.201.172.14 ocp-server
[2024-03-14 17:50:34.186] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [ERROR] start ocp-server failed
[2024-03-14 17:50:34.186] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] — sub start ref count to 0
[2024-03-14 17:50:34.186] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] — export start
[2024-03-14 17:50:34.187] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] – Call ocp-server-ce-py_script_stop-4.2.1 for ocp-server-ce-4.2.1-20231208144448.el7-58cf72891d75a2fa7c754bafc42d336525baf0b5
[2024-03-14 17:50:34.187] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [INFO] Stop ocp-server
[2024-03-14 17:50:34.188] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] ---- admin@10.201.172.14 execute: cat /disk/nvme1n1/ocp/run/ocp-server.pid
[2024-03-14 17:50:34.236] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] ---- exited code 0
[2024-03-14 17:50:34.237] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] ---- admin@10.201.172.14 execute: ls /proc/66794
[2024-03-14 17:50:34.323] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] ---- exited code 2, error output:
[2024-03-14 17:50:34.324] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] ls: cannot access /proc/66794: No such file or directory
[2024-03-14 17:50:34.324] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG]
[2024-03-14 17:50:34.324] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] ---- 10.201.172.14 ocp-server is not running
[2024-03-14 17:50:34.324] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] ---- admin@10.201.164.19 execute: cat /disk/nvme1n1/ocp/run/ocp-server.pid
[2024-03-14 17:50:34.373] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] ---- exited code 0
[2024-03-14 17:50:34.373] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] ---- admin@10.201.164.19 execute: ls /proc/75980
[2024-03-14 17:50:34.458] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] ---- exited code 2, error output:
[2024-03-14 17:50:34.458] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] ls: cannot access /proc/75980: No such file or directory
[2024-03-14 17:50:34.458] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG]
[2024-03-14 17:50:34.459] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] ---- 10.201.164.19 ocp-server is not running
[2024-03-14 17:50:34.459] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] ---- admin@10.201.171.35 execute: cat /disk/nvme1n1/ocp/run/ocp-server.pid
[2024-03-14 17:50:34.505] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] ---- exited code 0
[2024-03-14 17:50:34.506] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] ---- admin@10.201.171.35 execute: ls /proc/10239
[2024-03-14 17:50:34.590] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] ---- exited code 2, error output:
[2024-03-14 17:50:34.590] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] ls: cannot access /proc/10239: No such file or directory
[2024-03-14 17:50:34.590] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG]
[2024-03-14 17:50:34.591] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] ---- 10.201.171.35 ocp-server is not running
[2024-03-14 17:50:35.625] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] - admin@10.201.172.14 execute: sudo chown -R root: /disk/nvme1n1/ocp
[2024-03-14 17:50:36.712] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] - exited code 0
[2024-03-14 17:50:36.713] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] - admin@10.201.164.19 execute: sudo chown -R root: /disk/nvme1n1/ocp
[2024-03-14 17:50:36.822] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] - exited code 0
[2024-03-14 17:50:36.823] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] - admin@10.201.171.35 execute: sudo chown -R root: /disk/nvme1n1/ocp
[2024-03-14 17:50:36.930] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] - exited code 0
[2024-03-14 17:50:36.931] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] - sub restart ref count to 0
[2024-03-14 17:50:36.932] [0d6ce770-e1e8-11ee-9ee6-1070fdd09f5a] [DEBUG] - export restart
How do I deal with this?
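Your environment is a bit of a mess. If you can, I suggest wiping it and starting over.
Cleaning the environment includes:
- Kill the observer and obproxy processes
- Remove the installed software directory
- Remove the data file and log file directories.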
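Before deleting anything, it is worth confirming that no observer or obproxy process is still alive. The sketch below scans /proc for matching process names, much like the ls /proc/<pid> checks in the obd log. The helper name ob_procs_running and the PROC_ROOT override are hypothetical; the override exists only so the helper can be tried against a fake directory tree.

```shell
# Hypothetical helper: report processes whose name matches one of the
# arguments. Returns 0 if at least one is running, 1 if none are.
ob_procs_running() {
    local root="${PROC_ROOT:-/proc}" comm_file name wanted pid found=1
    for comm_file in "$root"/[0-9]*/comm; do
        [ -r "$comm_file" ] || continue
        # The process may exit between the glob and the read; skip it then.
        name=$(cat "$comm_file" 2>/dev/null) || continue
        for wanted in "$@"; do
            if [ "$name" = "$wanted" ]; then
                pid=${comm_file#"$root"/}
                pid=${pid%/comm}
                echo "$wanted is still running (pid $pid)"
                found=0
            fi
        done
    done
    return "$found"
}

# Example:
# ob_procs_running observer obproxy && echo "stop these before cleaning up"
```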
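There are now many ways to deploy OB Community Edition, to suit different preferences.
When you redeploy, please state which document or method you followed, and include the key config files, page screenshots, and the machine resources (memory and disk space). Only with complete information can others give useful advice.
Also, if you deploy with obd: when I say install under the admin user, I mean that the user specified in the config file is admin. It does not mean obd itself has to be run as admin, although that works too. Some people have deployed with obd under both root and admin and end up confusing themselves, because the config files obd generates at run time differ between the two users.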
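To see which user a given obd config file actually names, something like the following can help. It is a minimal sketch that assumes the config has a top-level user: block with a username: key, as in the obd example configs. The helper name obd_config_user is hypothetical, and it is plain text matching with awk, not a real YAML parser.

```shell
# Hypothetical helper: print the username from the top-level "user:"
# block of an obd config file. Prints nothing if no such block exists.
obd_config_user() {
    awk '
        /^user:[[:space:]]*$/ { in_user = 1; next }  # enter the user block
        /^[^[:space:]#]/      { in_user = 0 }        # another top-level key ends it
        in_user && /^[[:space:]]+username:/ {
            sub(/^[[:space:]]+username:[[:space:]]*/, "")
            print
            exit
        }
    ' "$1"
}

# Example (the path is an assumption about where obd keeps its files):
# obd_config_user ~/.obd/cluster/<deploy name>/config.yaml
```

Running it once as root and once as admin against each user's own config should show quickly whether the two deployments disagree.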
I know that. OK, thanks.