Check the cluster
obd cluster display observer
Get local repositories and plugins ok
Open ssh connection ok
Cluster status check ok
[WARN] 127.0.0.1 oceanbase-ce is stopped
Check port 2881
netstat -anp|grep 2881
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 1 0 127.0.0.1:16612 127.0.0.1:2881 CLOSE_WAIT 365313/obclient
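The netstat output shows only a CLOSE_WAIT socket held by obclient (local port 16612, peer 127.0.0.1:2881) and no LISTEN entry for 2881. That suggests nothing is accepting connections on that port at this point. Below is a minimal Python sketch that scripts the same check without netstat. It reads the Linux /proc/net/tcp and /proc/net/tcp6 tables, where ports are hex-encoded (2881 = 0x0B41) and state code 0A means LISTEN. It assumes a Linux host, and the function names are illustrative only.

```python
from pathlib import Path

TCP_LISTEN = "0A"  # socket state code for LISTEN in /proc/net/tcp and /proc/net/tcp6


def listening_ports(table_text):
    """Return the set of local ports in LISTEN state from /proc/net/tcp-style text."""
    ports = set()
    for line in table_text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        local_address, state = fields[1], fields[3]
        if state == TCP_LISTEN:
            # local_address looks like "0100007F:0B41" (IPv4) or 32 hex digits + ":PORT" (IPv6)
            ports.add(int(local_address.rsplit(":", 1)[1], 16))
    return ports


def is_listening(port, tables=("/proc/net/tcp", "/proc/net/tcp6")):
    """True if any IPv4/IPv6 socket in this network namespace is listening on port."""
    for table in tables:
        path = Path(table)
        if path.exists() and port in listening_ports(path.read_text()):
            return True
    return False
```

If is_listening(2881) returns False while obd cluster display observer reports the server as stopped, that would be consistent with the observer process no longer running.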
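[Steps to reproduce] Run obd cluster deploy observer -c mini-local-example.yaml, then obd cluster start observer. A local database client connects to the database on port 2881 normally, but after a while (sometimes a few minutes, sometimes a few dozen minutes) it can no longer connect. On the server the node turns out to have stopped, and obclient -h127.0.0.1 -P2881 -uroot -p'xxxxxxxxx' -Doceanbase -A also fails with: ERROR 2002 (HY000): Can't connect to OceanBase server on '127.0.0.1' (115)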
[Attachments and logs]
[2024-11-19 11:10:25.443397] INFO New syslog file info: [address: "127.0.0.1:2882", observer version: OceanBase_CE 4.3.4.0, revision: 100000162024110717-82547d6edc6ea98ba710e376a736a9a850499a06, sysname: Linux, os release: 5.4.0-1128-gcp, machine: x86_64, tz GMT offset: 08:00]
[2024-11-19 10:13:36.527423] WARN [SERVER] init_local_ip_and_devname (ob_server.cpp:2236) [273702][observer][T0][Y0-0000000000000001-0-0] [lt=81][errcode=-4187] the devname has been rewritten, and the new value comes from local_ip, old value: lo new value: lo local_ip: 127.0.0.1
[2024-11-19 10:13:42.010053] WARN [RPC_OBRPC] decode (ob_rpc_net_handler.cpp:214) [273843][RpcIO][T0][Y0-0000000000000000-0-0] [lt=15][errcode=0] The RPC packet delay is large. [suggestion] check clockdiff and tcp retransmission rate first, it maybe cost by clock skew or network delay. Further more, it may be caused by hardware failure or software failure of the machine
[2024-11-19 10:14:30.322257] WARN [RS] update_fail_count (ob_root_service.cpp:11654) [273782][RSAsyncTask0][T0][YB427F000001-0006273A97712DF0-0-0] [lt=14][errcode=-4752] rootservice start process has failure(msg="rootservice start()/do_restart() has failure", ret=-4012, ret="OB_TIMEOUT", fail_cnt=1)
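When comparing several runs, it can help to pull the level, module tag and errcode out of each line instead of reading the logs by eye. Below is a minimal sketch using Python's re module. The regular expression is an assumption fitted to the line format in the excerpts above. It is not an official OceanBase log grammar.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed prefix format from the excerpts: "[timestamp] LEVEL [MODULE] ..."; [MODULE] is optional.
HEAD = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{1,6})\] "
    r"(?P<level>[A-Z]+) (?:\[(?P<module>[A-Z_]+)\] )?"
)
ERRCODE = re.compile(r"\[errcode=(?P<code>-?\d+)\]")


def parse_line(line):
    """Return ts/level/module/errcode for one log line, or None if the prefix doesn't match."""
    m = HEAD.match(line)
    if m is None:
        return None
    e = ERRCODE.search(line)
    return {
        "ts": datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S.%f"),
        "level": m["level"],
        "module": m["module"],  # None when the line has no [MODULE] tag
        "errcode": int(e["code"]) if e else None,
    }


def summarize(lines):
    """Count WARN/ERROR lines per (module, errcode)."""
    counts = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec is not None and rec["level"] in ("WARN", "ERROR"):
            counts[(rec["module"], rec["errcode"])] += 1
    return counts
```

For example, summarize(open(path)), with path pointing at observer.log.wf (assumed to sit under home_path/log/), would count the excerpt above as one hit each for (SERVER, -4187), (RPC_OBRPC, 0) and (RS, -4752). The INFO line is skipped.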
Related thread: "OceanBase deployment reports xxx lo fail to ping" (OceanBase部署出现xxx lo fail to ping), reply #8 by 逆流
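Going by that earlier thread, try deploying with the business (service) IP and its corresponding network interface instead. Using 127.0.0.1 with the lo interface seems to cause problems.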
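To find a non-loopback address and the interface it belongs to (candidates for the servers entry and, optionally, devname or local_ip), here is a small Python sketch. It parses the one-line-per-address output of ip -o -4 addr show from iproute2. It assumes the ip command is installed. Interface names and addresses mentioned below are hypothetical.

```python
import ipaddress
import subprocess


def candidate_interfaces(ip_output):
    """Parse `ip -o -4 addr show` output into (devname, ip) pairs, skipping loopback."""
    pairs = []
    for line in ip_output.splitlines():
        fields = line.split()
        # Expected shape: "<index>: <devname> inet <addr>/<prefix> ..."
        if len(fields) < 4 or fields[2] != "inet":
            continue
        iface = ipaddress.ip_interface(fields[3])
        if iface.ip.is_loopback:
            continue  # 127.0.0.1 on lo is what the earlier thread suggests avoiding
        pairs.append((fields[1], str(iface.ip)))
    return pairs


def local_candidates():
    """Run iproute2's `ip` (assumed installed) and return non-loopback IPv4 candidates."""
    out = subprocess.run(["ip", "-o", "-4", "addr", "show"],
                         capture_output=True, text=True, check=True).stdout
    return candidate_interfaces(out)
```

A returned pair such as ("ens4", "10.0.0.5") (hypothetical) would translate into "- 10.0.0.5" under servers and, if wanted, "devname: ens4" or "local_ip: 10.0.0.5" under global.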
    # Please don't use hostname, only IP can be supported
    - 127.0.0.1
  global:
    # The working directory for OceanBase Database. OceanBase Database is started under this directory. This is a required field.
    home_path: /home/sysadmin/apps/base/oceanbase/observer
    # The directory for data storage. The default value is $home_path/store.
    data_dir: /home/sysadmin/apps/base/oceanbase/data
    # The directory for clog, ilog, and slog. The default value is the same as the data_dir value.
    redo_dir: /home/sysadmin/apps/base/oceanbase/redo
    # Starting from observer version 4.2, the network selection for the observer is based on the 'local_ip' parameter, and the 'devname' parameter is no longer mandatory.
    # If the 'local_ip' parameter is set, the observer will first use this parameter for the configuration, regardless of the 'devname' parameter.
    # If only the 'devname' parameter is set, the observer will use the 'devname' parameter for the configuration.
    # If neither the 'devname' nor the 'local_ip' parameters are set, the 'local_ip' parameter will be automatically assigned the IP address configured above.
    # devname: eth0
    mysql_port: 2881 # External port for OceanBase Database. The default value is 2881. DO NOT change this value after the cluster is started.
    rpc_port: 2882 # Internal port for OceanBase Database. The default value is 2882. DO NOT change this value after the cluster is started.
    obshell_port: 2886 # Operation and maintenance port for OceanBase Database. The default value is 2886. This parameter is valid only when the version of oceanbase-ce is 4.2.2.0 or later.
    zone: zone1
    cluster_id: 1
    # please set memory limit to a suitable value which is matching resource.
    memory_limit: 6G # The maximum running memory for an observer
    system_memory: 1G # The reserved system memory. system_memory is reserved for general tenants. The default value is 30G.
    datafile_size: 2G # Size of the data file.
    datafile_next: 2G # the auto extend step. Please enter a capacity, such as 2G
    datafile_maxsize: 20G # the auto extend max size. Please enter a capacity, such as 20G
    log_disk_size: 14G # The size of disk space used by the clog files.
    cpu_count: 0
    production_mode: false
    enable_syslog_wf: true # Print system logs whose levels are higher than WARNING to a separate log file. The default value is true.
    max_syslog_file_count: 4 # The maximum number of reserved log files before enabling auto recycling. The default value is 0.
    root_password: xxxxxxxxxxxxxxxxxxxxxxxxxxx
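Here is a quick back-of-the-envelope check of the sizes in this config. memory_limit (6G) minus system_memory (1G) leaves 5G. If the data file grows to datafile_maxsize (20G) while log_disk_size reserves 14G, the data and redo directories together may need up to roughly 34G. If they share one filesystem, that filesystem needs that much free. The sketch below only does this arithmetic. It assumes binary units (1G = 1024^3 bytes) and handles plain integer sizes with a K/M/G/T suffix.

```python
import re

# Binary multipliers (assumption: 1G = 1024**3 bytes).
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}
GIB = 1024**3


def to_bytes(size):
    """Convert a size string such as '6G' into bytes."""
    m = re.fullmatch(r"(\d+)([KMGT])", size.strip().upper())
    if m is None:
        raise ValueError(f"unsupported size string: {size!r}")
    return int(m.group(1)) * UNITS[m.group(2)]


def resource_summary(cfg):
    """Memory left after system_memory, and data file max plus log disk, both in GiB."""
    mem_left = to_bytes(cfg["memory_limit"]) - to_bytes(cfg["system_memory"])
    disk_max = to_bytes(cfg["datafile_maxsize"]) + to_bytes(cfg["log_disk_size"])
    return {"memory_after_system_gib": mem_left / GIB, "max_disk_gib": disk_max / GIB}


config = {"memory_limit": "6G", "system_memory": "1G",
          "datafile_maxsize": "20G", "log_disk_size": "14G"}
print(resource_summary(config))
```

This does not count other files under home_path (for example the syslog files), so the real footprint will be somewhat larger.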
I destroyed the cluster and started it again.
After startup there are two warnings:
[2024-11-19 15:18:28.107347] WARN [SERVER] init_local_ip_and_devname (ob_server.cpp:2236) [749307][observer][T0][Y0-0000000000000001-0-0] [lt=91][errcode=-4187] the devname has been rewritten, and the new value comes from local_ip, old value: lo new value: lo local_ip: 127.0.0.1
[2024-11-19 15:19:20.790725] WARN [RS] update_fail_count (ob_root_service.cpp:11654) [749387][RSAsyncTask2][T0][YB427F000001-0006273ED9949D42-0-0] [lt=18][errcode=-4752] rootservice start process has failure(msg="rootservice start()/do_restart() has failure", ret=-4012, ret="OB_TIMEOUT", fail_cnt=1)
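The first warning (errcode -4187, "the devname has been rewritten, and the new value comes from local_ip ... local_ip: 127.0.0.1") is consistent with the rules in the config comments. devname is commented out and local_ip is not set, so local_ip falls back to the server IP 127.0.0.1, and the log reports devname as lo. The function below only restates that documented precedence as a Python sketch. It is not OceanBase's implementation.

```python
def resolve_network(server_ip, local_ip=None, devname=None):
    """Which setting drives network selection, per the config comments (observer >= 4.2)."""
    if local_ip:
        return ("local_ip", local_ip)  # local_ip wins regardless of devname
    if devname:
        return ("devname", devname)  # only devname set
    return ("local_ip", server_ip)  # neither set: local_ip takes the server IP from the config


# This deployment: servers = [127.0.0.1], devname commented out, no local_ip.
print(resolve_network("127.0.0.1"))
```

Under these rules, switching to a business IP only requires changing the servers entry. local_ip then follows it automatically unless set explicitly.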
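The database can be connected to normally. I'll wait a few minutes, repeatedly disconnecting and reconnecting, to see whether the node stops again.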
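Instead of reconnecting by hand, a small poller can record the first time port 2881 stops accepting TCP connections. That moment can then be lined up with the timestamps in observer.log. Below is a minimal standard-library Python sketch. The host, port and 30-second default interval are assumptions matching this single-node setup. Each probe opens and immediately closes a plain TCP connection without logging in.

```python
import socket
import time
from datetime import datetime


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def watch_until_down(host="127.0.0.1", port=2881, interval=30.0, max_checks=None, probe=port_open):
    """Poll host:port; return the local time of the first failed probe, or None after max_checks successes."""
    checks = 0
    while max_checks is None or checks < max_checks:
        if not probe(host, port):
            return datetime.now()
        checks += 1
        time.sleep(interval)
    return None
```

Calling watch_until_down() blocks until the first failed probe and returns its local time, which narrows the search window in observer.log to about one interval.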
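After about ten minutes the node stopped again. This time clockdiff was installed. observer.log.wf still shows only the same two warnings as before, and election.log.wf and rootservice.log.wf are both empty. I can't figure out what is going on.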
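Since the .wf logs stay quiet, it may be worth confirming whether the observer process itself has exited, not just its port. Below is a minimal Python sketch that checks a PID with signal 0, which tests for existence without delivering a signal. The PID-file location run/observer.pid under home_path is an assumption to verify on your host. A stale PID file can also point at a PID that has since been reused by another process.

```python
import os
from pathlib import Path


def pid_alive(pid):
    """True if a process with this PID exists (signal 0 checks only, nothing is sent)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def observer_alive(home_path):
    """True/False from the PID file, or None if the file is missing or unreadable."""
    pid_file = Path(home_path) / "run" / "observer.pid"  # assumed location, verify locally
    if not pid_file.exists():
        return None
    text = pid_file.read_text().strip()
    if not text.isdigit():
        return None
    return pid_alive(int(text))
```

For this setup the call would be observer_alive("/home/sysadmin/apps/base/oceanbase/observer").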