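【 Environment 】 Test environment
【 OB or other component 】 ocp-express
【 Version 】 v4.1.0.0
【 Problem description 】 ocp-express fails to start
【 Steps to reproduce 】 Repeated stop and start does not help
【 Symptoms and impact 】
sysbench hung, and I found that disk usage on one node was above 95%. observer.log was flooded with insufficient-disk-space errors. I deleted some files and disk usage dropped, but observer.log kept reporting insufficient disk space, so I stopped and restarted the cluster. During the start, ocp-express failed to come up with the following error:
bootstrap (2).log (84.3 KB)
【 Attachments 】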
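Can you confirm that observer is running normally? Alternatively, specify the component when starting: run obd cluster start $deploy_name -c oceanbase-ce first, confirm that the OCP meta tenant accepts connections, and only then start ocp-express.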
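One addition: being able to connect does not mean the tenant is ready. After connecting, run a SQL statement and check that it returns normally.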
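The advice above (start observer first, confirm readiness with a real SQL statement, then start ocp-express) can be sketched as a small shell script. This is only a sketch under assumptions: the deploy name test, the address 192.168.2.215:2883, and the meta@ocp / oceanbase credentials are copied from the obd log quoted in this thread, and wait_for_sql and start_in_order are hypothetical helper names, not part of obd.

```shell
# Assumption: deploy name used in this thread; override with DEPLOY_NAME=...
DEPLOY_NAME="${DEPLOY_NAME:-test}"

# wait_for_sql ATTEMPTS DELAY CMD...: rerun a probe command until it succeeds
# or the attempts run out. Returns 0 on success, 1 otherwise.
wait_for_sql() {
  local attempts="$1"
  local delay="$2"
  shift 2
  local i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# start_in_order: start observer, wait until a SQL statement succeeds with the
# credentials ocp-express will use, then start ocp-express.
start_in_order() {
  obd cluster start "$DEPLOY_NAME" -c oceanbase-ce || return 1
  # Assumption: host, port, user and password as they appear in the obd log.
  wait_for_sql 30 2 obclient -h192.168.2.215 -P2883 -umeta@ocp -poceanbase \
    -e 'select 1' || return 1
  obd cluster start "$DEPLOY_NAME" -c ocp-express
}
```

Nothing runs until start_in_order is called by hand, so the values can be adjusted to the actual deployment first.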
SELECT works normally:
obclient [(none)]> select user,host from mysql.user;
+-------------+------+
| user        | host |
+-------------+------+
| root        | %    |
| proxyro     | %    |
| ocp_monitor | %    |
+-------------+------+
This is the startup error:
[2023-03-29 15:05:58.233] [DEBUG] - Found for ocp-express-param-1.0 for ocp-express-1.0.0
[2023-03-29 15:05:58.233] [DEBUG] - Applying ocp-express-param-1.0 for ocp-express-1.0.0-100000432023032015.el7-42c6fc921063f24f9e1072d75bfa7f21f42146e3
[2023-03-29 15:05:58.558] [DEBUG] - Call ocp-express-py_script_start_check-1.0 for ocp-express-1.0.0-100000432023032015.el7-42c6fc921063f24f9e1072d75bfa7f21f42146e3
[2023-03-29 15:05:58.558] [DEBUG] - import start_check
[2023-03-29 15:05:58.567] [DEBUG] - add start_check ref count to 1
[2023-03-29 15:05:58.568] [INFO] Check before start ocp-express
[2023-03-29 15:05:58.578] [DEBUG] – root@192.168.2.215 execute: cat /home/pirate/programs//test/ocpexpress/run/ocp-express.pid
[2023-03-29 15:05:58.651] [DEBUG] – exited code 0
[2023-03-29 15:05:58.652] [DEBUG] – root@192.168.2.215 execute: ls /proc/22456
[2023-03-29 15:05:58.777] [DEBUG] – exited code 2, error output:
[2023-03-29 15:05:58.777] [DEBUG] ls: cannot access /proc/22456: No such file or directory
[2023-03-29 15:05:58.777] [DEBUG]
[2023-03-29 15:05:58.778] [DEBUG] – root@192.168.2.215 execute: bash -c 'cat /proc/net/{udp*,tcp*}' | awk -F' ' '{print $2,$10}' | grep '00000000:1FF4' | awk -F' ' '{print $2}' | uniq
[2023-03-29 15:05:58.913] [DEBUG] – exited code 0
[2023-03-29 15:05:58.914] [DEBUG] – root@192.168.2.215 execute: java -version
[2023-03-29 15:05:59.154] [DEBUG] – exited code 0
[2023-03-29 15:05:59.155] [DEBUG] – root@192.168.2.215 execute: cat /proc/meminfo
[2023-03-29 15:05:59.285] [DEBUG] – exited code 0
[2023-03-29 15:05:59.287] [DEBUG] – root@192.168.2.215 execute: df --block-size=1024
[2023-03-29 15:05:59.417] [DEBUG] – exited code 0
[2023-03-29 15:05:59.419] [DEBUG] – get disk info for path /, total: 53660876800 avail: 43060412416
[2023-03-29 15:05:59.419] [DEBUG] – get disk info for path /dev, total: 8311586816 avail: 8311586816
[2023-03-29 15:05:59.419] [DEBUG] – get disk info for path /dev/shm, total: 8321503232 avail: 8321503232
[2023-03-29 15:05:59.419] [DEBUG] – get disk info for path /run, total: 8321503232 avail: 7775461376
[2023-03-29 15:05:59.420] [DEBUG] – get disk info for path /sys/fs/cgroup, total: 8321503232 avail: 8321503232
[2023-03-29 15:05:59.420] [DEBUG] – get disk info for path /home, total: 237147537408 avail: 84512223232
[2023-03-29 15:05:59.420] [DEBUG] – get disk info for path /boot, total: 520794112 avail: 393154560
[2023-03-29 15:05:59.420] [DEBUG] – get disk info for path /run/user/0, total: 1664303104 avail: 1664303104
[2023-03-29 15:05:59.420] [DEBUG] – root@192.168.2.215 execute: df --block-size=1024 /home/pirate/programs//test/ocpexpress/log
[2023-03-29 15:05:59.548] [DEBUG] – exited code 0
[2023-03-29 15:05:59.548] [DEBUG] – get disk info for path /home, total: 237147537408 avail: 84512223232
[2023-03-29 15:05:59.624] [DEBUG] - sub start_check ref count to 0
[2023-03-29 15:05:59.624] [DEBUG] - export start_check
[2023-03-29 15:05:59.624] [DEBUG] - Call ocp-express-py_script_start-1.0 for ocp-express-1.0.0-100000432023032015.el7-42c6fc921063f24f9e1072d75bfa7f21f42146e3
[2023-03-29 15:05:59.624] [DEBUG] - import start
[2023-03-29 15:05:59.862] [DEBUG] - add start ref count to 1
[2023-03-29 15:05:59.893] [INFO] Start ocp-express
[2023-03-29 15:05:59.894] [DEBUG] – root@192.168.2.215 execute: cat /home/pirate/programs//test/ocpexpress/run/ocp-express.pid
[2023-03-29 15:05:59.985] [DEBUG] – exited code 0
[2023-03-29 15:05:59.985] [DEBUG] – root@192.168.2.215 execute: ls /proc/22456
[2023-03-29 15:06:00.116] [DEBUG] – exited code 2, error output:
[2023-03-29 15:06:00.116] [DEBUG] ls: cannot access /proc/22456: No such file or directory
[2023-03-29 15:06:00.116] [DEBUG]
[2023-03-29 15:06:00.117] [DEBUG] – connect 192.168.2.215 -P2883 -umeta@ocp -poceanbase
[2023-03-29 15:06:00.122] [DEBUG] – root@192.168.2.215 execute: cd /home/pirate/programs//test/ocpexpress; bash -c 'java -jar -Xms436m -Xmx436m -DJDBC_URL=jdbc:oceanbase://192.168.2.215:2883/ocp_express -DJDBC_USERNAME=meta@ocp -DJDBC_PASSWORD=oceanbase -DPUBLIC_KEY= -Docp.iam.encrypted-system-password=oceanbase /home/pirate/programs//test/ocpexpress/lib/ocp-express-server.jar --port=8180 --bootstrap --progress-log=/home/pirate/programs//test/ocpexpress/log/bootstrap.log --with-property=logging.file.max-size:100MB --with-property=logging.file.total-size-cap:1GB --with-property=logging.file.name:/home/pirate/programs//test/ocpexpress/log/ocp-express.log > /dev/null 2>&1 &'
[2023-03-29 15:06:00.247] [DEBUG] – exited code 0
[2023-03-29 15:06:00.248] [DEBUG] – root@192.168.2.215 execute: ps -aux | grep 'java -jar -Xms436m -Xmx436m -DJDBC_URL=jdbc:oceanbase://192.168.2.215:2883/ocp_express -DJDBC_USERNAME=meta@ocp -DJDBC_PASSWORD=oceanbase -DPUBLIC_KEY= -Docp.iam.encrypted-system-password=oceanbase /home/pirate/programs//test/ocpexpress/lib/ocp-express-server.jar --port=8180 --bootstrap --progress-log=/home/pirate/programs//test/ocpexpress/log/bootstrap.log --with-property=logging.file.max-size:100MB --with-property=logging.file.total-size-cap:1GB --with-property=logging.file.name:/home/pirate/programs//test/ocpexpress/log/ocp-express.log' | grep -v grep | awk '{print $2}'
[2023-03-29 15:06:00.433] [DEBUG] – exited code 0
[2023-03-29 15:06:00.434] [DEBUG] – write 23310 to root@192.168.2.215:22: /home/pirate/programs//test/ocpexpress/run/ocp-express.pid
[2023-03-29 15:06:00.657] [DEBUG] - current remote_transporter RemoteTransporter.CLIENT
[2023-03-29 15:06:00.658] [DEBUG] – root@192.168.2.215 execute: mkdir -p /home/pirate/programs//test/ocpexpress/run && rm -fr /home/pirate/programs//test/ocpexpress/run/ocp-express.pid
[2023-03-29 15:06:00.756] [DEBUG] – exited code 0
[2023-03-29 15:06:00.757] [DEBUG] – send /tmp/tmpgr7rru5h to /home/pirate/programs//test/ocpexpress/run/ocp-express.pid
[2023-03-29 15:06:00.801] [DEBUG] - root@192.168.2.215 execute: chmod 600 /home/pirate/programs//test/ocpexpress/run/ocp-express.pid
[2023-03-29 15:06:00.895] [DEBUG] - exited code 0
[2023-03-29 15:06:00.940] [INFO] ocp-express program health check
[2023-03-29 15:06:00.941] [DEBUG] – 192.168.2.215 program health check
[2023-03-29 15:06:00.941] [DEBUG] – root@192.168.2.215 execute: ls /proc/23310
[2023-03-29 15:06:01.034] [DEBUG] – exited code 0
[2023-03-29 15:06:01.035] [DEBUG] – root@192.168.2.215 execute: bash -c 'cat /proc/net/{tcp*,udp*}' | awk -F' ' '{print $2,$10}' | grep '00000000:1FF4' | awk -F' ' '{print $2}' | uniq
[2023-03-29 15:06:01.204] [DEBUG] – exited code 0
[2023-03-29 15:06:01.205] [DEBUG] – failed to start 192.168.2.215 ocp-express, remaining retries: 59
[2023-03-29 15:06:04.208] [DEBUG] – 192.168.2.215 program health check
[2023-03-29 15:06:04.211] [DEBUG] – root@192.168.2.215 execute: ls /proc/23310
[2023-03-29 15:06:04.335] [DEBUG] – exited code 0
[2023-03-29 15:06:04.336] [DEBUG] – root@192.168.2.215 execute: bash -c 'cat /proc/net/{tcp*,udp*}' | awk -F' ' '{print $2,$10}' | grep '00000000:1FF4' | awk -F' ' '{print $2}' | uniq
[2023-03-29 15:06:04.510] [DEBUG] – exited code 0
[2023-03-29 15:06:04.511] [DEBUG] – failed to start 192.168.2.215 ocp-express, remaining retries: 58
[2023-03-29 15:06:07.514] [DEBUG] – 192.168.2.215 program health check
[2023-03-29 15:06:07.514] [DEBUG] – root@192.168.2.215 execute: ls /proc/23310
[2023-03-29 15:06:07.604] [DEBUG] – exited code 0
[2023-03-29 15:06:07.605] [DEBUG] – root@192.168.2.215 execute: bash -c 'cat /proc/net/{tcp*,udp*}' | awk -F' ' '{print $2,$10}' | grep '00000000:1FF4' | awk -F' ' '{print $2}' | uniq
[2023-03-29 15:06:07.746] [DEBUG] – exited code 0
[2023-03-29 15:06:07.746] [DEBUG] – failed to start 192.168.2.215 ocp-express, remaining retries: 57
[2023-03-29 15:06:10.748] [DEBUG] – 192.168.2.215 program health check
[2023-03-29 15:06:10.749] [DEBUG] – root@192.168.2.215 execute: ls /proc/23310
[2023-03-29 15:06:10.838] [DEBUG] – exited code 0
[2023-03-29 15:06:10.838] [DEBUG] – root@192.168.2.215 execute: bash -c 'cat /proc/net/{tcp*,udp*}' | awk -F' ' '{print $2,$10}' | grep '00000000:1FF4' | awk -F' ' '{print $2}' | uniq
[2023-03-29 15:06:10.971] [DEBUG] – exited code 0
[2023-03-29 15:06:10.972] [DEBUG] – failed to start 192.168.2.215 ocp-express, remaining retries: 56
[2023-03-29 15:06:13.974] [DEBUG] – 192.168.2.215 program health check
[2023-03-29 15:06:13.975] [DEBUG] – root@192.168.2.215 execute: ls /proc/23310
[2023-03-29 15:06:14.069] [DEBUG] – exited code 0
[2023-03-29 15:06:14.069] [DEBUG] – root@192.168.2.215 execute: bash -c 'cat /proc/net/{tcp*,udp*}' | awk -F' ' '{print $2,$10}' | grep '00000000:1FF4' | awk -F' ' '{print $2}' | uniq
[2023-03-29 15:06:14.204] [DEBUG] – exited code 0
[2023-03-29 15:06:14.204] [DEBUG] – failed to start 192.168.2.215 ocp-express, remaining retries: 55
[2023-03-29 15:06:17.207] [DEBUG] – 192.168.2.215 program health check
[2023-03-29 15:06:17.207] [DEBUG] – root@192.168.2.215 execute: ls /proc/23310
[2023-03-29 15:06:17.296] [DEBUG] – exited code 0
[2023-03-29 15:06:17.297] [DEBUG] – root@192.168.2.215 execute: bash -c 'cat /proc/net/{tcp*,udp*}' | awk -F' ' '{print $2,$10}' | grep '00000000:1FF4' | awk -F' ' '{print $2}' | uniq
[2023-03-29 15:06:17.431] [DEBUG] – exited code 0
[2023-03-29 15:06:17.432] [DEBUG] – failed to start 192.168.2.215 ocp-express, remaining retries: 54
[2023-03-29 15:06:20.434] [DEBUG] – 192.168.2.215 program health check
[2023-03-29 15:06:20.434] [DEBUG] – root@192.168.2.215 execute: ls /proc/23310
[2023-03-29 15:06:20.524] [DEBUG] – exited code 2, error output:
[2023-03-29 15:06:20.524] [DEBUG] ls: cannot access /proc/23310: No such file or directory
[2023-03-29 15:06:20.525] [DEBUG]
[2023-03-29 15:06:20.636] [ERROR] failed to start 192.168.2.215 ocp-express
[2023-03-29 15:06:20.636] [DEBUG] - sub start ref count to 0
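A note on reading the log above: each health check does two things. It runs ls /proc/23310 to see whether the Java process is still alive, and it greps /proc/net for 00000000:1FF4, which is the hexadecimal form of a listener on 0.0.0.0 port 8180 (the --port=8180 passed to the jar). In this log the process stays alive for about 20 seconds without ever opening the port, then disappears at 15:06:20. Because the start command redirects the process's output to /dev/null, the reason for the exit is more likely to be found in the bootstrap.log named by --progress-log than in the obd log. The port encoding can be checked with a minimal sketch; port_to_procnet and procnet_to_port are hypothetical helper names.

```shell
# port_to_procnet PORT: print the /proc/net local_address value for a
# listener bound to 0.0.0.0:PORT (address and port are upper-case hex).
port_to_procnet() {
  printf '00000000:%04X\n' "$1"
}

# procnet_to_port HEXPORT: convert the hex port field back to decimal.
procnet_to_port() {
  echo "$((0x$1))"
}

port_to_procnet 8180   # prints 00000000:1FF4
procnet_to_port 1FF4   # prints 8180
```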
obd cluster start test -c oceanbase-ce
Starts normally.
obd cluster start test -c ocp-express
Fails with an error.
宁封
March 29, 2023 16:11
#9
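Judging from the error message, the password is wrong when logging in to the ocp tenant. Could you try changing the password using root@sys?
I changed the password. Logging in with @sys works, but @ocp still reports the same error.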
宁封
March 29, 2023 16:41
#11
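No, I mean log in as root@sys and change the password of root@ocp.
If the sys tenant is reachable and can execute SQL, the cluster should be healthy, so you can try starting ocp-express.
-uroot@ocp connects with an empty password, but obd cluster start test -c ocp-express still fails. Changing the password to match @sys also fails.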
宁封
March 29, 2023 19:49
#15
What password does ocp-express use to connect to the metadb, and where is that password configured?
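One way to see which credentials obd actually hands to ocp-express is to read them from the start command recorded in the obd log. In the log quoted earlier in this thread they appear as -DJDBC_USERNAME=meta@ocp and -DJDBC_PASSWORD=oceanbase, so the account in use is meta@ocp rather than root@ocp. A minimal sketch, assuming the log text is piped in on stdin; jdbc_settings is a hypothetical helper name.

```shell
# jdbc_settings: read obd log text on stdin and print the distinct
# JDBC_URL / JDBC_USERNAME / JDBC_PASSWORD settings passed to java.
jdbc_settings() {
  grep -o -E -e '-DJDBC_(URL|USERNAME|PASSWORD)=[^ ]*' | sed 's/^-D//' | sort -u
}

# Usage (trace ID taken from the obd output in this thread):
#   obd display-trace f0a48cd6-cea8-11ed-9f20-d4ae5296982a | jdbc_settings
```

The printed values can then be tried by hand with obclient to confirm whether that exact user and password are accepted.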
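I don't know. Does ocp-express need a password configured for connecting to the metadb? The cluster was originally deployed through the graphical (web UI) installer.
This tenant's default password is empty. Did obd cluster start test -c ocp-express report any error? You can check the log file ~/.obd/log/obd.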
obd cluster start test -c ocp-express
Get local repositories ok
Search plugins ok
Open ssh connection ok
Load cluster param plugin ok
Check before start ocp-express ok
Start ocp-express ok
[ERROR] failed to start 192.168.2.215 ocp-express
[ERROR] ocp-express start failed
See https://www.oceanbase.com/product/ob-deployer/error-codes .
Trace ID: f0a48cd6-cea8-11ed-9f20-d4ae5296982a
If you want to view detailed obd logs, please run: obd display-trace f0a48cd6-cea8-11ed-9f20-d4ae5296982a
bootstrap (3).log (444.7 KB)
There is no ocp-express.log, only bootstrap.log:
/home/pirate/programs/test/ocpexpress/log/bootstrap.log
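Since bootstrap.log is the only log ocp-express produced, a quick way to narrow it down is to print just the lines that look like errors. This is a minimal sketch: the format of bootstrap.log is not shown in this thread, so the ERROR|Exception|Caused by pattern is an assumption based on typical Java logs, and show_errors is a hypothetical helper name.

```shell
# show_errors FILE [N]: print the last N lines (default 20) of FILE that
# mention ERROR, Exception, or "Caused by".
show_errors() {
  grep -E 'ERROR|Exception|Caused by' "$1" | tail -n "${2:-20}"
}

# Usage:
#   show_errors /home/pirate/programs/test/ocpexpress/log/bootstrap.log
```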