KaLaMi
#1
I'm getting an error when installing and deploying with systemd.
We only installed these RPM packages:
obclient-2.2.6-1.el7.x86_64.rpm (client tool)
ob-deploy-2.10.1-1.el7.x86_64.rpm (other database dependencies)
oceanbase-ce-4.3.4.0-100000162024110717.el7.x86_64.rpm (core database service)
oceanbase-ce-libs-4.3.4.0-100000162024110717.el7.x86_64.rpm
Could the error be caused by not having installed obagent?
KaLaMi
#4
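We can't use obd web here and can only deploy from the command line. The official all-in-one package bundles too many other components, so we chose the systemd deployment. With this deployment method, however, starting and stopping has been consistently unreliable.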
辞霜
#5
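Deploying with obd from the command line (deploy with a YAML file) is also quite convenient. Alternatively, you can try bringing this cluster under obd management; see the obd commands on the official website.
Is the current problem that the cluster deployment fails?
Is this ID auto-generated?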
KaLaMi
#6
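Right, we didn't deploy a cluster; we only need a standalone deployment. We followed https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000001574328 ("Deploy OceanBase Database using systemd"). The first deployment succeeded and the service worked fine. But after a restart it sometimes fails to come up, and checking with /home/admin/oceanbase/bin/obshell task show -i 2323226544102886150 -d reports the error above.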
玉楼
#7
Could you please run /home/admin/oceanbase/bin/obshell task show -i 22130706433028868
阿绿
#8
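It looks like a previous task did not finish normally, so subsequent tasks cannot be created and executed correctly.
This log line gives the ID of that previous task; you can run task show -i {id} -d to see where the task went wrong.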
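To follow that chain of task IDs without reading the whole dump by eye, a small helper can pull out every task, node and subtask that is not in `SUCCEED` state. This is a minimal Python sketch that assumes the plain-text layout shown in the outputs posted in this thread (`id:`, `name:` and `state:` lines, with task logs prefixed by `- `); `parse_task_detail`, `not_succeeded` and `show_task` are hypothetical helper names, not part of obshell, and the real output format may differ between obshell versions.

```python
import re
import subprocess
from dataclasses import dataclass
from typing import List

# Only "id:", "name:" and "state:" lines matter; task_logs lines start with "- "
# and therefore never match, even if they contain the word "state:".
FIELD_RE = re.compile(r"^(id|name|state):\s*(.*)$")


@dataclass
class Entry:
    entry_id: str
    name: str
    state: str


def parse_task_detail(text: str) -> List[Entry]:
    """Collect an (id, name, state) triple for the task and each node/subtask."""
    entries: List[Entry] = []
    current = {}
    for raw in text.splitlines():
        match = FIELD_RE.match(raw.strip())
        if not match:
            continue  # dag_id, node_id, timestamps, log lines, ...
        key, value = match.groups()
        if key == "id":
            current = {"id": value}  # a new task/node/subtask block begins
            continue
        current[key] = value
        if key == "state":
            entries.append(Entry(current.get("id", "?"), current.get("name", "?"), value))
    return entries


def not_succeeded(text: str) -> List[Entry]:
    """Entries whose state is anything other than SUCCEED."""
    return [e for e in parse_task_detail(text) if e.state != "SUCCEED"]


def show_task(obshell: str, task_id: str) -> str:
    """Run `obshell task show -i <id> -d` and return its standard output.

    The exit status is not checked: a failed task may still print useful detail.
    """
    result = subprocess.run([obshell, "task", "show", "-i", task_id, "-d"],
                            capture_output=True, text=True)
    return result.stdout
```

For example, `not_succeeded(show_task("/home/admin/oceanbase/bin/obshell", "<task id>"))` lists the blocks to look at first; an empty list means every step of that task reported `SUCCEED`.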
KaLaMi
#9
Check and start obshell daemon
Get task 221307064330288650 detail
id: 221307064330288650
dag_id: 50
name: Start OB
stage: 6
max_stage: 6
state: SUCCEED
operator: RUN
start_time: 2024-12-12T18:19:41.807652002+08:00
end_time: 2024-12-12T18:20:35.002911836+08:00
nodes:
id: 2213070643302886274
node_id: 274
name: Inform all agents to start observer
state: SUCCEED
operator: RUN
start_time: 2024-12-12T18:19:41.88148606+08:00
end_time: 2024-12-12T18:19:43.187751865+08:00
subtasks:
id: 2213070643302886274
task_id: 274
name: Inform all agents to start observer
state: SUCCEED
operator: RUN
start_time: 2024-12-12T18:19:42.183444874+08:00
end_time: 2024-12-12T18:19:42.635433244+08:00
execute_times: 1
execute_agent:
ip: 127.0.0.1
port: 2886
task_logs:
- Inform 127.0.0.1:2886 to create the task
- create task 221307064330288651 successfully
id: 2213070643302886275
node_id: 275
name: Make sure all agents are ready
state: SUCCEED
operator: RUN
start_time: 2024-12-12T18:19:44.719096206+08:00
end_time: 2024-12-12T18:19:46.187899301+08:00
subtasks:
id: 2213070643302886275
task_id: 275
name: Make sure all agents are ready
state: SUCCEED
operator: RUN
start_time: 2024-12-12T18:19:45.264503294+08:00
end_time: 2024-12-12T18:19:45.686328377+08:00
execute_times: 1
execute_agent:
ip: 127.0.0.1
port: 2886
task_logs:
- All agents have created the tasks successfully
- Check if all agents can be advanced
- 127.0.0.1:2886 is ready
id: 2213070643302886276
node_id: 276
name: Advance agents to execute the task
state: SUCCEED
operator: RUN
start_time: 2024-12-12T18:19:47.514440056+08:00
end_time: 2024-12-12T18:19:48.776062483+08:00
subtasks:
id: 2213070643302886276
task_id: 276
name: Advance agents to execute the task
state: SUCCEED
operator: RUN
start_time: 2024-12-12T18:19:47.772314755+08:00
end_time: 2024-12-12T18:19:48.160015041+08:00
execute_times: 1
execute_agent:
ip: 127.0.0.1
port: 2886
task_logs:
- advance 127.0.0.1:2886 to execute the task
id: 2213070643302886277
node_id: 277
name: Wait for all agents to execute tasks successfully
state: SUCCEED
operator: RUN
start_time: 2024-12-12T18:19:50.45476449+08:00
end_time: 2024-12-12T18:20:27.694044476+08:00
subtasks:
id: 2213070643302886277
task_id: 277
name: Wait for all agents to execute tasks successfully
state: SUCCEED
operator: RUN
start_time: 2024-12-12T18:19:51.281340682+08:00
end_time: 2024-12-12T18:20:27.143428804+08:00
execute_times: 1
execute_agent:
ip: 127.0.0.1
port: 2886
task_logs:
- Wait for task to succeed
- 'Start preparations: wait for notification to start task'
- 'Start preparations: wait operator to advance'
- 'Start preparations: start to execute the task'
- 'Check ob process and config: check if first start'
- 'Check ob process and config: check observer process config'
- 'Start observer: start observer'
- 'Start observer: check if first start'
- 'Start observer: not first start, skip require check'
- 'Start observer: generate start cmd'
- 'Start observer: start cmd: export LD_LIBRARY_PATH=''/opt/oceanbase/lib''; /opt/oceanbase/bin/observer '
- 'Start observer: update self OB port'
- 'Start observer: update OB port in all_agent'
- 'Execute start observer sql: exec start server sql'
- 'Wait for task to end: wait operator to advance'
- 221307064330288651 succeed
id: 2213070643302886278
node_id: 278
name: Start Zone
state: SUCCEED
operator: RUN
start_time: 2024-12-12T18:20:29.053264661+08:00
end_time: 2024-12-12T18:20:31.589788851+08:00
subtasks:
id: 2213070643302886278
task_id: 278
name: Start Zone
state: SUCCEED
operator: RUN
start_time: 2024-12-12T18:20:29.570152589+08:00
end_time: 2024-12-12T18:20:30.117528161+08:00
execute_times: 1
execute_agent:
ip: 127.0.0.1
port: 2886
task_logs:
- start zone
- zone1 started
id: 2213070643302886279
node_id: 279
name: Inform all agents to end the task
state: SUCCEED
operator: RUN
start_time: 2024-12-12T18:20:33.08989749+08:00
end_time: 2024-12-12T18:20:34.813365275+08:00
subtasks:
id: 2213070643302886279
task_id: 279
name: Inform all agents to end the task
state: SUCCEED
operator: RUN
start_time: 2024-12-12T18:20:33.579522042+08:00
end_time: 2024-12-12T18:20:34.423986904+08:00
execute_times: 1
execute_agent:
ip: 127.0.0.1
port: 2886
task_logs:
- 'Check the final result: SUCCEED!'
- End all tasks
- 127.0.0.1:2886 end the task
玉楼
#13
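We have basically located the problem: several maintenance operations were triggered in quick succession during the restart, which caused a livelock. Because systemctl runs the command non-interactively, the interactive prompt for releasing the lock was automatically abandoned.
You can first restore the service by running /home/admin/oceanbase/bin/obshell cluster start.
This command prints a warning; that warning is the interactive request to release the lock automatically. Enter y to confirm and the service will be brought up.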
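A toy model may make this failure mode easier to picture. The Python sketch below is purely illustrative: `ToyAgent` and its behaviour are hypothetical and only mirror the description above (a stale unfinished task blocks new ones, and the unlock prompt is auto-declined when nobody is at a terminal). It is not how obshell is implemented.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyAgent:
    """Hypothetical stand-in for an agent that serializes maintenance tasks.

    Not obshell's real code: it only models the behaviour described in this thread.
    """
    unfinished_task: Optional[str] = None  # e.g. a "Stop OB" that never completed
    log: List[str] = field(default_factory=list)

    def request(self, op: str, interactive: bool, answer: str = "") -> bool:
        if self.unfinished_task is not None:
            # A previous task is still recorded as running, so the new one needs
            # explicit confirmation before the stale lock may be released.
            if not interactive:
                # Started by systemd: no terminal, so the prompt is declined automatically.
                self.log.append(f"{op}: blocked by {self.unfinished_task}, prompt auto-declined")
                return False
            if answer.strip().lower() != "y":
                self.log.append(f"{op}: blocked by {self.unfinished_task}, operator declined")
                return False
            self.log.append(f"{op}: released stale {self.unfinished_task}")
            self.unfinished_task = None
        self.log.append(f"{op}: started")
        return True


if __name__ == "__main__":
    agent = ToyAgent(unfinished_task="Stop OB")
    # Every automatic (non-interactive) retry hits the same stale task and gives up.
    for _ in range(3):
        agent.request("Start OB", interactive=False)
    # Running the start by hand and answering "y" clears it.
    agent.request("Start OB", interactive=True, answer="y")
    print("\n".join(agent.log))
```

Under these assumptions no number of automatic restarts can make progress, because each one hits the same stale task and declines; only the interactive start with a `y` answer changes the state.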
KaLaMi
#16
[root@localhost oceanbase]# /opt/oceanbase/bin/obshell task show -i 221307064330288692 -d
Check and start obshell daemon
Get task 221307064330288692 detail
id: 221307064330288692
dag_id: 92
name: Stop OB
stage: 5
max_stage: 5
state: SUCCEED
operator: RUN
start_time: 2024-12-13T10:04:28.964624911+08:00
end_time: 2024-12-13T10:04:49.141429656+08:00
nodes:
id: 2213070643302886484
node_id: 484
name: Inform all agents to prepare to stop observer
state: SUCCEED
operator: RUN
start_time: 2024-12-13T10:04:29.078846345+08:00
end_time: 2024-12-13T10:04:30.3649079+08:00
subtasks:
id: 2213070643302886484
task_id: 484
name: Inform all agents to prepare to stop observer
state: SUCCEED
operator: RUN
start_time: 2024-12-13T10:04:29.356470735+08:00
end_time: 2024-12-13T10:04:29.878897304+08:00
execute_times: 1
execute_agent:
ip: 127.0.0.1
port: 2886
task_logs:
- Inform 127.0.0.1:2886 to create the task
- create task 221307064330288693 successfully
id: 2213070643302886485
node_id: 485
name: Make sure all agents are ready
state: SUCCEED
operator: RUN
start_time: 2024-12-13T10:04:32.139618024+08:00
end_time: 2024-12-13T10:04:33.661376445+08:00
subtasks:
id: 2213070643302886485
task_id: 485
name: Make sure all agents are ready
state: SUCCEED
operator: RUN
start_time: 2024-12-13T10:04:32.658253424+08:00
end_time: 2024-12-13T10:04:33.062329821+08:00
execute_times: 1
execute_agent:
ip: 127.0.0.1
port: 2886
task_logs:
- All agents have created the tasks successfully
- Check if all agents can be advanced
- 127.0.0.1:2886 is ready
id: 2213070643302886486
node_id: 486
name: Advance agents to execute the task
state: SUCCEED
operator: RUN
start_time: 2024-12-13T10:04:34.881235749+08:00
end_time: 2024-12-13T10:04:36.128598301+08:00
subtasks:
id: 2213070643302886486
task_id: 486
name: Advance agents to execute the task
state: SUCCEED
operator: RUN
start_time: 2024-12-13T10:04:35.123924983+08:00
end_time: 2024-12-13T10:04:35.631902605+08:00
execute_times: 1
execute_agent:
ip: 127.0.0.1
port: 2886
task_logs:
- advance 127.0.0.1:2886 to execute the task
id: 2213070643302886487
node_id: 487
name: Wait for all agents to execute tasks successfully
state: SUCCEED
operator: RUN
start_time: 2024-12-13T10:04:37.710991166+08:00
end_time: 2024-12-13T10:04:46.345666435+08:00
subtasks:
id: 2213070643302886487
task_id: 487
name: Wait for all agents to execute tasks successfully
state: SUCCEED
operator: RUN
start_time: 2024-12-13T10:04:38.173847518+08:00
end_time: 2024-12-13T10:04:45.190166047+08:00
execute_times: 1
execute_agent:
ip: 127.0.0.1
port: 2886
task_logs:
- Wait for task to succeed
- 'Start preparations: wait for notification to start task'
- 'Start preparations: wait operator to advance'
- 'Start preparations: start to execute the task'
- 'Stop observer: Get observer Pid'
- 'Stop observer: Observer is not running'
- 'Wait for task to end: wait operator to advance'
- 221307064330288693 succeed
id: 2213070643302886488
node_id: 488
name: Inform all agents to end the task
state: SUCCEED
operator: RUN
start_time: 2024-12-13T10:04:47.700001577+08:00
end_time: 2024-12-13T10:04:48.986549789+08:00
subtasks:
id: 2213070643302886488
task_id: 488
name: Inform all agents to end the task
state: SUCCEED
operator: RUN
start_time: 2024-12-13T10:04:47.9819838+08:00
end_time: 2024-12-13T10:04:48.59335619+08:00
execute_times: 1
execute_agent:
ip: 127.0.0.1
port: 2886
task_logs:
- 'Check the final result: SUCCEED!'
- End all tasks
- 127.0.0.1:2886 end the task
玉楼
#17
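Sorry to keep you waiting. We have now fully diagnosed it: on some operating systems, reboot calls service stop but does not wait for service stop to return. As a result, the next start finds that an earlier stop never completed, and this ends in a livelock.
The repeated livelocks then caused the observer PID file to be written incorrectly.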
The current workaround is to write the correct observer PID into /home/admin/oceanbase/run/observer.pid and then run stop.
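The workaround amounts to finding the running observer's PID and overwriting the PID file with it before running stop. A minimal Python sketch, assuming Linux `/proc` (where `/proc/<pid>/comm` holds the process name), exactly one running `observer` process, and a PID file that holds just the decimal PID; the helper names are hypothetical. The default path is the one quoted above; the install in this thread appears to live under `/opt/oceanbase`, so adjust it to the actual deployment directory.

```python
import os
from pathlib import Path
from typing import List


def find_pids_by_comm(name: str, proc: Path = Path("/proc")) -> List[int]:
    """Return sorted PIDs whose /proc/<pid>/comm equals `name` (Linux only)."""
    pids: List[int] = []
    if not proc.is_dir():
        return pids
    for entry in proc.iterdir():
        if not entry.name.isdigit():
            continue  # skip /proc/self, /proc/meminfo, ...
        try:
            comm = (entry / "comm").read_text().strip()
        except OSError:
            continue  # process exited meanwhile or is not readable
        if comm == name:
            pids.append(int(entry.name))
    return sorted(pids)


def write_pid_file(pid_file: Path, pid: int) -> None:
    """Replace the PID file contents with a single PID (assumed: no trailing newline)."""
    tmp = pid_file.with_name(pid_file.name + ".tmp")
    tmp.write_text(str(pid))
    os.replace(tmp, pid_file)  # atomic rename within the same directory


def fix_observer_pid(pid_file: Path = Path("/home/admin/oceanbase/run/observer.pid")) -> int:
    """Write the PID of the single running observer into `pid_file` and return it."""
    pids = find_pids_by_comm("observer")
    if len(pids) != 1:
        raise RuntimeError(f"expected exactly one observer process, found {pids}")
    write_pid_file(pid_file, pids[0])
    return pids[0]
```

Refusing to write when zero or several `observer` processes are found is deliberate: guessing the wrong PID would leave the PID file just as wrong as before.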
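Also, because of this operating-system behavior, the service will fall into the same livelock again on the next reboot, so you can disable the service and manage the database with /home/admin/oceanbase/bin/obshell instead.
We will fix this in the next iteration. Thanks for your feedback.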
玉楼
#19
Yes, we will fix this in 4.3.4 BP1, which is being released tomorrow (December 20).