obd demo fails with an error at startup

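[Environment] Test environment
[OB]
[Version] oceanbase 4.3.5
[Problem description] After running obd demo, the obshell bootstrap step fails with a timeout
[Steps to reproduce] Run obd demo
[Attachments and logs]
[root@localhost bin]# obd demo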
Package obproxy-ce-4.3.5.0-3.el7 is available.
Package oceanbase-ce-4.3.5.3-103000092025080818.el7 is available.
Package grafana-7.5.17-1 is available.
Package prometheus-2.37.1-10000102022110211.el7 is available.
Package obagent-4.2.2-100000042024011120.el7 is available.
install obproxy-ce-4.3.5.0 for local ok
install oceanbase-ce-4.3.5.3 for local ok
install grafana-7.5.17 for local ok
install prometheus-2.37.1 for local ok
install obagent-4.2.2 for local ok
Cluster param config check ok
Open ssh connection ok
Generate obproxy configuration ok
Generate grafana configuration ok
Generate prometheus configuration ok
Generate obagent configuration ok
+--------------------------------------------------------------------------------------------+
| Packages                                                                                   |
+--------------+---------+------------------------+------------------------------------------+
| Repository   | Version | Release                | Md5                                      |
+--------------+---------+------------------------+------------------------------------------+
| obproxy-ce   | 4.3.5.0 | 3.el7                  | f17b277b681adb1c86bfc3cfda369ad88896da9d |
| oceanbase-ce | 4.3.5.3 | 103000092025080818.el7 | 8120a146d35cd47a9289d91990c8b44a8c21675d |
| grafana      | 7.5.17  | 1                      | 1bf1f338d3a3445d8599dc6902e7aeed4de4e0d6 |
| prometheus   | 2.37.1  | 10000102022110211.el7  | 58913c7606f05feb01bc1c6410346e5fc31cf263 |
| obagent      | 4.2.2   | 100000042024011120.el7 | 19739a07a12eab736aff86ecf357b1ae660b554e |
+--------------+---------+------------------------+------------------------------------------+
Repository integrity check ok
Load param plugin ok
Open ssh connection ok
Initializes obagent work home ok
Initializes observer work home ok
Initializes obproxy work home ok
Initializes prometheus work home ok
Initializes grafana work home ok
Parameter check ok
Remote obproxy-ce-4.3.5.0-3.el7-f17b277b681adb1c86bfc3cfda369ad88896da9d repository install ok
Remote obproxy-ce-4.3.5.0-3.el7-f17b277b681adb1c86bfc3cfda369ad88896da9d repository lib check ok
Remote oceanbase-ce-4.3.5.3-103000092025080818.el7-8120a146d35cd47a9289d91990c8b44a8c21675d repository install ok
Remote oceanbase-ce-4.3.5.3-103000092025080818.el7-8120a146d35cd47a9289d91990c8b44a8c21675d repository lib check ok
Remote grafana-7.5.17-1-1bf1f338d3a3445d8599dc6902e7aeed4de4e0d6 repository install ok
Remote grafana-7.5.17-1-1bf1f338d3a3445d8599dc6902e7aeed4de4e0d6 repository lib check ok
Remote prometheus-2.37.1-10000102022110211.el7-58913c7606f05feb01bc1c6410346e5fc31cf263 repository install ok
Remote prometheus-2.37.1-10000102022110211.el7-58913c7606f05feb01bc1c6410346e5fc31cf263 repository lib check ok
Remote obagent-4.2.2-100000042024011120.el7-19739a07a12eab736aff86ecf357b1ae660b554e repository install ok
Remote obagent-4.2.2-100000042024011120.el7-19739a07a12eab736aff86ecf357b1ae660b554e repository lib check ok
demo deployed
Get local repositories ok
Load cluster param plugin ok
Open ssh connection ok
[WARN] OBD-1012: (127.0.0.1) clog and data use the same disk (/)
Check before start obagent ok
Check before start prometheus ok
Check before start grafana ok
cluster scenario: express_oltp
Start observer ok
observer program health check ok
Connect to observer 127.0.0.1:2881 ok
oceanbase bootstrap ok
obshell start ok
obshell program health check ok
obshell bootstrap x
[ERROR] obshell bootstrap failed: Task 'Take over' execution failed: 127.0.0.1:2886 ERROR: [MySQL.Error]: MySQL error: Error 4012 (HY000): Timeout, query has reached the maximum query timeout: 1000000000(us), maybe you can adjust the session variable ob_query_timeout or query_timeout hint, and try again

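  • Put the data directory and the clog (transaction log) directory on separate physical disks, and keep both off the root disk;
  • Lower the polling frequency of prometheus and grafana to reduce background IO;
  • On a single-machine test environment, raise the observer memory parameters moderately so data is flushed to disk less often.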
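To see whether two directories actually share a disk, as the OBD-1012 warning in the log suggests, one rough approach is to compare the device IDs reported by the filesystem. The following is a minimal sketch, not an official obd check; the two paths in it are placeholders, since the thread does not state the real data and clog locations.

```python
import os


def same_device(path_a, path_b):
    """Return True when both paths are on the same mounted filesystem."""
    # st_dev identifies the device that holds the file. Two partitions of
    # one physical disk have different st_dev values, so a False result
    # does not prove the paths are on separate physical disks.
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


if __name__ == "__main__":
    # Hypothetical paths: replace them with the real data and clog dirs.
    data_dir = "/"
    clog_dir = "/"
    if same_device(data_dir, clog_dir):
        print("data and clog share a filesystem")
    else:
        print("data and clog are on different filesystems")
```

A True result confirms the shared-disk situation; a False result still needs a look at the partition layout before concluding that the disks are physically separate.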

Try increasing memory_size; you can change it in ~/.obd/cluster/demo/demo.yaml.
Then run obd deploy demo again.
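Before raising a memory setting it helps to know how much RAM the host really has; another reply in this thread names 8 GB as the level at or below which the demo may struggle. The sketch below reads MemTotal from /proc/meminfo on Linux. The function name and the use of 8 GiB as a comparison value are illustrative assumptions, not obd behavior.

```python
def mem_total_gib(meminfo_text):
    """Extract MemTotal from /proc/meminfo content and return it in GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # The line looks like "MemTotal:  8388608 kB"; the value is in KiB.
            kib = int(line.split()[1])
            return kib / (1024 * 1024)
    raise ValueError("MemTotal not found")


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        total = mem_total_gib(f.read())
    print(f"host memory: {total:.1f} GiB")
    # 8 GiB is the threshold mentioned in this thread, not an official limit.
    if total <= 8:
        print("memory is at or below the level the thread flags as tight")
```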

How many nodes did obd demo deploy? With a single node this should not happen! Check whether the clocks are synchronized!

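1. Root cause 1: clog and data on the same disk, a single-disk IO bottleneck (the log warns about this explicitly; most likely cause)

The OceanBase clog is a strongly synchronous transaction log that is written very frequently; the data directory is where baseline data, minor compactions, and major compactions are persisted. In the demo, the single node puts both on the / system disk. Once disk IO is saturated, the SQL that obshell runs to take over cluster metadata and query internal system tables blocks for a long time and hits the query timeout.

2. Root cause 2: insufficient resources on the single machine (memory / CPU)

obd demo uses low-spec default parameters. If the machine has 8 GB of memory or less, the OB memory pool is too small, and the background minor and major compaction threads compete for resources and block obshell's operational SQL.

3. Root cause 3: obshell startup timing / internal metadata lock conflict

The observer has only just finished bootstrap, and the internal tenants, resource units, and system views are still initializing. obshell's concurrent takeover queries contend for locks and wait a long time for the metadata to become ready.

4. Root cause 4: system kernel / filesystem parameters not tuned (EL7 defaults do not meet OB's requirements)

atime is not disabled and the dirty-page writeback thresholds are poorly set, which makes disk stalls worse.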
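The filesystem and kernel settings named in root cause 4 can be inspected without changing anything. The sketch below checks the mount options of a mount point for noatime and reads the two standard Linux dirty-page sysctls. The helper names are made up for illustration, and the thread gives no target values, so the code only reports what it finds.

```python
def mount_options(mounts_text, mount_point):
    """Return the option set for mount_point from /proc/mounts content."""
    options = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        # Format: device mount_point fstype options dump pass
        if len(fields) >= 4 and fields[1] == mount_point:
            # A later entry for the same mount point shadows earlier ones.
            options = set(fields[3].split(","))
    return options


def read_sysctl(name):
    """Read an integer sysctl such as vm.dirty_ratio from /proc/sys."""
    path = "/proc/sys/" + name.replace(".", "/")
    with open(path) as f:
        return int(f.read().strip())


if __name__ == "__main__":
    with open("/proc/mounts") as f:
        opts = mount_options(f.read(), "/")
    print("noatime set on /:", "noatime" in opts)
    for key in ("vm.dirty_ratio", "vm.dirty_background_ratio"):
        print(key, "=", read_sysctl(key))
```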

Just adjusting the relevant parameters as the error message suggests should be enough, right?
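It is worth converting the timeout in the error message before changing anything: 1000000000 us is 1000 seconds, so the failed statement had already waited more than 16 minutes. That hints the query may be blocked rather than merely slow, in which case a larger ob_query_timeout alone might not help. A small sketch of the conversion follows; the function name is illustrative.

```python
import re


def query_timeout_seconds(message):
    """Extract the timeout from an Error 4012 message, converted to seconds."""
    match = re.search(r"maximum query timeout:\s*(\d+)\(us\)", message)
    if match is None:
        return None
    # The value in the message is in microseconds.
    return int(match.group(1)) / 1_000_000


if __name__ == "__main__":
    msg = ("Error 4012 (HY000): Timeout, query has reached the maximum "
           "query timeout: 1000000000(us), maybe you can adjust the session "
           "variable ob_query_timeout or query_timeout hint, and try again")
    print(query_timeout_seconds(msg), "seconds")
```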

Check the logs and troubleshoot based on the errors they report:
grep "ERROR" observer.log
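On a busy node the grep above can return a great many lines. As a rough alternative, the sketch below keeps only the most recent matches. The log path is a placeholder, because the thread does not say where observer.log lives on this machine.

```python
from collections import deque


def tail_matching(path, needle="ERROR", limit=20):
    """Return the last `limit` lines in the file that contain `needle`."""
    matches = deque(maxlen=limit)  # older matches fall off automatically
    with open(path, "r", errors="replace") as f:
        for line in f:
            if needle in line:
                matches.append(line.rstrip("\n"))
    return list(matches)


if __name__ == "__main__":
    # Hypothetical location: point this at the real observer.log.
    for line in tail_matching("observer.log"):
        print(line)
```

Starting from the newest errors usually narrows things down faster, since the entries closest to the obshell bootstrap failure are the ones most likely to be related.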