Grafana monitoring page reports an error

阿绿 · February 2, 2023, 16:55

Use the same format for every entry. Don't omit name; write out the full name and ip for each server.

先华为后天 · February 3, 2023, 09:30

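This approach doesn't work either; there are several problems.
1. After changing the configuration of the live cluster with edit-config and following the prompts, the cluster was wiped out.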

--- The edit-config change
obagent:
  depends:
  - oceanbase-ce
  servers:
  - name: ob11
    ip: 10.125.144.16
  - name: ob12
    ip: 10.125.144.16
  - name: ob21
    ip: 10.125.144.17
  - name: ob22
    ip: 10.125.144.17
  - name: ob31
    ip: 10.125.144.18
  - name: ob32
    ip: 10.125.144.18
  global:
    home_path: /data/oceanbase/obagent
    server_port: 8088
    pprof_port: 8089
    log_level: INFO
    log_path: log/monagent.log
    crypto_method: plain
"/tmp/tmpakazphqn.yaml" 149L, 4227C written

--- After the change, obd prompted for a redeploy
Modifications to the deployment architecture take effect after you redeploy the architecture. Are you sure that you want to start a redeployment?  [y/n]: y
Search param plugin and load ok
Parameter check ok
Save deploy "ob400" configuration
Use `obd cluster redeploy ob400` to make changes take effect.

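--- The redeploy destroyed the cluster and then deployed it again. I had assumed redeploy would apply only the differences, touching just the binaries and leaving the database alone. During the redeploy, the second agent on each machine then failed to initialize because the obagent directory was not empty.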
[root@tgypt-xx13d002-cs76w ~]# obd cluster redeploy ob400
Get local repositories ok
Search plugins ok
Open ssh connection ok
Stop observer ok
Stop obproxy ok
Stop obagent ok
Stop prometheus ok
Stop grafana ok
ob400 stopped
Search plugins ok
Cluster status check ok
observer work dir cleaning ok
obproxy work dir cleaning ok
obagent work dir cleaning ok
prometheus work dir cleaning ok
grafana work dir cleaning ok
ob400 destroyed
install oceanbase-ce-4.0.0.0 for local ok
install obproxy-ce-4.0.0 for local ok
install obagent-1.2.0 for local ok
install prometheus-2.37.1 for local ok
install grafana-7.5.17 for local ok
+--------------------------------------------------------------------------------------------+
|                                          Packages                                          |
+--------------+---------+------------------------+------------------------------------------+
| Repository   | Version | Release                | Md5                                      |
+--------------+---------+------------------------+------------------------------------------+
| oceanbase-ce | 4.0.0.0 | 103000022023011215.el7 | d0ecd5a759c337e044ec79cab8b52bbf5a918fbb |
| obproxy-ce   | 4.0.0   | 5.el7                  | ac0a815bcad9cff0d2ab2b0e89ea73defd9111de |
| obagent      | 1.2.0   | 4.el7                  | c8872e2a1cfefa99a4bb75757a08d6fc2e28e25e |
| prometheus   | 2.37.1  | 10000102022110211.el7  | 2d856b4a90e7a35322bc3412231f714fde3fd794 |
| grafana      | 7.5.17  | 1                      | 5129b0134e31d273c970a7e3c7370990016bee16 |
+--------------+---------+------------------------+------------------------------------------+
Repository integrity check ok
Parameter check ok
Open ssh connection ok
Cluster status check ok
Initializes observer work home ok
Initializes obproxy work home ok
Initializes obagent work home x
[ERROR] OBD-1002: Fail to init ob12(10.125.144.16) home path: /data/oceanbase/obagent is not empty.
[ERROR] OBD-1002: Fail to init ob22(10.125.144.17) home path: /data/oceanbase/obagent is not empty.
[ERROR] OBD-1002: Fail to init ob32(10.125.144.18) home path: /data/oceanbase/obagent is not empty.

Initializes prometheus work home ok
Initializes grafana work home ok
See https://www.oceanbase.com/product/ob-deployer/error-codes .
[root@tgypt-xx13d002-cs76w ~]# rm -rI  /data/oceanbase/obagent/*
rm: remove 5 arguments recursively? y
[root@tgypt-xx13d002-cs76w ~]# obd cluster redeploy ob400
Get local repositories ok
[ERROR] Deploy "ob400" is destroyed. You could not destroy an undeployed cluster
See https://www.oceanbase.com/product/ob-deployer/error-codes .
[root@tgypt-xx13d002-cs76w ~]# 
[root@tgypt-xx13d002-cs76w ~]# obd cluster display ob400
Deploy "ob400" is destroyed
See https://www.oceanbase.com/product/ob-deployer/error-codes .
[root@tgypt-xx13d002-cs76w ~]# 
[root@tgypt-xx13d002-cs76w ~]# du -sh /data/oceanbase/
68K     /data/oceanbase/
[root@tgypt-xx13d002-cs76w ~]#

2. Redeploying manually still reports that the agent directory is not empty, even though I had already emptied its parent directory.

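3. Besides being short on features, the obd tool is not rigorous enough in its checks. For example, the obagent servers section was written in two noticeably different formats across my two attempts; if one of them was wrong, obd did not detect it at all.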

玉楼 · February 3, 2023, 11:01

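Each of your machines runs two agents, but home_path is set only once, under global. That means the agents ob11 and ob12 both use the directory /data/oceanbase/obagent on 10.125.144.16. During the deploy stage, ob11 runs first and takes /data/oceanbase/obagent; when ob12 then checks the directory, it finds /data/oceanbase/obagent already occupied and reports an error.
Delete home_path from global and, just as you do for the ob nodes, configure a home_path for each agent node. For example: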

obagent:
  depends:
  - oceanbase-ce
  servers:
  - name: ob11
    ip: 10.125.144.16
  - name: ob12
    ip: 10.125.144.16
  - name: ob21
    ip: 10.125.144.17
  - name: ob22
    ip: 10.125.144.17
  - name: ob31
    ip: 10.125.144.18
  - name: ob32
    ip: 10.125.144.18
  ob11:
    home_path: /data/oceanbase/obagent1
  ob12:
    home_path: /data/oceanbase/obagent2
  global:
    server_port: 8088
    pprof_port: 8089

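In this form, each agent gets its own working directory.
home_path is a special configuration item: when several nodes are deployed on one machine, it cannot be set once for all of them through global.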
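
If you want to catch this kind of clash before running a deploy, the rule above can be sketched as a small check. This is only an illustration in Python, not something obd provides: the function names effective_home_path and find_home_path_conflicts are made up here, the config is written as a plain dict rather than loaded from the YAML file, and the precedence (a per-server section overrides global) is an assumption based on the example above.

```python
from collections import defaultdict


def effective_home_path(component, server_name):
    """Return the home_path a server would use.

    Assumption: a section named after the server overrides `global`.
    """
    per_server = component.get(server_name) or {}
    global_conf = component.get("global") or {}
    return per_server.get("home_path", global_conf.get("home_path"))


def find_home_path_conflicts(component):
    """Return {(ip, home_path): [names]} for paths claimed by more than one server."""
    claims = defaultdict(list)
    for server in component.get("servers", []):
        path = effective_home_path(component, server["name"])
        claims[(server["ip"], path)].append(server["name"])
    return {key: names for key, names in claims.items() if len(names) > 1}


# Two agents on one machine sharing the home_path from `global` (the failing case).
shared = {
    "servers": [
        {"name": "ob11", "ip": "10.125.144.16"},
        {"name": "ob12", "ip": "10.125.144.16"},
    ],
    "global": {"home_path": "/data/oceanbase/obagent"},
}

# The same two agents, each with its own home_path.
separate = {
    "servers": shared["servers"],
    "ob11": {"home_path": "/data/oceanbase/obagent1"},
    "ob12": {"home_path": "/data/oceanbase/obagent2"},
    "global": {"server_port": 8088, "pprof_port": 8089},
}

print(find_home_path_conflicts(shared))    # ob11 and ob12 claim the same directory
print(find_home_path_conflicts(separate))  # no conflicts: empty dict
```

The check groups servers by (ip, home_path), so the same path on different machines is not flagged; only two nodes on one machine pointing at one directory are.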

先华为后天 · February 3, 2023, 11:14

It works now. Thank you very much.