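【Environment】 Test environment
One central node hosts obd and obproxy (16 GB memory). Three observer virtual machines have 32 GB memory each, and the agent is also deployed on the observer hosts. /data: 200 GB, 76% used; /redo: 100 GB, 91% used; IO 200 MB/s.
observer Community Edition 4.2
【OB or other components】
【Version】
【Problem description】 Describe the problem clearly and precisely
Creating a tenant is very slow. After stopping the cluster with obd cluster stop, obd cluster start hangs at the "wait for observer init" stage.
The observer log shows: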
header 6: address=0x2b4303513b40
[2023-09-19 15:24:31.177527] ERROR detect_data_disk_io_failure_ (ob_failure_detector.cpp:385) [3005][T1_Occam][T1][Y0-0000000000000000-0-0] [lt=0][errcode=-4392] disk is hung(msg="data disk may be hung, add failure event", data_disk_io_hang_event={type:PROCESS HANG, module:STORAGE, info:data disk io hang event, level:FATAL}, data_disk_error_start_ts=1695108271037974)
[2023-09-19 15:24:31.791380] ERROR inner_aio (ob_io_manager.cpp:770) [2883][SvrStartupHandl][T1][Y0-0000000000000000-0-0] [lt=0][errcode=-4392] disk is hung(msg="data disk has fatal error")
[2023-09-19 15:24:31.909988] ERROR inner_aio (ob_io_manager.cpp:770) [2882][SvrStartupHandl][T1][Y0-0000000000000000-0-0] [lt=0][errcode=-4392] disk is hung(msg="data disk has fatal error")
[2023-09-19 15:24:31.926457] ERROR inner_aio (ob_io_manager.cpp:770) [2884][SvrStartupHandl][T1][Y0-0000000000000000-0-0] [lt=0][errcode=-4392] disk is hung(msg="data disk has fatal error")
[2023-09-19 15:24:33.329114] ERROR inner_aio (ob_io_manager.cpp:770) [2762][observer][T1][Y0-0000000000000000-0-0] [lt=0][errcode=-4392] disk is hung(msg="data disk has fatal error")
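To see how often these entries occur and which threads report them, the log can be summarized with a small script. The following is only a minimal sketch: the regular expression assumes the line layout seen in the excerpt above, and `summarize_errors` is a hypothetical helper name, not part of obd or any OceanBase tool.

```python
import re
import sys
from collections import Counter

# Assumed layout, taken from the excerpt above:
# [timestamp] LEVEL function (file:line) [tid][thread] ... [errcode=N] message
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z]+)\s+(?P<func>\S+)\s+"
    r"\((?P<src>[^)]+)\)\s+\[(?P<tid>\d+)\]\[(?P<thread>[^\]]*)\]"
    r".*?\[errcode=(?P<errcode>-?\d+)\]"
)


def summarize_errors(lines):
    """Count ERROR lines grouped by (errcode, function, thread name).

    Lines that do not match the assumed layout are skipped.
    """
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m and m.group("level") == "ERROR":
            key = (int(m.group("errcode")), m.group("func"), m.group("thread"))
            counts[key] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize.py <path to observer log>
    with open(sys.argv[1], errors="replace") as fh:
        for key, n in summarize_errors(fh).most_common():
            print(n, key)
```

Grouping by thread name makes it easier to tell whether the errors come only from the startup handler threads or from other threads as well.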
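【Reproduction path】 Operations performed before and after the problem occurred
【Symptoms and impact】
The cluster cannot start normally. I checked /var/log/message and found no disk errors, and a dd test shows disk IO at 260 MB/s.
【Attachments】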
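Since /redo is already at 91% usage, disk space on each observer host may be worth re-checking alongside the dd result. Below is a minimal sketch using Python's standard `shutil.disk_usage`. The helper names and the 90% threshold are illustrative assumptions, not OceanBase settings, and the percentage can differ slightly from what `df` prints because `df` excludes reserved blocks.

```python
import shutil
import sys


def usage_percent(path):
    """Return used space as a percentage of total for the filesystem holding path."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


def over_threshold(paths, threshold=90.0):
    """Return {path: percent} for paths at or above the threshold.

    The default of 90.0 is an arbitrary illustrative value.
    """
    result = {}
    for p in paths:
        pct = usage_percent(p)
        if pct >= threshold:
            result[p] = pct
    return result


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_usage.py /data /redo
    for path, pct in over_threshold(sys.argv[1:]).items():
        print(f"{path}: {pct:.1f}% used")
```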