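【 Environment 】Production
【 OB or other components 】ocp
【 Version 】ocp 4.2.2, metadata database version 4.3.1.0
【Problem description】After restarting the ocp metadata database, the sys tenant starts normally, but the ocp metadata database tenant cannot be accessed and shows initializing.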
You can run grep ERROR observer.log, and then share the observer.log output.
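A minimal sketch of how that grep could be summarized before posting. It assumes the log format seen in this thread, where each ERROR line carries one bracketed `[errcode=-NNNN]` field; the function name `count_errcodes` is made up for illustration.

```shell
# Count ERROR lines per top-level errcode in an observer log.
# Usage: count_errcodes /path/to/observer.log
# Only the bracketed [errcode=...] field is used, so an inner
# "(errcode=-4184, ...)" inside the message is not counted.
count_errcodes() {
  grep ERROR "$1" \
    | sed -n 's/.*\[errcode=\(-[0-9][0-9]*\)].*/\1/p' \
    | sort | uniq -c | sort -rn \
    | awk '{ print $1, $2 }'
}
```

Each output line is a count followed by an errcode, most frequent first, which makes it easier to see which error dominates.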
How did you restart it?
Do you have any logs?
errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4184, file=“ob_block_manager.cpp”, line_no=304, info=“Failed to alloc block from io device”)
[2024-08-21 11:49:36.359839] ERROR try_recycle_blocks (palf_env_impl.cpp:786) [400192][T1001_PalfGC][T1001][Y0-0000000000000000-0-0] [lt=0][errcode=-4264] Log out of disk space(msg=“log disk space is almost full”, ret=-4264, total_size(MB)=1024, used_size(MB)=972, used_percent(%)=95, warn_size(MB)=819, warn_percent(%)=80, limit_size(MB)=972, limit_percent(%)=95, total_unrecyclable_size_byte(MB)=908, maximum_used_size(MB)=972, maximum_log_stream=1, oldest_log_stream=1, oldest_scn={val:1723775960462531001, v:0}, in_shrinking=false)
[2024-08-21 11:49:36.378502] ERROR issue_dba_error (ob_log.cpp:1923) [400538][T1002_MINI_MERG][T1002][Y5962C0A80A9D-00061FFFA452E71E-0-0] [lt=1][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4184, file=“ob_block_manager.cpp”, line_no=304, info=“Failed to alloc block from io device”)
[2024-08-21 11:49:36.406128] ERROR issue_dba_error (ob_log.cpp:1923) [401066][T1004_MINI_MERG][T1004][Y5962C0A80A9D-00061FFFA1ADFEE7-0-0] [lt=1][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4184, file=“ob_block_manager.cpp”, line_no=304, info=“Failed to alloc block from io device”)
[2024-08-21 11:49:36.713775] ERROR period_calc_disk_usage (palf_env_impl.cpp:1353) [400194][T1001_LogLoop][T1001][Y0-0000000000000000-0-0] [lt=16][errcode=-4264] Log out of disk space(msg=“log disk space is almost full”, ret=-4264, total_size(MB)=1024, used_size(MB)=972, used_percent(%)=95, warn_size(MB)=819, warn_percent(%)=80, limit_size(MB)=972, limit_percent(%)=95)
[2024-08-21 11:49:36.845069] ERROR try_recycle_blocks (palf_env_impl.cpp:786) [400777][T1003_PalfGC][T1003][Y0-0000000000000000-0-0] [lt=0][errcode=-4264] Log out of disk space(msg=“log disk space is almost full”, ret=-4264, total_size(MB)=1024, used_size(MB)=972, used_percent(%)=95, warn_size(MB)=819, warn_percent(%)=80, limit_size(MB)=972, limit_percent(%)=95, total_unrecyclable_size_byte(MB)=908, maximum_used_size(MB)=972, maximum_log_stream=1, oldest_log_stream=1, oldest_scn={val:1723646492879591001, v:0}, in_shrinking=false)
[2024-08-21 11:49:36.867754] ERROR period_calc_disk_usage (palf_env_impl.cpp:1353) [400779][T1003_LogLoop][T1003][Y0-0000000000000000-0-0] [lt=17][errcode=-4264] Log out of disk space(msg=“log disk space is almost full”, ret=-4264, total_size(MB)=1024, used_size(MB)=972, used_percent(%)=95, warn_size(MB)=819, warn_percent(%)=80, limit_size(MB)=972, limit_percent(%)=95)
obd cluster stop/start obcluster
Is the disk full?
There is still 10% left, 20G.
Log out of disk space(msg=“log disk space is almost full”, ret=-4264, total_size(MB)=1024, used_size(MB)=972, used_percent(%)=95, warn_size(MB)=819, warn_percent(%)=80, limit_size(MB)=972, limit_percent(%)=95, total_unrecyclable_size_byte(MB)=908, maximum_used_size(MB)=972, maximum_log_stream=1, oldest_log_stream=1, oldest_scn={val:1723775960462531001, v:0}, in_shrinking=false)
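The clog space is full. First, run df -h to check whether the clog disk is full; if it is, expand it.
Second, you can try starting observer with parameters: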
/bin/observer -o "log_disk_size=50G,log_disk_utilization_threshold=95,log_disk_utilization_limit_threshold=98"
total_size(MB)=1024: the clog for tenant 1001 here was allocated only 1G. You can start with the parameters first and adjust afterwards.
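The sizes in the error message look like plain integer percentages of total_size(MB). A small sketch of that arithmetic, assuming this relationship (it reproduces the 819 and 972 values in the log, but it is not taken from the OceanBase source); the function name is hypothetical:

```shell
# Print the warn and limit sizes (MB) for a tenant log disk.
# Arguments: total size in MB, warn percent, limit percent.
# Integer division is used, so fractions are truncated.
log_disk_thresholds() {
  echo "warn_size(MB)=$(( $1 * $2 / 100 )) limit_size(MB)=$(( $1 * $3 / 100 ))"
}

# log_disk_thresholds 1024 80 95
# prints: warn_size(MB)=819 limit_size(MB)=972
```

With used_size(MB)=972 already at that limit, the 1G tenant log disk has no headroom left, which is consistent with the "log disk space is almost full" errors.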
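After restarting as you suggested, it still reports insufficient disk resources. The tenant was allocated only 10g, so the system gave the meta tenant only 1g, and that 1g is full. There is still free disk space, but right now I cannot adjust the tenant resources. Can the tenant specification be set in the manual startup parameters?
The manual startup parameters cannot set parameters for an individual tenant. The only option is to enlarge the overall log_disk_size, log_disk_utilization_threshold, and log_disk_utilization_limit_threshold. You can enlarge log_disk_size first and adjust it back once the node is up.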
log_disk_utilization_threshold=95,log_disk_utilization_limit_threshold=98
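Raise these two parameters a bit more and see whether it comes up.
The total log_disk_size is large enough; what is insufficient is the portion allocated to the tenant.
I raised both to 99 and it still will not come up.
Try log_disk_utilization_threshold=96,log_disk_utilization_limit_threshold=99.
It still will not come up even with all three parameters enlarged together? log_disk_size log_disk_utilization_threshold log_disk_utilization_limit_threshold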
admin 379063 573 9.8 13023284 11323744 ? Ssl 16:33 236:36 /data/oceanbase/oceanbase-ce/bin/observer -r 192.168.:22882:22881 -p 22881 -P 22882 -z zone1 -n obcluster -c 17 -d /data/oceanbase/data -i em1 -l INFO -o __min_full_resource_pool_memory=2147483648,memory_limit=50G,system_memory=15G,datafile_size=30G,log_disk_size=70G,enable_syslog_wf=False,enable_syslog_recycle=True,max_syslog_file_count=4,skip_proxy_sys_private_check=True,enable_strict_kernel_release=False,cpu_count=32,log_disk_utilization_threshold=99,log_disk_utilization_limit_threshold=99
These are the current parameters, but nothing has changed in the log:
[2024-08-21 17:13:52.009307] INFO [PALF] runTimerTask (block_gc_timer_task.cpp:101) [379668][T1001_PalfGC][T1001][Y0-0000000000000000-0-0] [lt=1] BlockGCTimerTask success(ret=0, cost_time_us=44, palf_env_impl_={IPalfEnvImpl:{IPalfEnvImpl:“Dummy”}, self:“192.168***:22882”, log_dir:"/data/oceanbase/data/clog/tenant_1001", disk_options_wrapper:{disk_opts_for_stopping_writing:{log_disk_size(MB):1024, log_disk_utilization_threshold(%):80, log_disk_utilization_limit_threshold(%):95, log_disk_throttling_percentage(%):100, log_disk_throttling_maximum_duration(s):7200, log_writer_parallelism:3}, disk_opts_for_recycling_blocks:{log_disk_size(MB):1024, log_disk_utilization_threshold(%):80, log_disk_utilization_limit_threshold(%):95, log_disk_throttling_percentage(%):100, log_disk_throttling_maximum_duration(s):7200, log_writer_parallelism:3}, status:1, cur_unrecyclable_log_disk_size(MB):908, sequence:0}, log_alloc_mgr_:{flying_log_task:0, flying_meta_task:0}})
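To put the two views side by side, here is a rough sketch that pulls the log-disk options out of the observer command line and the per-tenant values out of a BlockGCTimerTask log line. It assumes the exact field layout shown in the ps output and log line above; both function names are invented for this example.

```shell
# Read a ps line on stdin and print the log_disk* options passed via -o,
# one per line.
observer_log_disk_opts() {
  sed -n 's/.* -o \([^ ]*\).*/\1/p' | tr ',' '\n' | grep '^log_disk'
}

# Read observer.log lines on stdin and, for each BlockGCTimerTask line,
# print the tenant log_dir with the size and thresholds that the tenant
# is actually using (the disk_opts_for_stopping_writing group).
palf_disk_opts() {
  grep 'BlockGCTimerTask' \
    | sed -n 's/.*log_dir:"\([^"]*\)".*disk_opts_for_stopping_writing:{log_disk_size(MB):\([0-9]*\), log_disk_utilization_threshold(%):\([0-9]*\), log_disk_utilization_limit_threshold(%):\([0-9]*\).*/\1 size_mb=\2 warn_pct=\3 limit_pct=\4/p'
}

# Possible usage:
#   ps -ef | grep '[o]bserver' | observer_log_disk_opts
#   palf_disk_opts < observer.log
```

If the first command shows 99/99 while the second still shows 80/95 for tenant_1001, the startup values are not what the tenant is using, which is the mismatch described in this thread.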
strings /home/admin/oceanbase/etc/observer.config.bin
Take a look at the output of this.
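This output shows that log_disk_utilization_threshold, log_disk_utilization_limit_threshold, and log_disk_size have taken effect, yet the observer.log data you posted above shows that they have not. Is that observer.log the one that corresponds to this error?
Also, please post a screenshot of the physical clog disk usage: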
df -h
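If a screenshot is inconvenient, a tiny helper along these lines could report just the usage percentage. It is only a sketch: it assumes POSIX `df -P` output (one data line, capacity in the fifth column), and the clog path in the usage comment is taken from the log_dir in the log above.

```shell
# Read `df -P <path>` output on stdin and print the used percentage
# (fifth column of the first data line) without the % sign.
used_percent_from_df() {
  awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Possible usage:
#   df -P /data/oceanbase/data/clog | used_percent_from_df
```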