Nodes in the OB cluster keep stopping automatically

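【 Environment 】Test environment
【 OB or other component 】OB
【 Version 】
【Problem description】Describe the problem clearly and precisely
【Steps to reproduce】Operations performed before and after the problem occurred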
observer.log (5.5 MB)

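This is caused by the CPU not supporting the AVX instruction set. You can run lscpu and check whether avx appears in the flags.

The solution is to switch to a CPU that supports the AVX instruction set.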
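
The lscpu check can also be scripted. The following is a minimal sketch, assuming a Linux host where `/proc/cpuinfo` is readable; the function names `cpu_flags` and `has_avx` are made up for illustration and are not part of OceanBase or OBD.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text):
    """Collect CPU feature flags from /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # x86 kernels label this line "flags"; some other
        # architectures use "Features" instead.
        if sep and key.strip() in ("flags", "Features"):
            flags.update(value.split())
    return flags


def has_avx(cpuinfo_text):
    """Return True if the exact flag "avx" is listed."""
    return "avx" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    # Assumption: running on Linux, where /proc/cpuinfo exists.
    text = Path("/proc/cpuinfo").read_text()
    print("AVX supported" if has_avx(text) else "AVX NOT supported")
```

On a virtual machine the flag can be missing even when the physical CPU has it, because the hypervisor decides which features are exposed to the guest, so it is worth running the check inside the same machine that runs observer.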

raise_exception:                                  ; preds = %normal_raise_block, %ob_fail, %ob_fail, %ob_fail
  %raise_exception91 = call i32 @_Unwind_RaiseException(%unwind_exception* %create_exception)
  unreachable

normal_raise_block:                               ; preds = %ob_fail
  %get_exception_class = call i64 @eh_classify_exception(i8* %load_sql_state)
  %get_exception_class.off = add i64 %get_exception_class, -3
  %switch = icmp ult i64 %get_exception_class.off, 2
  br i1 %switch, label %ob_success, label %raise_exception
}
")
[2025-03-06 20:50:35.307369] INFO  [SHARE] run_loop_ (ob_bg_thread_monitor.cpp:331) [695341][BGThreadMonitor][T0][Y0-0000000000000000-0-0] [lt=35] current monitor number(seq_=-1)
[2025-03-06 20:50:35.308292] INFO  [STORAGE] scheduler_ls_ha_handler_ (ob_storage_ha_service.cpp:195) [694118][T1_HAService][T1][Y0-0000000000000000-0-0] [lt=27] start do ls ha handler(ls_id_array_=[{id:1}])
[2025-03-06 20:50:35.313497] INFO  [STORAGE] print_statistics (ob_tmp_file_thread_job.cpp:179) [694080][T1_TFSwap][T1][Y0-0000000000000000-0-0] [lt=19] tmp file swap statistics(swap_task_cnt=0, avg_swap_response_time=0, max_swap_response_time=-1, min_swap_response_time=-1)
[2025-03-06 20:50:35.313532] INFO  [STORAGE] print_block_usage (ob_tmp_file_block_manager.cpp:822) [694080][T1_TFSwap][T1][Y0-0000000000000000-0-0] [lt=28] temporary file module use no blocks
[2025-03-06 20:50:35.313540] INFO  [STORAGE] print_statistics (ob_tmp_file_thread_job.cpp:125) [694080][T1_TFSwap][T1][Y0-0000000000000000-0-0] [lt=7] tmp file flush statistics(flush_task_cnt=0, avg_flush_data_len=0, max_flush_data_len=-1, min_flush_data_len=-1, f1_cnt=0, f2_cnt=0, f3_cnt=0, f4_cnt=0, f5_cnt=0)
[2025-03-06 20:50:35.313550] INFO  [STORAGE] print_statistics (ob_tmp_file_write_buffer_pool.cpp:1082) [694080][T1_TFSwap][T1][Y0-0000000000000000-0-0] [lt=9] tmp file write buffer pool statistics(dirty_page_percentage=0, max_page_num=6578, dirty_page_num=0, total_write_back_num=0, meta_page_num=0, dirty_meta_page_num=0, write_back_meta_num=0, data_page_num=0, dirty_data_page_num=0, write_back_data_num=0, data_page_watermark=0, meta_page_watermark=0)
[2025-03-06 20:50:35.313559] INFO  [STORAGE] do_work_ (ob_tmp_file_thread_wrapper.cpp:308) [694080][T1_TFSwap][T1][Y0-0000000000000000-0-0] [lt=8] ObTmpFileFlushTG information(this={is_inited_:true, mode_:1, last_flush_timestamp_:16960630603, flush_io_finished_ret_:0, flush_io_finished_round_:973, flushing_block_num_:0, is_fast_flush_meta_:false, fast_flush_meta_task_cnt_:0, wait_list_size_:0, retry_list_size_:0, finished_list_size_:0, normal_loop_cnt_:2, normal_idle_loop_cnt_:2, fast_loop_cnt_:0, fast_idle_loop_cnt_:0, flush_mgr_:{is_inited_:true, flush_ctx_:{is_inited_:true, fail_too_many_:false, expect_flush_size_:0, actual_flush_size_:0, flush_seq_ctx_:{create_flush_task_cnt_:0, prepare_finished_cnt_:0, flush_sequence_:0}, state_:5, iter_:{is_inited_:false, cur_caching_list_idx_:0, cur_caching_list_is_meta_:false, cur_iter_dir_idx_:-1, cur_iter_file_idx_:-1, cached_file_num_:0, cached_dir_num_:0}}}})
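
Most of the excerpt above is routine INFO output, so when going through a large observer.log it can help to filter those lines out first. This is a rough sketch that assumes each entry starts with `[timestamp] LEVEL [MODULE]` as in the lines above; the regular expression and the helper names `parse_line` and `non_info_lines` are illustrative assumptions, not an official OceanBase tool.

```python
import re
import sys

# Pattern inferred from the log lines quoted in this thread:
# "[2025-03-06 20:50:35.307369] INFO  [SHARE] run_loop_ (...) ..."
LOG_LINE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\]\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"(?:\[(?P<module>[A-Z_.]+)\]\s+)?"
    r"(?P<rest>.*)$"
)


def parse_line(line):
    """Return a dict with ts, level, module, rest, or None if no match."""
    m = LOG_LINE.match(line.rstrip("\n"))
    return m.groupdict() if m else None


def non_info_lines(lines):
    """Yield parsed records whose level is anything other than INFO."""
    for line in lines:
        rec = parse_line(line)
        if rec and rec["level"] != "INFO":
            yield rec


if __name__ == "__main__":
    # Usage (path is an example): python filter_log.py observer.log
    with open(sys.argv[1], errors="replace") as f:
        for rec in non_info_lines(f):
            print(rec["ts"], rec["level"], rec["module"], rec["rest"][:200])
```

Lines that do not start with a timestamp, such as the multi-line IR dump at the top of the excerpt, are skipped by this sketch, so the surrounding context still needs to be read in the original file.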


The operating system is Anolis OS 8.9.


The CPU's instruction set does not include AVX, so that is the cause.



After updating the motherboard, that instruction set is now available.


Now there is a new problem: after rebooting the server, the OCP service will not start.


image
It has been stuck here for more than ten minutes.


Please post ocp-server.log and obd.log.

I did a quick calculation, and it turns out the resources are insufficient :joy:

Nicely done!