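[Environment] Production
[OB or other components]
OCP Community Edition version: 4.3.3-20241219140415
Version of the OB cluster used by OCP: 4.2.1.8
[Version in use]
[Problem description]
Alert event details
Alert rule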
ob_common_unexpected_internal_error
Source
ob_common_unexpected_internal_error
Alert summary: alarm_template_id=0:ob_cluster=myocp-1735117217:host=10.0.202.93 OBServer unexpected internal error
Alert details: [OBServer unexpected internal error] Cluster: myocp, host: 10.0.202.93, log type: observer, log file: /home/admin/oceanbase/log/observer.log, log level: ERROR, keyword=Unexpected internal error happen, error code=4388, log details=[2025-03-13 23:06:35.232744] ERROR issue_dba_error (ob_log.cpp:1875) [4080407][T1002_L0_G0][T1002][YB420A00CA5D-00062B7F1A71A492-0-0] [lt=22][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4002, file="ob_req_time_service.h", line_no=60, info="invalid start and end time").
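The alert text packs two error codes into one line: the outer `errcode=-4388` reported by the alert rule, and the internal `errcode=-4002` with its source file, line number and message. A minimal Python sketch for pulling those fields apart follows. The regular expression is written against the single log line quoted above and is an assumption about the format, not an official OceanBase log grammar; the function and field names are made up for illustration.

```python
import re
from datetime import datetime

# Sample line copied from the alert details (quotes normalized to ASCII).
LINE = ('[2025-03-13 23:06:35.232744] ERROR issue_dba_error (ob_log.cpp:1875) '
        '[4080407][T1002_L0_G0][T1002][YB420A00CA5D-00062B7F1A71A492-0-0] '
        '[lt=22][errcode=-4388] Unexpected internal error happen, please checkout '
        'the internal errcode(errcode=-4002, file="ob_req_time_service.h", '
        'line_no=60, info="invalid start and end time")')

# Assumed layout: [timestamp] LEVEL ... [errcode=N] ... errcode(errcode=M, file="..", line_no=K, info="..")
PATTERN = re.compile(
    r'^\[(?P<ts>[^\]]+)\]\s+(?P<level>\w+)\s+.*?'
    r'\[errcode=(?P<outer>-?\d+)\].*?'
    r'errcode\(errcode=(?P<inner>-?\d+),\s*file="(?P<file>[^"]+)",\s*'
    r'line_no=(?P<line>\d+),\s*info="(?P<info>[^"]*)"\)'
)

def parse_alert_line(line):
    """Return the fields of one 'Unexpected internal error' line, or None if it does not match."""
    # Alert notifications may turn ASCII quotes into curly quotes; undo that first.
    line = line.replace('\u201c', '"').replace('\u201d', '"')
    m = PATTERN.search(line)
    if m is None:
        return None
    return {
        "timestamp": datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S.%f"),
        "level": m["level"],
        "outer_errcode": int(m["outer"]),   # code the alert rule keys on
        "inner_errcode": int(m["inner"]),   # code of the underlying failure
        "file": m["file"],
        "line_no": int(m["line"]),
        "info": m["info"],
    }

if __name__ == "__main__":
    for key, value in parse_alert_line(LINE).items():
        print(f"{key}: {value}")
```

The parsed `timestamp`, `file` and `info` values are the ones worth searching for in observer.log when collecting context around the error.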
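[Steps to reproduce] Operations performed before and after the problem occurred
There are no steps to reproduce; I only saw this in an alert notification.
[Attachments and logs] We recommend collecting diagnostic information with obdiag, the OceanBase agile diagnostic tool. For details, see the link (right-click to open):
[SOP Series 22] The first step of troubleshooting (self-diagnosis and diagnostic information collection)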
@论坛小助手
论坛小助手
#3
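Root cause analysis scenario: transaction rollback error
Root cause analysis command: obdiag rca run --scene=transaction_rollback
Log collection scenario: unknown problem
Log collection command: obdiag gather scene run --scene=observer.unknown
To help us locate and resolve the problem, please run the obdiag commands above for root cause analysis and log collection, and share the collected logs with us so that we can confirm the cause.
In addition, to pin down the problem, we need some information from you:
- Can you provide the complete error log or its surrounding context, in particular the full stack information that contains "invalid start and end time"?
- Was a specific SQL statement or operation running when the error occurred? If so, please provide the SQL statement.
- What is the version of the current OceanBase cluster?
- Have there been any recent configuration changes or changes in data volume?
- How often does the error occur? Is it intermittent, or is it triggered by every operation?
Attached is the help link for the obdiag agile diagnostic tool.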
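This is an OCP cluster installed from the all-in-one package. It is a single-node cluster used to manage other OB clusters, and the error was reported on this OCP cluster itself. The alert was raised quite a while ago, and I can no longer find the error in the logs. The result of the one-click inspection is: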
+----------------------------------------------------------------------------------------------------------------------------------------------+
| critical-tasks-report |
+----------------------------+-----------------------------------------------------------------------------------------------------------------+
| task | task_report |
+----------------------------+-----------------------------------------------------------------------------------------------------------------+
| cluster.data_path_settings | [critical] [remote_10.0.202.93] data_dir_path is null . Please check your nodes.data_dir need absolute Path |
| disk.sstable_abnormal_file | [critical] [remote_10.0.202.93] sstable_dir_path is null . Please check your nodes.data_dir need absolute Path |
| network.TCP-retransmission | [critical] [remote_10.0.202.93] tsar is not installed. we can not check tcp retransmission. |
+----------------------------+-----------------------------------------------------------------------------------------------------------------+
+------------------------------------------------------------------------------------------+
| warning-tasks-report |
+---------------------------------------------------+--------------------------------------+
| task | task_report |
+---------------------------------------------------+--------------------------------------+
| bugs.bug_385 | [warning] Unadapted by version. SKIP |
| cluster.ob_enable_plan_cache_bad_version | [warning] Unadapted by version. SKIP |
| cluster.optimizer_better_inlist_costing_parmmeter | [warning] Unadapted by version. SKIP |
| cluster.part_trans_action_max | [warning] Unadapted by version. SKIP |
| cluster.table_history_too_many | [warning] Unadapted by version. SKIP |
| system.instruction_set_avx2 | [warning] Unadapted by version. SKIP |
+---------------------------------------------------+--------------------------------------+
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| all-tasks-report |
+---------------------------------------------------+-----------------------------------------------------------------------------------------------------------------+
| task | task_report |
+---------------------------------------------------+-----------------------------------------------------------------------------------------------------------------+
| bugs.bug_182 | all pass |
| bugs.bug_385 | [warning] Unadapted by version. SKIP |
| bugs.bug_469 | all pass |
| clog.clog_disk_full | all pass |
| cluster.core_file_find | all pass |
| cluster.data_path_settings | [critical] [remote_10.0.202.93] data_dir_path is null . Please check your nodes.data_dir need absolute Path |
| cluster.deadlocks | all pass |
| cluster.global_indexes_too_much | all pass |
| cluster.major | all pass |
| cluster.mod_too_large | all pass |
| cluster.ob_enable_plan_cache_bad_version | [warning] Unadapted by version. SKIP |
| cluster.observer_not_active | all pass |
| cluster.optimizer_better_inlist_costing_parmmeter | [warning] Unadapted by version. SKIP |
| cluster.part_trans_action_max | [warning] Unadapted by version. SKIP |
| cluster.resource_limit_max_session_num | all pass |
| cluster.sys_log_level | all pass |
| cluster.table_history_too_many | [warning] Unadapted by version. SKIP |
| cluster.task_opt_stat | all pass |
| cluster.task_opt_stat_gather_fail | all pass |
| cluster.tenant_number | all pass |
| cpu.oversold | all pass |
| disk.clog_abnormal_file | all pass |
| disk.disk_full | all pass |
| disk.disk_hole | all pass |
| disk.disk_iops | all pass |
| disk.sstable_abnormal_file | [critical] [remote_10.0.202.93] sstable_dir_path is null . Please check your nodes.data_dir need absolute Path |
| disk.xfs_repair | all pass |
| err_code.find_err_4000 | all pass |
| err_code.find_err_4001 | all pass |
| err_code.find_err_4012 | all pass |
| err_code.find_err_4013 | all pass |
| err_code.find_err_4015 | all pass |
| err_code.find_err_4016 | all pass |
| err_code.find_err_4103 | all pass |
| err_code.find_err_4105 | all pass |
| err_code.find_err_4377 | all pass |
| network.TCP-retransmission | [critical] [remote_10.0.202.93] tsar is not installed. we can not check tcp retransmission. |
| system.aio | all pass |
| system.clock_source | all pass |
| system.core_pattern | all pass |
| system.dependent_software | all pass |
| system.dependent_software_swapon | all pass |
| system.getenforce | all pass |
| system.instruction_set_avx2 | [warning] Unadapted by version. SKIP |
| system.parameter | all pass |
| system.parameter_ip_local_port_range | all pass |
| system.parameter_tcp_rmem | all pass |
| system.parameter_tcp_wmem | all pass |
| system.ulimit_parameter | all pass |
| table.information_schema_tables_two_data | all pass |
| version.bad_version | all pass |
| version.old_version | all pass |
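+---------------------------------------------------+-----------------------------------------------------------------------------------------------------------------+
I am not sure whether the critical items above indicate real problems. This OCP cluster is a single node with no obproxy, and the installation directories are all defaults. When I run the inspection against other clusters, none of these critical items appear.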
旭辉
#6
The alert was raised at 2025-03-13 23:06:35.232744. Do you still have the observer.log that covers that time?
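One way to answer that question without opening every file by hand is to compare the first timestamp in each log file with the alert time. The Python sketch below is a rough helper under stated assumptions: log files live in one directory and share the `observer.log` name prefix, each entry starts with a `[YYYY-MM-DD HH:MM:SS.ffffff]` timestamp as in the alert above, and files whose name contains `.wf` are warning-level copies that can be skipped. The function names are hypothetical and this is not part of obdiag.

```python
import re
from datetime import datetime
from pathlib import Path

# Timestamp prefix as it appears in the alert's log line.
TS = re.compile(r'^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6})\]')

def first_timestamp(path, max_lines=200):
    """Return the timestamp of the first log entry found in `path`, or None."""
    with open(path, "r", errors="replace") as f:
        # Look at only the first `max_lines` lines; continuation lines carry no timestamp.
        for _, line in zip(range(max_lines), f):
            m = TS.match(line)
            if m:
                return datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
    return None

def find_log_covering(log_dir, when, prefix="observer.log"):
    """Return the log file whose first entry is the latest one at or before `when`.

    Returns None when every remaining file starts after `when`, which suggests
    the log for that time has already been recycled.
    """
    best = None
    for path in Path(log_dir).glob(prefix + "*"):
        if ".wf" in path.name:
            continue  # assumption: skip warning-level copies of the same entries
        start = first_timestamp(path)
        if start is not None and start <= when and (best is None or start > best[0]):
            best = (start, path)
    return best[1] if best else None

if __name__ == "__main__":
    alert_time = datetime(2025, 3, 13, 23, 6, 35, 232744)
    print(find_log_covering("/home/admin/oceanbase/log", alert_time))
```

The directory in the usage line is the one named in the alert details. If the function returns a file, searching it for the trace ID `YB420A00CA5D-00062B7F1A71A492-0-0` should show the surrounding context; if it returns `None`, the log for that time is most likely gone.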