
【 Environment 】Production
【 OB or other component 】
OCP Community Edition version: 4.3.3-20241219140415
OB cluster version used by OCP:
【 Version in use 】



Alert summary: alarm_template_id=0:ob_cluster=myocp-1735117217:host= OBServer unexpected internal error

Alert details: [OBServer unexpected internal error] cluster: myocp, host: , log type: observer, log file: /home/admin/oceanbase/log/observer.log, log level: ERROR, keyword=Unexpected internal error happen, error code=4388, log details=[2025-03-13 23:06:35.232744] ERROR issue_dba_error (ob_log.cpp:1875) [4080407][T1002_L0_G0][T1002][YB420A00CA5D-00062B7F1A71A492-0-0] [lt=22][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4002, file="ob_req_time_service.h", line_no=60, info="invalid start and end time").


【SOP Series 22】 Fault diagnosis, step one (self-service diagnosis and collecting diagnostic information)


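Root cause analysis scenario: transaction rollback error
Root cause analysis command: obdiag rca run --scene=transaction_rollback

Log collection scenario: unknown problem
Log collection command: obdiag gather scene run --scene=observer.unknown

To help us locate and resolve the problem, please run the obdiag commands above for root cause analysis and log collection, then send us the collected logs so we can confirm where the problem lies.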


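  1. Can you provide the complete error log or its surrounding context? In particular, the full stack information containing invalid start and end time.
  2. Was a specific SQL statement or operation running when the error occurred? If so, please provide the SQL.
  3. What is the version of the current OceanBase cluster?
  4. Have there been any recent configuration changes or changes in data volume?
  5. How often does the error occur? Is it intermittent, or is it triggered by every operation?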

Help links for the agile diagnostic tool obdiag:


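  1. Use the agile diagnostic tool obdiag to get an inspection report with obdiag check run. Docs: https://www.oceanbase.com/docs/common-obdiag-cn-1000000002488230
  2. Use obdiag to analyze the logs from roughly the 15 minutes before the error and see whether anything else went wrong: https://www.oceanbase.com/docs/common-obdiag-cn-1000000002488226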
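
If obdiag is not at hand, a quick manual look at the same 15-minute window is possible with a few lines of Python. This is only a rough sketch and not a replacement for the obdiag analysis: it assumes every log line starts with a `[YYYY-MM-DD HH:MM:SS.ffffff]` timestamp followed by the level, as in the alert's log line, and the names `lines_before_alert` and `scan_file` are made up for this example. The level names to keep are also an assumption; adjust them to whatever appears in your log.

```python
import re
from datetime import datetime, timedelta

# Leading "[YYYY-MM-DD HH:MM:SS.ffffff]" timestamp, as seen in the alert's log line.
TS_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6})\]")


def parse_ts(line):
    """Return the line's leading timestamp, or None if the line has none."""
    m = TS_RE.match(line)
    if m is None:
        return None
    return datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")


def lines_before_alert(lines, alert_time, minutes=15, levels=("ERROR", "WARN")):
    """Keep lines of the given levels logged in the window ending at alert_time."""
    start = alert_time - timedelta(minutes=minutes)
    selected = []
    for line in lines:
        ts = parse_ts(line)
        if ts is None or not (start <= ts <= alert_time):
            continue
        # The level is assumed to be the first token after the timestamp.
        rest = line[line.index("]") + 1:].split(None, 1)
        if rest and rest[0] in levels:
            selected.append(line.rstrip("\n"))
    return selected


def scan_file(path, alert_time, minutes=15):
    """Apply the filter to a log file on disk (hypothetical helper)."""
    with open(path, errors="replace") as f:
        return lines_before_alert(f, alert_time, minutes)
```

For the alert in this thread, the call would look like `scan_file("/home/admin/oceanbase/log/observer.log", datetime(2025, 3, 13, 23, 6, 35, 232744))`. Lines without a leading timestamp (for example wrapped continuation lines) are skipped by this sketch.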

| critical-tasks-report |
| task | task_report |
| cluster.data_path_settings | [critical] [remote_10.0.202.93] data_dir_path is null . Please check your nodes.data_dir need absolute Path |
| disk.sstable_abnormal_file | [critical] [remote_10.0.202.93] sstable_dir_path is null . Please check your nodes.data_dir need absolute Path |
| network.TCP-retransmission | [critical] [remote_10.0.202.93] tsar is not installed. we can not check tcp retransmission. |
| warning-tasks-report |
| task | task_report |
| bugs.bug_385 | [warning] Unadapted by version. SKIP |
| cluster.ob_enable_plan_cache_bad_version | [warning] Unadapted by version. SKIP |
| cluster.optimizer_better_inlist_costing_parmmeter | [warning] Unadapted by version. SKIP |
| cluster.part_trans_action_max | [warning] Unadapted by version. SKIP |
| cluster.table_history_too_many | [warning] Unadapted by version. SKIP |
| system.instruction_set_avx2 | [warning] Unadapted by version. SKIP |
| all-tasks-report |
| task | task_report |
| bugs.bug_182 | all pass |
| bugs.bug_385 | [warning] Unadapted by version. SKIP |
| bugs.bug_469 | all pass |
| clog.clog_disk_full | all pass |
| cluster.core_file_find | all pass |
| cluster.data_path_settings | [critical] [remote_10.0.202.93] data_dir_path is null . Please check your nodes.data_dir need absolute Path |
| cluster.deadlocks | all pass |
| cluster.global_indexes_too_much | all pass |
| cluster.major | all pass |
| cluster.mod_too_large | all pass |
| cluster.ob_enable_plan_cache_bad_version | [warning] Unadapted by version. SKIP |
| cluster.observer_not_active | all pass |
| cluster.optimizer_better_inlist_costing_parmmeter | [warning] Unadapted by version. SKIP |
| cluster.part_trans_action_max | [warning] Unadapted by version. SKIP |
| cluster.resource_limit_max_session_num | all pass |
| cluster.sys_log_level | all pass |
| cluster.table_history_too_many | [warning] Unadapted by version. SKIP |
| cluster.task_opt_stat | all pass |
| cluster.task_opt_stat_gather_fail | all pass |
| cluster.tenant_number | all pass |
| cpu.oversold | all pass |
| disk.clog_abnormal_file | all pass |
| disk.disk_full | all pass |
| disk.disk_hole | all pass |
| disk.disk_iops | all pass |
| disk.sstable_abnormal_file | [critical] [remote_10.0.202.93] sstable_dir_path is null . Please check your nodes.data_dir need absolute Path |
| disk.xfs_repair | all pass |
| err_code.find_err_4000 | all pass |
| err_code.find_err_4001 | all pass |
| err_code.find_err_4012 | all pass |
| err_code.find_err_4013 | all pass |
| err_code.find_err_4015 | all pass |
| err_code.find_err_4016 | all pass |
| err_code.find_err_4103 | all pass |
| err_code.find_err_4105 | all pass |
| err_code.find_err_4377 | all pass |
| network.TCP-retransmission | [critical] [remote_10.0.202.93] tsar is not installed. we can not check tcp retransmission. |
| system.aio | all pass |
| system.clock_source | all pass |
| system.core_pattern | all pass |
| system.dependent_software | all pass |
| system.dependent_software_swapon | all pass |
| system.getenforce | all pass |
| system.instruction_set_avx2 | [warning] Unadapted by version. SKIP |
| system.parameter | all pass |
| system.parameter_ip_local_port_range | all pass |
| system.parameter_tcp_rmem | all pass |
| system.parameter_tcp_wmem | all pass |
| system.ulimit_parameter | all pass |
| table.information_schema_tables_two_data | all pass |
| version.bad_version | all pass |
| version.old_version | all pass |

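I'm not sure whether the critical items above point to a real problem. This OCP cluster is a single node with no obproxy, and the install directories are all defaults. When I run the inspection on other clusters, these critical items never show up.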

The alert fired at 2025-03-13 23:06:35.232744. Is the observer.log covering that time still available?
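
One way to answer that is to compare the first and last timestamps in each candidate log file against the alert time. The sketch below assumes the same `[YYYY-MM-DD HH:MM:SS.ffffff]` line prefix as the alert's log line; `first_last_timestamps` and `covers` are hypothetical helper names, not part of obdiag or OceanBase.

```python
import re
from datetime import datetime

# Leading "[YYYY-MM-DD HH:MM:SS.ffffff]" timestamp, as seen in the alert's log line.
TS_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6})\]")


def first_last_timestamps(lines):
    """Return (first, last) timestamps found in the lines, or (None, None)."""
    first = last = None
    for line in lines:
        m = TS_RE.match(line)
        if m is None:
            continue  # skip lines without a leading timestamp
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
        if first is None:
            first = ts
        last = ts
    return first, last


def covers(lines, target):
    """True if target falls between the first and last timestamps in the lines."""
    first, last = first_last_timestamps(lines)
    return first is not None and first <= target <= last
```

Open `/home/admin/oceanbase/log/observer.log` (and any rotated copies in the same directory) and pass the file object to `covers` together with `datetime(2025, 3, 13, 23, 6, 35, 232744)`. If no file returns `True`, the log for that moment has most likely been rotated out.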