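[Environment] Production
[OB]
[Version] 4.2.1
[Problem description] On OceanBase 4.2.1, a major compaction has been stuck for more than 9 days and reports error 4019.
[Steps to reproduce] None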
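Based on the information you provided, a major compaction on OceanBase 4.2.1 has been stuck for more than 9 days and reports error 4019. The retrieved document snippets do not mention what error code 4019 means or how to resolve it, so the suggestions below come from my own general knowledge.
Resource bottleneck during compaction:
Use SHOW PROCESSLIST to list the currently running sessions, then tune or terminate any that consume a large share of resources.
Use the obdiag tool to collect diagnostic information and analyze resource usage during the compaction.
Excessive data volume:
Network problems:
Use ping or traceroute to check network latency.
Improper configuration parameters:
Check the major_freeze_duty_time parameter to make sure the compaction window is set sensibly.
Adjust the minor_freeze_times and major_freeze_times parameters to tune the compaction frequency.
Log file analysis:
Use the obdiag tool to collect log files and review the errors and warnings in them.
Check observer.log for compaction-related entries.
Collect diagnostic information:
Use the obdiag tool to collect diagnostic information with the following command:
obdiag collect log --tenant=<your_tenant_name> --start-time="2023-01-01 00:00:00" --end-time="2023-01-02 00:00:00"
Analyze resource usage:
Use SHOW PROCESSLIST to view the current sessions:
SHOW PROCESSLIST;
Check configuration parameters:
SHOW PARAMETERS LIKE 'major_freeze_duty_time';
SHOW PARAMETERS LIKE 'minor_freeze_times';
SHOW PARAMETERS LIKE 'major_freeze_times';
Contact technical support:
I hope this helps. If you have more details or log files, please share them so that we can help you resolve the problem.
For details, see the following documents: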
Collected with obdiag rca run --scene=major_hold:
obclient [oceanbase]> SELECT * FROM oceanbase.CDB_OB_MAJOR_COMPACTION\G
*************************** 1. row ***************************
TENANT_ID: 1
FROZEN_SCN: 1733662804789187373
FROZEN_TIME: 2024-12-08 21:00:04.789187
GLOBAL_BROADCAST_SCN: 1733662804789187373
LAST_SCN: 1733662804789187373
LAST_FINISH_TIME: 2024-12-08 21:04:00.952471
START_TIME: 2024-12-08 21:00:04.927890
STATUS: IDLE
IS_ERROR: NO
IS_SUSPENDED: NO
INFO:
*************************** 2. row ***************************
TENANT_ID: 1001
FROZEN_SCN: 1733680804861169439
FROZEN_TIME: 2024-12-09 02:00:04.861169
GLOBAL_BROADCAST_SCN: 1733680804861169439
LAST_SCN: 1733680804861169439
LAST_FINISH_TIME: 2024-12-09 02:02:32.539844
START_TIME: 2024-12-09 02:00:05.001461
STATUS: IDLE
IS_ERROR: NO
IS_SUSPENDED: NO
INFO:
*************************** 3. row ***************************
TENANT_ID: 1002
FROZEN_SCN: 1732971602166717024
FROZEN_TIME: 2024-11-30 21:00:02.166717
GLOBAL_BROADCAST_SCN: 1732971602166717024
LAST_SCN: 1732885201440173241
LAST_FINISH_TIME: 2024-11-30 12:02:32.934553
START_TIME: 2024-11-30 21:00:04.192632
STATUS: COMPACTING
IS_ERROR: NO
IS_SUSPENDED: NO
INFO:
3 rows in set (0.014 sec)
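The SCN columns in this output appear to encode wall-clock time: FROZEN_SCN 1733662804789187373 sits next to FROZEN_TIME 2024-12-08 21:00:04.789187, which matches a nanosecond Unix timestamp displayed in UTC+8. Assuming that encoding (inferred from the output above, not taken from OceanBase documentation), a minimal Python sketch can decode tenant 1002's LAST_SCN, for which the view shows no matching timestamp column. The helper name scn_to_datetime and the fixed UTC+8 offset are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the cluster displays times in UTC+8, as the FROZEN_TIME values suggest.
CLUSTER_TZ = timezone(timedelta(hours=8))

def scn_to_datetime(scn: int) -> datetime:
    """Interpret an SCN as nanoseconds since the Unix epoch (assumed encoding)."""
    seconds, nanos = divmod(scn, 1_000_000_000)
    # Integer arithmetic avoids float rounding on 19-digit values.
    return datetime.fromtimestamp(seconds, CLUSTER_TZ) + timedelta(microseconds=nanos // 1000)

# Values copied from the tenant 1002 row above.
frozen_scn = 1732971602166717024
last_scn = 1732885201440173241

print("FROZEN_SCN ->", scn_to_datetime(frozen_scn))
print("LAST_SCN   ->", scn_to_datetime(last_scn))
```

Under that assumption, FROZEN_SCN decodes to the FROZEN_TIME shown for tenant 1002 (2024-11-30 21:00:02), and LAST_SCN falls one day earlier (2024-11-29 21:00:01), so the last snapshot that finished compacting is the one frozen on 2024-11-29.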
obclient [oceanbase]> select * from __all_virtual_compaction_diagnose_info;
+-------------+----------+-----------+--------------+-------+-----------+--------------+----------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| svr_ip      | svr_port | tenant_id | type         | ls_id | tablet_id | status       | create_time                | diagnose_info                                                                                                                                                                                                                   |
+-------------+----------+-----------+--------------+-------+-----------+--------------+----------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 81.16.49.44 |     2882 |      1002 | MINI_MERGE   |     1 |       330 | NOT_SCHEDULE | 2024-12-09 09:25:11.027250 | schedule_suspect_info="info="memtable can not create dag successfully";error_code=-4023;has been ready for flush time=8923518467;ready for flush time=1733698587508771;"                                                        |
| 81.16.49.44 |     2882 |      1002 | MEDIUM_MERGE |     1 |       329 | FAILED       | 2024-11-30 21:17:36.633805 | error_no=-4019,last_error_time=1733706355752898,error_trace=YB425110312C-000615145E7A25E9-0-0,warning="info="DAG_MAJOR_MERGE";ls_id=1;tablet_id=329;compaction_scn=1732971602166717024;extra_info="merge_type="MEDIUM_MERGE"";" |
| 81.16.49.44 |     2882 |      1002 | MINI_MERGE   |     1 |       329 | NOT_SCHEDULE | 2024-12-09 09:25:16.029205 | schedule_suspect_info="info="memtable can not create dag successfully";error_code=-4023;has been ready for flush time=8928520426;ready for flush time=1733698587508766;"                                                        |
+-------------+----------+-----------+--------------+-------+-----------+--------------+----------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
3 rows in set (4.961 sec)
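The numeric fields inside diagnose_info are easier to read once converted. Judging from the first row, where create_time minus "ready for flush time" is almost exactly "has been ready for flush time", these values look like microseconds. That unit, the UTC+8 offset, and the helper names below are assumptions made for illustration, not documented behavior.

```python
import re
from datetime import datetime, timedelta, timezone

CLUSTER_TZ = timezone(timedelta(hours=8))  # assumption: times are shown in UTC+8

def field(info: str, key: str) -> int:
    """Return the integer after 'key=' where key starts the string or follows ; , or a quote."""
    match = re.search(r'(?:^|[;,"])' + re.escape(key) + r"=(-?\d+)", info)
    if match is None:
        raise KeyError(key)
    return int(match.group(1))

def micros_to_datetime(micros: int) -> datetime:
    """Convert microseconds since the Unix epoch (assumed unit) to cluster-local time."""
    seconds, rest = divmod(micros, 1_000_000)
    return datetime.fromtimestamp(seconds, CLUSTER_TZ) + timedelta(microseconds=rest)

# diagnose_info values copied (abridged) from the tablet 330 and tablet 329 rows above.
mini = "error_code=-4023;has been ready for flush time=8923518467;ready for flush time=1733698587508771;"
medium = "error_no=-4019,last_error_time=1733706355752898,error_trace=YB425110312C-000615145E7A25E9-0-0"

waiting = timedelta(microseconds=field(mini, "has been ready for flush time"))
print("memtable waiting to flush for:", waiting)
print("ready since:", micros_to_datetime(field(mini, "ready for flush time")))
print("last MEDIUM_MERGE error:", field(medium, "error_no"),
      "at", micros_to_datetime(field(medium, "last_error_time")))
```

The separator check in the pattern matters: without it, a search for "ready for flush time" would match inside "has been ready for flush time" and return the wrong number. Read this way, the memtable on tablet 330 had been waiting roughly two and a half hours when the query ran, and the most recent -4019 failure on tablet 329 was at 2024-12-09 09:05:55.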
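Based on the information you provided, a major compaction on OceanBase 4.2.1 has been stuck for more than 9 days and reports error 4019. You have now supplied more diagnostic data: the contents of CDB_OB_MAJOR_COMPACTION and __all_virtual_compaction_diagnose_info, plus the diagnostic report generated by obdiag. Here is an analysis of that data and some possible solutions:
Compaction status:
CDB_OB_MAJOR_COMPACTION shows tenant 1002 in status COMPACTING, while the other tenants are IDLE.
Tenant 1002's compaction is still in progress but has been running for a very long time.
Diagnostic information:
__all_virtual_compaction_diagnose_info shows that a MEDIUM_MERGE for tenant 1002 failed with error code -4019.
The DAG_MAJOR_MERGE failure may be one of the reasons the compaction is stuck.
Resource usage:
The obdiag tool generated a diagnostic report, which helps with further analysis of resource usage and potential problems.
Resource bottleneck:
Use SHOW PROCESSLIST to list the currently running sessions, then tune or terminate any that consume a large share of resources.
Review the diagnostic report generated by obdiag, especially the resource usage section.
Excessive data volume:
Network problems:
Use ping or traceroute to check network latency.
Improper configuration parameters:
Check the major_freeze_duty_time parameter to make sure the compaction window is set sensibly.
Adjust the minor_freeze_times and major_freeze_times parameters to tune the compaction frequency.
Error code -4019:
-4019 usually indicates that the compaction ran into some kind of error, possibly caused by insufficient resources, data corruption, or another internal problem.
Check observer.log for compaction-related entries, especially details about error code -4019.
Analyze the obdiag diagnostic report:
Review the diagnostic report generated by obdiag, especially the sections on resource usage and the compaction process.
Check the log files:
Use the obdiag tool to collect log files and review the errors and warnings in them.
Check observer.log for compaction-related entries.
Adjust configuration parameters:
SHOW PARAMETERS LIKE 'major_freeze_duty_time';
SHOW PARAMETERS LIKE 'minor_freeze_times';
SHOW PARAMETERS LIKE 'major_freeze_times';
Contact technical support:
I hope this helps. If you have more details or log files, please share them so that we can help you resolve the problem.
For details, see:
(The forum assistant's reply ends here. If it did not solve your problem, please keep asking and wait for replies from other members. Thank you!)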
Follow this document to find out which tables have failing compactions:
https://www.oceanbase.com/knowledge-base/oceanbase-database-1000000000685150?back=kb
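Please check whether the logs on node 81.16.49.44 contain this message:
tablet multi version start not advance for a long time
I suspect that a stuck backup is blocking the compaction and causing the error.
If this is a three-replica deployment, you can restart the OB nodes one at a time. That will most likely bring it back.
The stuck table is a system table? Could this have triggered a bug?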
obclient [oceanbase]> select table_name from __all_virtual_table where tablet_id=329;
+-------------------+
| table_name        |
+-------------------+
| __all_column_stat |
+-------------------+
1 row in set (0.114 sec)
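Is your OB version 4.2.1 BP3 HF1 (ob4.2.1bp3hf1)?
It is the original 4.2.1 release. So this problem was only fixed in BP3?
Please keep working through that document to find out exactly why the minor compaction failed, and post as much of what you find as you can. We cannot determine the cause yet.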
obclient [oceanbase]> select *
    -> from __all_virtual_server_compaction_event_history
    -> where tenant_id = 1002
    -> and compaction_scn = 1732971602166717024
    -> and event like '%FINISHED%';
+-------------+----------+-------+-----------+-------------+---------------------+----------------------------+------------------------------------------------------------------------+
| svr_ip      | svr_port | zone  | tenant_id | type        | compaction_scn      | event_timestamp            | event                                                                  |
+-------------+----------+-------+-----------+-------------+---------------------+----------------------------+------------------------------------------------------------------------+
| 81.16.49.46 |     2882 | zone3 |      1002 | MAJOR_MERGE | 1732971602166717024 | 2024-12-01 12:13:18.687964 | cost_time:3405.72s | TABLET_COMPACTION_FINISHED:cost_time=54786295940, |
| 81.16.49.45 |     2882 | zone2 |      1002 | MAJOR_MERGE | 1732971602166717024 | 2024-12-01 14:15:00.267005 | cost_time:2611.88s | TABLET_COMPACTION_FINISHED:cost_time=62090443016, |
+-------------+----------+-------+-----------+-------------+---------------------+----------------------------+------------------------------------------------------------------------+
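Only two servers appear in this result. A small Python sketch makes the gap explicit by subtracting the servers that reported a FINISHED event from the servers expected to hold a replica. The expected list is an assumption pieced together from the addresses that appear in this thread (81.16.49.44, 81.16.49.45, 81.16.49.46); in practice it would come from the cluster's own server list.

```python
# Servers assumed to host tenant 1002 replicas (addresses taken from this thread).
expected_servers = {"81.16.49.44", "81.16.49.45", "81.16.49.46"}

# (svr_ip, zone) pairs copied from the FINISHED events above.
finished_events = [
    ("81.16.49.46", "zone3"),
    ("81.16.49.45", "zone2"),
]

def servers_still_compacting(expected, finished):
    """Return the servers that have not reported a FINISHED event, sorted."""
    done = {ip for ip, _zone in finished}
    return sorted(expected - done)

print(servers_still_compacting(expected_servers, finished_events))
```

With these inputs the only server left is 81.16.49.44, the same node that reports the failing tablet 329 in __all_virtual_compaction_diagnose_info.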
select * from __all_virtual_dag_warning_history;
+-------------+----------+-----------+-----------------------------------+------------+----------------------+--------------------------------------+---------+----------------------------+----------------------------+-----------+--------------------------------------------------------------------------------------------------------------------------------------------+
| svr_ip      | svr_port | tenant_id | task_id                           | module     | type                 | ret                                  | status  | gmt_create                 | gmt_modified               | retry_cnt | warning_info                                                                                                                               |
+-------------+----------+-----------+-----------------------------------+------------+----------------------+--------------------------------------+---------+----------------------------+----------------------------+-----------+--------------------------------------------------------------------------------------------------------------------------------------------+
| 81.16.49.46 |     2882 |      1002 | YB425110312E-000609789B86EBA6-0-0 | COMPACTION | MAJOR_MERGE          | OB_TIMEOUT                           | WARNING | 2023-12-07 12:41:16.994083 | 2023-12-07 12:41:16.994083 |         0 | info="DAG_MAJOR_MERGE";ls_id=1002;tablet_id=1152921504609076455;compaction_scn=1701885602321989480;extra_info="merge_type="MEDIUM_MERGE""; |
| 81.16.49.46 |     2882 |      1002 | YB425110312E-000609789B86EBAF-0-0 | COMPACTION | MAJOR_MERGE          | OB_TIMEOUT                           | WARNING | 2023-12-07 12:41:29.292828 | 2023-12-07 12:41:29.292828 |         0 | info="DAG_MAJOR_MERGE";ls_id=1002;tablet_id=1152921504607078196;compaction_scn=1701885602321989480;extra_info="merge_type="MEDIUM_MERGE""; |
| 81.16.49.46 |     2882 |      1002 | YB425110312E-000609789B86EBBD-0-0 | COMPACTION | MAJOR_MERGE          | OB_TIMEOUT                           | WARNING | 2023-12-07 12:41:49.295295 | 2023-12-07 12:41:49.295295 |         0 | info="DAG_MAJOR_MERGE";ls_id=1002;tablet_id=1152921504609085858;compaction_scn=1701885602321989480;extra_info="merge_type="MEDIUM_MERGE""; |
| 81.16.49.46 |     2882 |      1002 | YB425110312E-000609789B86EBC6-0-0 | COMPACTION | MAJOR_MERGE          | OB_TIMEOUT                           | WARNING | 2023-12-07 12:42:08.719468 | 2023-12-07 12:42:08.719468 |         0 | info="DAG_MAJOR_MERGE";ls_id=1002;tablet_id=1152921504607073689;compaction_scn=1701885602321989480;extra_info="merge_type="MEDIUM_MERGE""; |
| 81.16.49.46 |     2882 |      1002 | YB425110312E-0006097AF5B3767D-0-0 | BACKUP     | BACKUP_PREPARE       | OB_TIMEOUT                           | WARNING | 2024-04-02 16:15:51.898716 | 2024-04-02 16:15:51.898716 |         0 | info="DAG_BACKUP_PREPAER";tenant_id=1002;backup_set_id=8;ls_id=1001;turn_id=1;retry_id=0;                                                  |
| 81.16.49.46 |     2882 |      1002 | YB425110312E-0006097B44E7BD76-0-0 | BACKUP     | BACKUP_DATA          | OB_BACKUP_DEVICE_OUT_OF_SPACE        | WARNING | 2024-04-20 17:00:20.437523 | 2024-04-20 17:00:20.437523 |         0 | info="DAG_BACKUP_DATA";tenant_id=1002;backup_set_id=16;backup_data_type=2;ls_id=1001;turn_id=1;retry_id=0;task_id=36;                      |
| 81.16.49.46 |     2882 |      1002 | YB425110312E-0006097B44E7BD79-0-0 | BACKUP     | BACKUP_DATA          | OB_BACKUP_DEVICE_OUT_OF_SPACE        | WARNING | 2024-04-20 17:00:20.475745 | 2024-04-20 17:00:20.475745 |         0 | info="DAG_BACKUP_DATA";tenant_id=1002;backup_set_id=16;backup_data_type=2;ls_id=1001;turn_id=1;retry_id=0;task_id=37;                      |
| 81.16.49.46 |     2882 |      1002 | YB425110312E-0006097C514B292D-0-0 | BACKUP     | BACKUP_PREPARE       | OB_BACKUP_ADVANCE_CHECKPOINT_TIMEOUT | WARNING | 2024-07-10 10:40:52.496160 | 2024-07-10 10:40:52.496160 |         0 | info="DAG_BACKUP_PREPAER";tenant_id=1002;backup_set_id=44;ls_id=1001;turn_id=1;retry_id=0;                                                 |
| 81.16.49.46 |     2882 |      1002 | YB425110312E-0006097EB042FC07-0-0 | BACKUP     | BACKUP_PREPARE       | OB_TIMEOUT                           | WARNING | 2024-10-29 16:02:30.689259 | 2024-10-29 16:02:30.689259 |         0 | info="DAG_BACKUP_PREPAER";tenant_id=1002;backup_set_id=92;ls_id=1001;turn_id=1;retry_id=0;                                                 |
| 81.16.49.44 |     2882 |      1002 | YB425110312C-00060B91007EE397-0-0 | BACKUP     | BACKUP_META          | OB_TIMEOUT                           | WARNING | 2024-04-02 16:15:13.897556 | 2024-04-02 16:15:13.897556 |         0 | info="DAG_BACKUP_META";ls_id=1002;                                                                                                         |
| 81.16.49.44 |     2882 |      1002 | YB425110312C-00060B91007EE47C-0-0 | BACKUP     | BACKUP_PREPARE       | OB_TIMEOUT                           | WARNING | 2024-04-02 16:15:13.899922 | 2024-04-02 16:15:13.899922 |         0 | info="DAG_BACKUP_PREPAER";tenant_id=1002;backup_set_id=8;ls_id=1002;turn_id=1;retry_id=0;                                                  |
| 81.16.49.44 |     2882 |      1002 | YB425110312C-00060BB856599BBA-0-0 | BACKUP     | PREFETCH_BACKUP_INFO | OB_REPLICA_CANNOT_BACKUP             | WARNING | 2024-04-06 16:45:00.088232 | 2024-04-06 16:45:00.088232 |         0 | info="DAG_PREFETCH_BACKUP_INFO";tenant_id=1002;backup_set_id=10;backup_data_type=1;ls_id=1002;turn_id=1;retry_id=0;task_id=58;             |
| 81.16.49.44 |     2882 |      1002 | YB425110312C-00060C28C3FCE1C2-0-0 | BACKUP     | PREFETCH_BACKUP_INFO | OB_REPLICA_CANNOT_BACKUP             | WARNING | 2024-04-18 16:05:06.397421 | 2024-04-18 16:05:06.397421 |         0 | info="DAG_PREFETCH_BACKUP_INFO";tenant_id=1002;backup_set_id=15;backup_data_type=1;ls_id=1002;turn_id=1;retry_id=0;task_id=0;              |
| 81.16.49.44 |     2882 |      1002 | YB425110312C-00060C3FF6AAB750-0-0 | BACKUP     | BACKUP_DATA          | OB_BACKUP_DEVICE_OUT_OF_SPACE        | WARNING | 2024-04-20 17:00:20.468430 | 2024-04-20 17:00:20.468430 |         0 | info="DAG_BACKUP_DATA";tenant_id=1002;backup_set_id=16;backup_data_type=2;ls_id=1002;turn_id=1;retry_id=0;task_id=51;                      |
| 81.16.49.44 |     2882 |      1002 | YB425110312C-00060C3FF6AAB74E-0-0 | BACKUP     | BACKUP_DATA          | OB_BACKUP_DEVICE_OUT_OF_SPACE        | WARNING | 2024-04-20 17:00:20.630305 | 2024-04-20 17:00:20.630305 |         0 | info="DAG_BACKUP_DATA";tenant_id=1002;backup_set_id=16;backup_data_type=2;ls_id=1002;turn_id=1;retry_id=0;task_id=50;                      |
| 81.16.49.44 |     2882 |      1002 | YB425110312C-000610A2279F1B90-0-0 | COMPACTION | MDS_TABLE_MERGE      | OB_TIMEOUT                           | WARNING | 2024-08-14 01:37:40.363119 | 2024-08-14 01:37:40.363119 |         0 | info="DAG_TYPE_MDS_TABLE_MERGE";ls_id=1002;tablet_id=1152921504622036030;flush_scn=1723569896763204874;                                    |
| 81.16.49.44 |     2882 |      1002 | YB425110312C-000610F37F87DF33-0-0 | BACKUP     | PREFETCH_BACKUP_INFO | OB_REPLICA_CANNOT_BACKUP             | WARNING | 2024-08-22 16:37:02.186903 | 2024-08-22 16:37:02.186903 |         0 | info="DAG_PREFETCH_BACKUP_INFO";tenant_id=1002;backup_set_id=63;backup_data_type=1;ls_id=1002;turn_id=1;retry_id=0;task_id=13;             |
| 81.16.49.44 |     2882 |      1002 | YB425110312C-000611064E6B5F54-0-0 | BACKUP     | PREFETCH_BACKUP_INFO | OB_REPLICA_CANNOT_BACKUP             | WARNING | 2024-08-24 16:41:28.840891 | 2024-08-24 16:41:28.840891 |         0 | info="DAG_PREFETCH_BACKUP_INFO";tenant_id=1002;backup_set_id=64;backup_data_type=1;ls_id=1002;turn_id=1;retry_id=0;task_id=2;              |
| 81.16.49.44 |     2882 |      1002 | YB425110312C-000611AB6CEC621F-0-0 | BACKUP     | PREFETCH_BACKUP_INFO | OB_REPLICA_CANNOT_BACKUP             | WARNING | 2024-09-10 16:48:46.611018 | 2024-09-10 16:48:46.611018 |         0 | info="DAG_PREFETCH_BACKUP_INFO";tenant_id=1002;backup_set_id=71;backup_data_type=1;ls_id=1002;turn_id=1;retry_id=0;task_id=85;             |
| 81.16.49.44 |     2882 |      1002 | YB425110312C-000612CEF71A9D82-0-0 | BACKUP     | PREFETCH_BACKUP_INFO | OB_REPLICA_CANNOT_BACKUP             | WARNING | 2024-10-10 16:16:42.382201 | 2024-10-10 16:16:42.382201 |         0 | info="DAG_PREFETCH_BACKUP_INFO";tenant_id=1002;backup_set_id=84;backup_data_type=1;ls_id=1002;turn_id=1;retry_id=0;task_id=11;             |
| 81.16.49.44 |     2882 |      1002 | YB425110312C-000615589C3F2D51-0-0 | COMPACTION | MAJOR_MERGE          | OB_SIZE_OVERFLOW                     | RETRYED | 2024-11-30 21:17:36.633805 | 2024-12-16 16:06:16.964482 |       538 | info="DAG_MAJOR_MERGE";ls_id=1;tablet_id=329;compaction_scn=1732971602166717024;extra_info="merge_type="MEDIUM_MERGE"";                    |
| 81.16.49.44 |     2882 |      1002 | YB425110312C-000615589C3F2D77-0-0 | COMPACTION | MINI_MERGE           | OB_SIZE_OVERFLOW                     | RETRYED | 2024-12-01 00:53:57.292093 | 2024-12-16 16:11:05.761313 |     29141 | info="DAG_MINI_MERGE";ls_id=1;tablet_id=330;compaction_scn=0;extra_info="merge_type="MINI_MERGE"";                                         |
| 81.16.49.44 |     2882 |      1002 | YB425110312C-000615589C3F2D85-0-0 | COMPACTION | MINI_MERGE           | OB_SIZE_OVERFLOW                     | RETRYED | 2024-12-01 00:53:57.154590 | 2024-12-16 16:11:34.253882 |     39139 | info="DAG_MINI_MERGE";ls_id=1;tablet_id=329;compaction_scn=0;extra_info="merge_type="MINI_MERGE"";                                         |
+-------------+----------+-----------+-----------------------------------+------------+----------------------+--------------------------------------+---------+----------------------------+----------------------------+-----------+--------------------------------------------------------------------------------------------------------------------------------------------+
23 rows in set (0.030 sec)
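The last three rows are the ones tied to the stuck compaction: they are on 81.16.49.44, report OB_SIZE_OVERFLOW, and carry large retry_cnt values. The MAJOR_MERGE row has the same gmt_create (2024-11-30 21:17:36.633805) as the -4019 entry for tablet 329 in the diagnose view, so the two appear to describe the same failure. A rough Python sketch, assuming retries are spread evenly between gmt_create and gmt_modified (an approximation, not something the view guarantees), estimates how often each task is being retried.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# (type, tablet_id, gmt_create, gmt_modified, retry_cnt) copied from the rows above.
rows = [
    ("MAJOR_MERGE", 329, "2024-11-30 21:17:36.633805", "2024-12-16 16:06:16.964482", 538),
    ("MINI_MERGE", 330, "2024-12-01 00:53:57.292093", "2024-12-16 16:11:05.761313", 29141),
    ("MINI_MERGE", 329, "2024-12-01 00:53:57.154590", "2024-12-16 16:11:34.253882", 39139),
]

def mean_retry_interval(created: str, modified: str, retries: int) -> float:
    """Average seconds between retries, assuming they are evenly spaced."""
    elapsed = datetime.strptime(modified, FMT) - datetime.strptime(created, FMT)
    return elapsed.total_seconds() / retries

for dag_type, tablet_id, created, modified, retries in rows:
    interval = mean_retry_interval(created, modified, retries)
    print(f"{dag_type:<12} tablet {tablet_id}: {retries} retries, one every ~{interval:.0f} s")
```

On these figures the MAJOR_MERGE on tablet 329 has been retried about every 42 minutes for more than two weeks, and the MINI_MERGE tasks every 35 to 46 seconds, which points to a persistent failure rather than a transient one.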
Please provide the rootservice.log and observer.log files.