Production environment: OceanBase CE 3.1.4, 5+5+1 (log) architecture.
One of our production clusters has been stuck in the STOPPING state ever since log archive was stopped.
mysql> select gmt_create,gmt_modified,min_first_time,status from oceanbase.__all_backup_log_archive_status_v2;
+----------------------------+----------------------------+----------------------------+-----------+
| gmt_create                 | gmt_modified               | min_first_time             | status    |
+----------------------------+----------------------------+----------------------------+-----------+
| 2022-08-29 11:26:56.187568 | 2023-06-09 21:57:47.256685 | 2023-05-26 05:18:15.128498 | STOPPING  |
| 2023-05-26 05:18:15.673265 | 2023-05-26 05:18:15.673265 | 2022-08-29 14:22:55.891223 | BEGINNING |
| 2023-05-26 05:18:15.858877 | 2023-05-26 05:18:15.858877 | 2022-08-29 14:23:32.727618 | BEGINNING |
+----------------------------+----------------------------+----------------------------+-----------+
# RS log:
[2023-08-03 14:56:11.453512] ERROR [RS] check_mount_file_ (ob_log_archive_scheduler.cpp:1228) [131256][672][YB420A3AEEDF-0005F7EBC87B0215]
[lt=54] [dc=0] failed to check_mount_file(ret=-9051, sys_info={status:{tenant_id:1, copy_id:0, start_ts:1685049495128498, checkpoint_ts:0, status:4, incarnation:1, round:8, status_str:"STOPPING", is_mark_deleted:false, is_mount_file_created:true, compatible:1, backup_piece_id:205, start_piece_id:205}, backup_dest:"file:///xxxx/commbackup/"})
BACKTRACE:0x9a98e9e 0x986d141 0x22e245f 0x22e20ab 0x22e1e72 0x38a8f2c 0x70d07cf 0x70ca92b 0x70c987b 0x6750a72 0x9a2aabd 0x9a2a4ee 0x340b9af 0x2cabf02 0x9820da5 0x981f792 0x981c24f
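As a sanity check, the start_ts in the RS log can be compared with the min_first_time of the STOPPING row. The sketch below assumes start_ts is microseconds since the Unix epoch and that the server displays times in UTC+8; both are assumptions, and the helper name us_to_datetime is only illustrative.

```python
from datetime import datetime, timedelta, timezone

# start_ts copied from the RS log line above.
START_TS_US = 1685049495128498


def us_to_datetime(ts_us, utc_offset_hours=8):
    """Convert microseconds since the Unix epoch to a timezone-aware datetime.

    utc_offset_hours=8 is an assumption about the server's display timezone.
    Integer timedelta arithmetic is used so the microseconds are not lost
    to floating-point rounding.
    """
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    return (epoch + timedelta(microseconds=ts_us)).astimezone(local_tz)


print(us_to_datetime(START_TS_US).strftime("%Y-%m-%d %H:%M:%S.%f"))
# -> 2023-05-26 05:18:15.128498
```

Under those assumptions the value matches min_first_time of the STOPPING row, so the failing check_mount_file call appears to refer to the same archive round (round:8) that is stuck.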
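It has been in the STOPPING state for two months now.
#1. Tried ALTER SYSTEM CANCEL ALL BACKUP FORCE; no effect.
#2. Tried restarting every observer node in the cluster one at a time; no effect.
#3. Tried switching the RS, manually remounting the OSS directory, and so on; no effect.
#4. Confirmed that select svr_ip, log_archive_status, count(*) from __all_virtual_pg_backup_log_archive_status group by svr_ip, log_archive_status; returns log_archive_status = 1 for every row in the result set.
Log archiving on this cluster is currently hung and cannot be operated.
Question:
1. Is there any way to make the log archive scheduling that is stuck in STOPPING stop completely, so that it can be started again?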