In 4.x, after permanent offline is triggered, can't I clear the directories and restart the process?

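[Environment] Production or test environment
[OB or other component]
In OB 4.x, after permanent offline was triggered, I deleted the clog, slog, and sstable files on that node while keeping the directory structure, and then restarted the process. After that, the log stream sync status reported errors, and the number of partitions on this replica did not match the other replicas. As I recall, 3.x would automatically re-fill the data in this case. Does 4.x not have this capability? :sweat_smile:
[Version]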
4.2.1

[SOP Series 22]: First step of fault diagnosis (self-service diagnosis and collecting diagnostic information)

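Normally, as long as the permanent offline time has not been exceeded, the node re-checks clog synchronization after it is restarted. If a disk or the whole server is being replaced, so the original data is discarded, the recommended approach is the delete/add server procedure or the node replacement procedure, so that the node joins the cluster and re-fills its data.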

Deleting the files by hand and then rejoining to re-fill the data is not really part of the official routine O&M procedure.
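For reference, the delete/add server flow mentioned above might look roughly like this when run from the sys tenant. This is only a sketch: the address is a masked placeholder in the style used in this thread, the zone name 'zone3' is hypothetical, and the exact steps should be checked against the documentation for your version.

```sql
-- Placeholder address and hypothetical zone name; replace with the real node.
-- Remove the old server entry from the cluster.
ALTER SYSTEM DELETE SERVER 'xx.xx.xx.105:2882' ZONE 'zone3';

-- After the node has been started again with empty data directories,
-- add it back so that replicas can be placed on it.
ALTER SYSTEM ADD SERVER 'xx.xx.xx.105:2882' ZONE 'zone3';
```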

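I lowered the permanent offline time and confirmed from __all_virtual_event_history that permanent offline was triggered. What I mainly want to know is why the partition count was not re-filled after I cleared the files and restarted the process in this scenario, and whether this mechanism differs from 3.x.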
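The threshold being discussed can be inspected from the sys tenant. A minimal sketch, assuming the parameter in question is server_permanent_offline_time (the post does not name it); the '3600s' value in the commented-out statement is illustrative only.

```sql
-- Show the current permanent offline threshold on each server.
SHOW PARAMETERS LIKE 'server_permanent_offline_time';

-- Illustrative only: change the threshold (value chosen as an example).
-- ALTER SYSTEM SET server_permanent_offline_time = '3600s';
```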

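Once the permanent offline time is exceeded, the node is kicked out of the cluster. Even if it is restarted it will not rejoin the cluster, and there is no automatic data re-fill either.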

The node's entry was not deleted from __all_server, and after the restart it did rejoin the cluster, but the data was indeed not re-filled.

From the description, that does not quite match the expected behavior. Which version are you using? I'll test it internally.

4.2.1.6

OK, I'll run a test.

Thank you :pray:

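Hello, some of my conclusions above were not correct (:sweat_smile:). Please go by the following test results instead:
1) After the permanent offline time is exceeded, the cluster does not delete the node. What it does is remove the replicas on that node and switch the leaders.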

obclient [oceanbase]> select * from DBA_OB_ROOTSERVICE_EVENT_HISTORY order by TIMESTAMP desc limit  100;
+----------------------------+-------------------+--------------------------------+------------+-----------------------+----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------+-----------------------------------+--------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+--------------------------------------------------------------------------------------------------------------------+----------------------+-----------------------------------+---------------------------------------------------------+----------------+-------------+
| TIMESTAMP                  | MODULE            | EVENT                          | NAME1      | VALUE1                | NAME2                | VALUE2                                                                                                                                                                                                                                  | NAME3                | VALUE3                            | NAME4              | VALUE4                                                                                                                                                                                                         | NAME5                      | VALUE5                                                                                                             | NAME6                | VALUE6                            | EXTRA_INFO                                              | RS_SVR_IP      | RS_SVR_PORT |
+----------------------------+-------------------+--------------------------------+------------+-----------------------+----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------+-----------------------------------+--------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+--------------------------------------------------------------------------------------------------------------------+----------------------+-----------------------------------+---------------------------------------------------------+----------------+-------------+
| 2024-05-24 11:35:26.598202 | disaster_recovery | finish_remove_ls_paxos_replica | tenant_id  | 1002                  | ls_id                | 1003                                                                                                                                                                                                                                    | task_id              | YB420BA1CC62-0006192AA91D54CF-0-0 | leader             | "xx.xx.xx.99:2882"                                                                                                                                                                                           | remove_server              | {server:"xx.xx.xx.105:2882", timestamp:1, flag:0, replica_type:0, region:"default_region", memstore_percent:100} | execute_result       | ret:0, OB_SUCCESS; elapsed:54311; | remove permanent offline replica                        | xx.xx.xx.98  |        2882 |
| 2024-05-24 11:35:26.589069 | disaster_recovery | finish_remove_ls_paxos_replica | tenant_id  | 1002                  | ls_id                | 1002                                                                                                                                                                                                                                    | task_id              | YB420BA1CC62-0006192AA91D54CE-0-0 | leader             | "xx.xx.xx.98:2882"                                                                                                                                                                                           | remove_server              | {server:"xx.xx.xx.105:2882", timestamp:1, flag:0, replica_type:0, region:"default_region", memstore_percent:100} | execute_result       | ret:0, OB_SUCCESS; elapsed:54182; | remove permanent offline replica                        | xx.xx.xx.98  |        2882 |
| 2024-05-24 11:35:26.581312 | disaster_recovery | finish_remove_ls_paxos_replica | tenant_id  | 1002                  | ls_id                | 1001                                                                                                                                                                                                                                    | task_id              | YB420BA1CC62-0006192AA91D54CD-0-0 | leader             | "xx.xx.xx.98:2882"                                                                                                                                                                                           | remove_server              | {server:"xx.xx.xx.105:2882", timestamp:1, flag:0, replica_type:0, region:"default_region", memstore_percent:100} | execute_result       | ret:0, OB_SUCCESS; elapsed:55138; | remove permanent offline replica                        | xx.xx.xx.98  |        2882 |
| 2024-05-24 11:35:26.573853 | disaster_recovery | finish_remove_ls_paxos_replica | tenant_id  | 1002                  | ls_id                | 1                                                                                                                                                                                                                                       | task_id              | YB420BA1CC62-0006192AA91D54CC-0-0 | leader             | "xx.xx.xx.98:2882"                                                                                                                                                                                           | remove_server              | {server:"xx.xx.xx.105:2882", timestamp:1, flag:0, replica_type:0, region:"default_region", memstore_percent:100} | execute_result       | ret:0, OB_SUCCESS; elapsed:57380; | remove permanent offline replica                        | xx.xx.xx.98  |        2882 |
| 2024-05-24 11:35:26.563089 | disaster_recovery | finish_remove_ls_paxos_replica | tenant_id  | 1001                  | ls_id                | 1                                                                                                                                                                                                                                       | task_id              | YB420BA1CC62-0006192AA91D54CB-0-0 | leader             | "xx.xx.xx.98:2882"                                                                                                                                                                                           | remove_server              | {server:"xx.xx.xx.105:2882", timestamp:1, flag:0, replica_type:0, region:"default_region", memstore_percent:100} | execute_result       | ret:0, OB_SUCCESS; elapsed:57380; | remove permanent offline replica                        | xx.xx.xx.98  |        2882 |
| 2024-05-24 11:35:26.552679 | disaster_recovery | finish_remove_ls_paxos_replica | tenant_id  | 1                     | ls_id                | 1                                                                                                                                                                                                                                       | task_id              | YB420BA1CC62-0006192AA91D54CA-0-0 | leader             | "xx.xx.xx.98:2882"                                                                                                                                                                                           | remove_server              | {server:"xx.xx.xx.105:2882", timestamp:1, flag:0, replica_type:0, region:"default_region", memstore_percent:100} | execute_result       | ret:0, OB_SUCCESS; elapsed:57884; | remove permanent offline replica                        | xx.xx.xx.98  |        2882 |
| 2024-05-24 11:35:26.546558 | disaster_recovery | disaster_recovery_start        | start_time | 1716521726546557      |                      |                                                                                                                                                                                                                                         |                      |                                   |                    |                                                                                                                                                                                                                |                            |                                                                                                                    |                      |                                   |                                                         | xx.xx.xx.98  |        2882 |
| 2024-05-24 11:35:26.543917 | disaster_recovery | start_remove_ls_paxos_replica  | tenant_id  | 1002                  | ls_id                | 1003                                                                                                                                                                                                                                    | task_id              | YB420BA1CC62-0006192AA91D54CF-0-0 | leader             | "xx.xx.xx.99:2882"                                                                                                                                                                                           | remove_server              | {server:"xx.xx.xx.105:2882", timestamp:1, flag:0, replica_type:0, region:"default_region", memstore_percent:100} | comment              | remove permanent offline replica  |                                                         | xx.xx.xx.98  |        2882 |
| 2024-05-24 11:35:26.534903 | disaster_recovery | start_remove_ls_paxos_replica  | tenant_id  | 1002                  | ls_id                | 1002                                                                                                                                                                                                                                    | task_id              | YB420BA1CC62-0006192AA91D54CE-0-0 | leader             | "xx.xx.xx.98:2882"                                                                                                                                                                                           | remove_server              | {server:"xx.xx.xx.105:2882", timestamp:1, flag:0, replica_type:0, region:"default_region", memstore_percent:100} | comment              | remove permanent offline replica  |                                                         | xx.xx.xx.98  |        2882 |
| 2024-05-24 11:35:26.526190 | disaster_recovery | start_remove_ls_paxos_replica  | tenant_id  | 1002                  | ls_id                | 1001                                                                                                                                                                                                                                    | task_id              | YB420BA1CC62-0006192AA91D54CD-0-0 | leader             | "xx.xx.xx.98:2882"                                                                                                                                                                                           | remove_server              | {server:"xx.xx.xx.105:2882", timestamp:1, flag:0, replica_type:0, region:"default_region", memstore_percent:100} | comment              | remove permanent offline replica  |                                                         | xx.xx.xx.98  |        2882 |
| 2024-05-24 11:35:26.516492 | disaster_recovery | start_remove_ls_paxos_replica  | tenant_id  | 1002                  | ls_id                | 1                                                                                                                                                                                                                                       | task_id              | YB420BA1CC62-0006192AA91D54CC-0-0 | leader             | "xx.xx.xx.98:2882"                                                                                                                                                                                           | remove_server              | {server:"xx.xx.xx.105:2882", timestamp:1, flag:0, replica_type:0, region:"default_region", memstore_percent:100} | comment              | remove permanent offline replica  |                                                         | xx.xx.xx.98  |        2882 |
| 2024-05-24 11:35:26.505729 | disaster_recovery | start_remove_ls_paxos_replica  | tenant_id  | 1001                  | ls_id                | 1                                                                                                                                                                                                                                       | task_id              | YB420BA1CC62-0006192AA91D54CB-0-0 | leader             | "xx.xx.xx.98:2882"                                                                                                                                                                                           | remove_server              | {server:"xx.xx.xx.105:2882", timestamp:1, flag:0, replica_type:0, region:"default_region", memstore_percent:100} | comment              | remove permanent offline replica  |                                                         | xx.xx.xx.98  |        2882 |
| 2024-05-24 11:35:26.494841 | disaster_recovery | start_remove_ls_paxos_replica  | tenant_id  | 1                     | ls_id                | 1                                                                                                                                                                                                                                       | task_id              | YB420BA1CC62-0006192AA91D54CA-0-0 | leader             | "xx.xx.xx.98:2882"                                                                                                                                                                                           | remove_server              | {server:"xx.xx.xx.105:2882", timestamp:1, flag:0, replica_type:0, region:"default_region", memstore_percent:100} | comment              | remove permanent offline replica  |                                                         | xx.xx.xx.98  |        2882 |
| 2024-05-24 11:35:17.595611 | server            | permanent_offline              | server     | "xx.xx.xx.105:2882" |                      |                                                                                                                                                                                                                                         |                      |                                   |                    |                                                                                                                                                                                                                |                            |                                                                                                                    |                      |                                   |                                                         | xx.xx.xx.98  |        2882 |
| 2024-05-24 11:34:27.538115 | root_service      | admin_set_config               | ret        | 0                     | arg                  | {items:[{name:"rootservice_list", value:"xx.xx.xx.98:2882:2881;xx.xx.xx.99:2882:2881", comment:"", zone:"", server:"0.0.0.0:0", tenant_name:"", exec_tenant_id:1, tenant_ids:[], want_to_set_tenant_config:false}], is_inner:false} |                      |                                   |                    |                                                                                                                                                                                                                |                            |                                                                                                                    |                      |                                   |                                                         | xx.xx.xx.98  |        2882 |
| 2024-05-24 11:34:27.528222 | server            | last_offline_time set          | server     | "xx.xx.xx.105:2882" |                      |                                                                                                                                                                                                                                         |                      |                                   |                    |                                                                                                                                                                                                                |                            |                                                                                                                    |                      |                                   |                                                         | xx.xx.xx.98  |        2882 |
| 2024-05-24 11:34:27.521227 | server            | lease_expire                   | server     | "xx.xx.xx.105:2882" |                      |                                                                                                                                                                                                                                         |                      |                                   |                    |                                                                                                                                                                                                                |                            |                                                                                                                    |                      |                                   |                                                         | xx.xx.xx.98  |        2882 |
+----------------------------+-------------------+--------------------------------+------------+-----------------------+----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------+-----------------------------------+--------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+--------------------------------------------------------------------------------------------------------------------+----------------------+-----------------------------------+---------------------------------------------------------+----------------+-------------+
100 rows in set (0.001 sec)
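The output above is wide, so a narrower query can make the sequence easier to follow: lease expiry, then permanent_offline, then removal of each log stream replica. A sketch using only the view, column names, and event names that appear in the output above:

```sql
-- Keep only the events relevant to permanent offline and replica removal.
SELECT TIMESTAMP, MODULE, EVENT, VALUE1, VALUE2, VALUE5, EXTRA_INFO
FROM oceanbase.DBA_OB_ROOTSERVICE_EVENT_HISTORY
WHERE EVENT IN ('lease_expire',
                'permanent_offline',
                'start_remove_ls_paxos_replica',
                'finish_remove_ls_paxos_replica')
ORDER BY TIMESTAMP DESC
LIMIT 100;
```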

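2) After the node is restarted, its replicas are re-filled automatically. I also tested deleting the data files and the clog directory, and the replicas were re-filled in that case as well.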
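One way to check whether the replicas have actually come back is to count log stream replicas per server and compare the restarted node with the others. A sketch, assuming the sys-tenant view oceanbase.CDB_OB_LS_LOCATIONS and its column names are available in your version:

```sql
-- Count log stream replicas on each server; the restarted node should
-- eventually show the same count as its peers once re-filling completes.
SELECT SVR_IP, SVR_PORT, COUNT(*) AS ls_replica_count
FROM oceanbase.CDB_OB_LS_LOCATIONS
GROUP BY SVR_IP, SVR_PORT
ORDER BY SVR_IP, SVR_PORT;
```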


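So 4.x re-fills automatically too? In my own test it did not re-fill even after a long time. There is one more thing I forgot to mention: while the cluster was healthy, I first deleted all directories under clog on one machine. The leaders switched away immediately, and only then did I let that machine trigger permanent offline. Later I recreated the directories under clog and restarted the process, and that is when I found the data had not been re-filled.
What I suspect now is that I created too few directories: I only created the directories for the log stream IDs that were missing in that zone. I'll try to reproduce it again. Thanks, 秃蛙.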

Did your data get re-filled in the end?

Yes, it was re-filled.

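I have a test cluster in which the network cable of one node got pulled out. More than a day had passed, and the logs already showed lease expiry and permanent offline messages. Why did the node rejoin automatically a short while after the cable was plugged back in? The version is 4.3.2.0.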

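Most likely the server's entry was never deleted from the cluster. If __all_server still contains that machine's entry, it is normal for the machine to rejoin the cluster once it comes back up.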

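I was about to look up how to recover it, and it recovered on its own. It just seemed strange that it could come back after permanent offline had been reported.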

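Triggering permanent offline only makes the replicas on that machine unavailable and causes them to be migrated. It does not mean the machine cannot rejoin the cluster.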
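To confirm that a permanently offline machine is still registered, the server list can be checked from the sys tenant. A sketch, assuming the inner table oceanbase.__all_server mentioned in this thread exposes these column names in your version:

```sql
-- If the machine still has a row here, it was not removed from the cluster
-- and can rejoin once it is reachable again.
SELECT svr_ip, svr_port, zone, status
FROM oceanbase.__all_server;
```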