OceanBase startup times out

【Environment】CentOS 7 Minimal 2009 in VMware, 2 cores, 8 GB RAM, 40 GB disk
【OB or other components】None
【Version】OceanBase v4.3.0 Community Edition
【Problem description】OceanBase startup times out (Job for oceanbase.service failed because a timeout was exceeded.)
【Trigger】Only ran sudo systemctl start oceanbase
【Attachments and logs】Output of systemctl status oceanbase.service -l:
oceanbase.service - oceanbase
Loaded: loaded (/etc/systemd/system/oceanbase.service; disabled; vendor preset: disabled)
Active: failed (Result: timeout) since Fri 2024-04-26 14:33:28 CST; 2min 55s ago
Main PID: 5381
CGroup: /system.slice/oceanbase.service
├─2777 /bin/bash /home/admin/oceanbase/profile/oceanbase-service.sh start
├─3334 /home/admin/oceanbase/bin/obshell daemon --ip 127.0.0.1 --port 2886
├─3349 /home/admin/oceanbase/bin/obshell server --ip 127.0.0.1 --port 2886
├─3457 /home/admin/oceanbase/bin/observer -d /home/admin/oceanbase/store -p 2881 -n ob -z zone1 -I 127.0.0.1 -P 2882 -c 1 -o cpu_count=0,datafile_next=2G,max_syslog_file_count=4,__min_full_resource_pool_memory=1073741824,enable_syslog_wf=false,system_memory=1G,datafile_maxsize=20G,enable_syslog_recycle=true,memory_limit=6G,datafile_size=2G,log_disk_size=13G
├─4399 /bin/bash /home/admin/oceanbase/profile/oceanbase-service.sh start
├─5381 /bin/bash /home/admin/oceanbase/profile/oceanbase-service.sh start
├─6406 sleep 30
├─6411 sleep 30
└─6416 sleep 30

Apr 26 14:34:49 localhost.localdomain bash[2777]: Observer process with PID 3457 is still running.
Apr 26 14:34:56 localhost.localdomain bash[5381]: Observer process with PID 3457 is still running.
Apr 26 14:35:08 localhost.localdomain bash[4399]: Observer process with PID 3457 is still running.
Apr 26 14:35:19 localhost.localdomain bash[2777]: Observer process with PID 3457 is still running.
Apr 26 14:35:26 localhost.localdomain bash[5381]: Observer process with PID 3457 is still running.
Apr 26 14:35:38 localhost.localdomain bash[4399]: Observer process with PID 3457 is still running.
Apr 26 14:35:50 localhost.localdomain bash[2777]: Observer process with PID 3457 is still running.
Apr 26 14:35:56 localhost.localdomain bash[5381]: Observer process with PID 3457 is still running.
Apr 26 14:36:08 localhost.localdomain bash[4399]: Observer process with PID 3457 is still running.
Apr 26 14:36:20 localhost.localdomain bash[2777]: Observer process with PID 3457 is still running.
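The CGroup listing shows three copies of oceanbase-service.sh start (PIDs 2777, 4399, 5381) next to a single observer (PID 3457), and the "still running" lines come from all three in turn, each roughly every 30 seconds. A small Python sketch that makes this visible from any pasted excerpt: it groups the "still running" lines by the observer PID they report and collects the bash PIDs that print them. The regex assumes the journal line format shown here, and `pollers` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Matches journal lines such as:
# "Apr 26 14:34:49 localhost.localdomain bash[2777]: Observer process with PID 3457 is still running."
POLL_RE = re.compile(r"bash\[(\d+)\]: Observer process with PID (\d+) is still running")


def pollers(log_text: str) -> dict[int, set[int]]:
    """Map each observer PID to the set of start-script (bash) PIDs polling it."""
    waiting: dict[int, set[int]] = defaultdict(set)
    for line in log_text.splitlines():
        m = POLL_RE.search(line)
        if m:
            script_pid, observer_pid = int(m.group(1)), int(m.group(2))
            waiting[observer_pid].add(script_pid)
    return dict(waiting)


sample = """\
Apr 26 14:34:49 localhost.localdomain bash[2777]: Observer process with PID 3457 is still running.
Apr 26 14:34:56 localhost.localdomain bash[5381]: Observer process with PID 3457 is still running.
Apr 26 14:35:08 localhost.localdomain bash[4399]: Observer process with PID 3457 is still running.
"""

for observer, scripts in pollers(sample).items():
    print(f"observer {observer} is polled by {len(scripts)} start scripts: {sorted(scripts)}")
```

On the excerpt above this reports one observer polled by three start scripts, which is consistent with the start script having been launched several times while the same observer stayed up.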

Output of journalctl -xe:
Apr 26 14:45:28 localhost.localdomain polkitd[666]: Unregistered Authentication Agent for unix-process:6733:391491 (system bus name :1.48, object path /org/freedesktop/PolicyKit1/Authenti
Apr 26 14:45:39 localhost.localdomain bash[4399]: Observer process with PID 3457 is still running.
Apr 26 14:45:50 localhost.localdomain bash[2777]: Observer process with PID 3457 is still running.
Apr 26 14:45:55 localhost.localdomain bash[6739]: Observer process with PID 3457 is still running.
Apr 26 14:45:57 localhost.localdomain bash[5381]: Observer process with PID 3457 is still running.
Apr 26 14:46:05 localhost.localdomain chronyd[669]: Selected source 139.199.214.202
Apr 26 14:50:09 localhost.localdomain bash[4399]: Observer process with PID 3457 is still running.
Apr 26 14:50:12 localhost.localdomain dhclient[6481]: DHCPREQUEST on ens32 to 192.168.<my IP> port 67 (xid=0x6e328a08)

Take a look with ps -ef | grep observer.
Observer process with PID 3457 is still running.

[root@localhost ~]# ps -ef |grep observer
root 2178 1 99 17:26 ? 00:01:50 /home/admin/oceanbase/bin/observer
root 2958 1531 0 17:28 pts/0 00:00:00 grep --color=auto observer
OK, in that case:

  1. Kill the observer with kill -9, then restart it.
  2. If that reports an error, please share the logs.
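Step 1 as a Python sketch, assuming the ps -ef column layout shown in the reply (UID PID PPID C STIME TTY TIME CMD). The hypothetical helper `observer_pids` keeps processes whose command is an observer binary, which automatically skips the grep line; the hypothetical `kill_observers` defaults to a dry run that only prints the kill -9 it would issue. Sending SIGKILL needs sufficient privileges (the reply above ran as root).

```python
import os
import signal


def observer_pids(ps_output: str) -> list[int]:
    """Return PIDs of observer processes in `ps -ef` output, skipping the grep itself."""
    pids = []
    for line in ps_output.splitlines():
        fields = line.split(None, 7)  # UID PID PPID C STIME TTY TIME CMD
        if len(fields) < 8:
            continue
        executable = fields[7].split()[0]
        # The real process runs .../bin/observer; the grep line's command starts with "grep".
        if executable.endswith("/observer"):
            pids.append(int(fields[1]))
    return pids


def kill_observers(pids: list[int], dry_run: bool = True) -> None:
    """Send SIGKILL (the same signal as `kill -9`) to each PID, or just print in dry-run mode."""
    for pid in pids:
        if dry_run:
            print(f"would run: kill -9 {pid}")
        else:
            os.kill(pid, signal.SIGKILL)


ps_output = """\
root 2178 1 99 17:26 ? 00:01:50 /home/admin/oceanbase/bin/observer
root 2958 1531 0 17:28 pts/0 00:00:00 grep --color=auto observer
"""

kill_observers(observer_pids(ps_output))  # dry run: prints "would run: kill -9 2178"
```

After the observer is gone, the restart is the same sudo systemctl start oceanbase used in the original question.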

Output of systemctl status oceanbase.service -l:

Apr 26 17:46:46 localhost.localdomain bash[4121]: Observer process with PID 4689 is still running.
Apr 26 17:47:01 localhost.localdomain systemd[1]: oceanbase.service start operation timed out. Terminating.
Apr 26 17:47:01 localhost.localdomain systemd[1]: Failed to start oceanbase.
Apr 26 17:47:01 localhost.localdomain systemd[1]: Unit oceanbase.service entered failed state.
Apr 26 17:47:01 localhost.localdomain systemd[1]: oceanbase.service failed.
Apr 26 17:47:16 localhost.localdomain bash[4121]: Observer process with PID 4689 is still running.
Apr 26 17:47:46 localhost.localdomain bash[4121]: Observer process with PID 4689 is still running.
Apr 26 17:48:16 localhost.localdomain bash[4121]: Observer process with PID 4689 is still running.
Apr 26 17:48:46 localhost.localdomain bash[4121]: Observer process with PID 4689 is still running.
Apr 26 17:49:16 localhost.localdomain bash[4121]: Observer process with PID 4689 is still running.
Output of journalctl -xe:
Apr 26 17:46:03 localhost.localdomain bash[4121]: the response state is RUNNING
Apr 26 17:46:03 localhost.localdomain bash[4121]: wait 6s and the retry
Apr 26 17:46:09 localhost.localdomain bash[4121]: the response state is RUNNING
Apr 26 17:46:09 localhost.localdomain bash[4121]: wait 6s and the retry
Apr 26 17:46:15 localhost.localdomain bash[4121]: the response state is SUCCEED
Apr 26 17:46:15 localhost.localdomain bash[4121]: request successfully
Apr 26 17:46:16 localhost.localdomain bash[4121]: Failed to notify init system: Permission denied
Apr 26 17:46:16 localhost.localdomain bash[4121]: Observer process with PID 4689 is still running.
Apr 26 17:46:46 localhost.localdomain bash[4121]: Observer process with PID 4689 is still running.
Apr 26 17:47:01 localhost.localdomain systemd[1]: oceanbase.service start operation timed out. Terminating.
Apr 26 17:47:01 localhost.localdomain systemd[1]: Failed to start oceanbase.
-- Subject: Unit oceanbase.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel

-- Unit oceanbase.service has failed.

-- The result is failed.
Apr 26 17:47:01 localhost.localdomain systemd[1]: Unit oceanbase.service entered failed state.
Apr 26 17:47:01 localhost.localdomain systemd[1]: oceanbase.service failed.
Apr 26 17:47:01 localhost.localdomain polkitd[662]: Unregistered Authentication Agent for unix-process:4115:123237 (system bus name :1.29, object path /org/freedesktop/PolicyKit1/Authenti
Apr 26 17:47:01 localhost.localdomain sudo[4113]: pam_unix(sudo:session): session closed for user root
Apr 26 17:47:09 localhost.localdomain chronyd[671]: Selected source 185.209.85.222
Apr 26 17:47:09 localhost.localdomain chronyd[671]: System clock wrong by 3351.659125 seconds, adjustment started
Apr 26 17:47:16 localhost.localdomain bash[4121]: Observer process with PID 4689 is still running.
Apr 26 17:47:46 localhost.localdomain bash[4121]: Observer process with PID 4689 is still running.
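In this journalctl excerpt the start request reaches SUCCEED at 17:46:15, "Failed to notify init system: Permission denied" follows one second later, and systemd gives up at 17:47:01 while the observer keeps running. A Python sketch that pulls those three events out of pasted journal text so the sequence can be compared across attempts. The marker strings are copied from the log; the year is an assumption (journalctl's short format omits it), and `timeline` and `seconds_between` are hypothetical helper names. Whether the failed notification is what keeps systemd waiting is not established in this thread; that would depend on how the unit file sets options such as Type= and NotifyAccess=, which isn't shown.

```python
from datetime import datetime

# Substrings copied from the journalctl excerpt above, mapped to short labels.
MARKERS = [
    ("request successfully", "start request succeeded"),
    ("Failed to notify init system", "readiness notification failed"),
    ("start operation timed out", "systemd timeout"),
]


def timeline(journal_text: str) -> list[tuple[str, str]]:
    """Return (timestamp, label) pairs for the marker lines, in log order."""
    events = []
    for line in journal_text.splitlines():
        for needle, label in MARKERS:
            if needle in line:
                # Short journal format: "Mon DD HH:MM:SS host unit[pid]: message"
                events.append((" ".join(line.split()[:3]), label))
    return events


def seconds_between(ts_a: str, ts_b: str, year: int = 2024) -> float:
    """Seconds from ts_a to ts_b; the year is supplied because journalctl omits it."""
    fmt = "%Y %b %d %H:%M:%S"
    a = datetime.strptime(f"{year} {ts_a}", fmt)
    b = datetime.strptime(f"{year} {ts_b}", fmt)
    return (b - a).total_seconds()


journal = """\
Apr 26 17:46:15 localhost.localdomain bash[4121]: the response state is SUCCEED
Apr 26 17:46:15 localhost.localdomain bash[4121]: request successfully
Apr 26 17:46:16 localhost.localdomain bash[4121]: Failed to notify init system: Permission denied
Apr 26 17:46:16 localhost.localdomain bash[4121]: Observer process with PID 4689 is still running.
Apr 26 17:47:01 localhost.localdomain systemd[1]: oceanbase.service start operation timed out. Terminating.
"""

events = timeline(journal)
for ts, label in events:
    print(ts, label)
print("seconds from success to timeout:", seconds_between(events[0][0], events[-1][0]))
```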

As for the logs:
observer.log (it contains far too much; the entries for those few minutes alone run into the hundreds or thousands):
[2024-04-26 17:56:15.997984] INFO [STORAGE] write_meta_block (ob_index_block_builder.cpp:2134) [4952][T1_MINI_MERGE][T1][YB427F000001-000616FCC9E8FF4A-0-0] [lt=5] succeed to write macro meta in macro block(ret=0, macro_meta_={val:{version:1, length:172, data_checksum:2967856324, rowkey_count:6, column_count:38, micro_block_count:12, occupy_size:197052, data_size:195779, data_zsize:195779, original_size:195779, progressive_merge_round:0, block_offset:197052, block_size:2264, row_count:1136, row_count_delta:1136, max_merged_trans_version:1714125373559504887, is_encrypted:false, is_deleted:false, contain_uncommitted_row:false, compressor_type:1, master_key_id:0, encrypt_id:0, encrypt_key:"", row_store_type:0, schema_version:1714112256151672, snapshot_version:1714125373355880351, is_last_row_last_flag:true, logic_id:{data_seq:{data_seq:0, parallel_idx:0, block_type:0, merge_type:0, sstable_logic_seq:0, reserved:0, sign:0, macro_data_seq:0}, logic_version:1714125373559504887, tablet_id:115, column_group_idx:0}, macro_id:70, column_checksums:[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], has_string_out_row:false, all_lob_in_row:false, agg_row_len:0, agg_row_buf:null}, end_key:{datum_cnt:6, group_idx:0, hash:0, [idx=0:{len: 8, flag: 0, null: 0, ptr: 0x7f5fd2c18070, hex: 0000000000000000, int: 0},idx=1:{len: 8, flag: 0, null: 0, ptr: 0x7f5fd2c180a8, hex: 1A54000000000000, int: 21530},idx=2:{len: 8, flag: 0, null: 0, ptr: 0x7f5fd2c180e0, hex: 1100000000000000, int: 17},idx=3:{len: 8, flag: 0, null: 0, ptr: 0x7f5fd2c181b0, hex: 40B8FDE9FC160600, int: 1714125373552704},idx=4:{len: 8, flag: 0, null: 0, ptr: 0x7f5fd2c18150, hex: 098080F80D3436E8, int: -1714125373559504887},idx=5:{len: 8, flag: 0, null: 0, ptr: 0x7f5fd2c18188, hex: 0000000000000000, int: 0},]store_rowkey:}})
[2024-04-26 17:56:15.998042] INFO [STORAGE.BLKMGR] async_write (ob_macro_block_handle.cpp:185) [4952][T1_MINI_MERGE][T1][YB427F000001-000616FCC9E8FF4A-0-0] [lt=25] Async write macro block(macro_id_=70)
[2024-04-26 17:56:15.998047] INFO [STORAGE] flush (ob_macro_block.cpp:400) [4952][T1_MINI_MERGE][T1][YB427F000001-000616FCC9E8FF4A-0-0] [lt=4] macro block writer succeed to flush macro block.(block_id=70, common_header_={header_size:24, version:1, magic:1001, attr:1, payload_size:199587, payload_checksum:-1780781161}, macro_header_={fixed_header:{header_size:481, version:2, magic:1007, tablet_id:115, logical_version:1714125373559504887, data_seq:0, column_count:38, rowkey_column_count:6, row_store_type:0, row_count:1136, occupy_size:197052, micro_block_count:12, micro_block_data_offset:505, micro_block_data_size:196547, idx_block_offset:197052, idx_block_size:2264, meta_block_offset:199316, meta_block_size:295, data_checksum:2967856324, compressor_type:1, encrypt_id:0, master_key_id:0, encrypt_key:"data_size:16, data:00000000000000000000000000000000", col_type_array_cnt:6}, column_types:0x7f5fd3604140, column_orders:0x7f5fd3604158, column_checksum:0x7f5fd3604170, is_normal_cg:false, column_checksum:[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}, contain_uncommitted_row=false, max_merged_trans_version=1714125373559504887, &macro_handle=0x7f601b181bc8, table_cg_idx=0)
[2024-04-26 17:56:15.998951] INFO [STORAGE.TRANS] get_number (ob_id_service.cpp:389) [4997][T1_TenantWeakRe][T1][Y0-0000000000000000-0-0] [lt=37] get number(ret=0, service_type_=0, range=1, base_id=1714125375998950100, start_id=1714125375998950100, end_id=1714125375998950101)
[2024-04-26 17:56:16.015171] INFO [STORAGE] wait_io_finish (ob_macro_block_writer.cpp:1462) [4915][T1_MINI_MERGE][T1][YB427F000001-000616FCC9E8FF4B-0-0] [lt=22] wait io finish(macro_handle.get_macro_id()=68, data_store_desc_->get_table_cg_idx()=0, is_normal_cg=false)
[2024-04-26 18:00:30.662393] WDIAG [SERVER] runTimerTask (ob_server.cpp:3290) [4690][ServerGTimer][T0][Y0-0000000000000000-0-0] [lt=5][errcode=-4000] ObRefreshNetworkSpeedTask reload bandwidth throttle limit failed(ret=-4000, ret="OB_ERROR")
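Since observer.log is dominated by INFO lines, it may help to paste only the warning and error entries from the failing window. A Python sketch, assuming every entry starts with "[YYYY-MM-DD HH:MM:SS.ffffff] LEVEL " as in the lines above; WARN and WDIAG appear in these excerpts, while ERROR and EDIAG are assumed level names. The helper name `interesting` and the time window are hypothetical, and the sample lines are abbreviated copies of the entries above.

```python
import re
from datetime import datetime

# observer.log entries begin with "[YYYY-MM-DD HH:MM:SS.ffffff] LEVEL ".
LINE_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6})\] ([A-Z]+) ")
# WARN and WDIAG appear in the excerpts; ERROR and EDIAG are assumed level names.
KEEP_LEVELS = {"ERROR", "EDIAG", "WARN", "WDIAG"}


def interesting(lines, start: datetime, end: datetime):
    """Yield log lines inside [start, end] whose level is in KEEP_LEVELS."""
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # wrapped continuation lines and stray text have no header
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
        if start <= ts <= end and m.group(2) in KEEP_LEVELS:
            yield line


sample = [
    "[2024-04-26 17:56:15.998042] INFO [STORAGE.BLKMGR] async_write (ob_macro_block_handle.cpp:185) ...",
    '[2024-04-26 18:00:30.662393] WDIAG [SERVER] runTimerTask (ob_server.cpp:3290) ... '
    'ObRefreshNetworkSpeedTask reload bandwidth throttle limit failed(ret=-4000, ret="OB_ERROR")',
]
window = (datetime(2024, 4, 26, 17, 45), datetime(2024, 4, 26, 18, 5))

for line in interesting(sample, *window):
    print(line)
```

On the real file, pass an open file object instead of the sample list, for example `with open(path, errors="replace") as f:` followed by a loop over `interesting(f, *window)`, where `path` points to observer.log (its location depends on the installation and isn't shown in the thread).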

observer.log.wf
[2024-04-26 14:16:39.259277] WARN init_config (ob_server.cpp:1896) [3457][observer][T0][Y0-0000000000000000-0-0] [lt=4][errcode=-4187] Item not match(the devname has been rewritten, and the new value comes from local_ip, old value="lo", new value="lo", local_ip="127.0.0.1")
[2024-04-26 17:27:13.553189] INFO New syslog file info: [address: "127.0.0.1:2882", observer version: OceanBase_CE 4.3.0.1, revision: 100000242024032211-0193a343bc60b4699ec47792c3fc4ce166a182f9, sysname: Linux, os release: 3.10.0-1160.el7.x86_64, machine: x86_64, tz GMT offset: 08:00]
[2024-04-26 17:56:15.980874] INFO New syslog file info: [address: "127.0.0.1:2882", observer version: OceanBase_CE 4.3.0.1, revision: 100000242024032211-0193a343bc60b4699ec47792c3fc4ce166a182f9, sysname: Linux, os release: 3.10.0-1160.el7.x86_64, machine: x86_64, tz GMT offset: 08:00]

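  1. Run obd cluster list to check the cluster status. If it shows running, the cluster is healthy and there is no need to start oceanbase again.
  2. If it is not running, restart it as described in 【SOP 系列 19】OceanBase 生态组件重启方式 - 社区问答- OceanBase社区-分布式数据库 ("SOP Series 19: How to restart OceanBase ecosystem components", OceanBase community Q&A).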
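The check in step 1 can be scripted. This Python sketch assumes obd prints a |-delimited table in which the cluster name is the first cell of its row and the status is a separate cell; the sample table, its config path, and the cluster name "ob" (taken from the -n ob flag on the observer command line) are illustrative and not captured obd output. `cluster_is_running` and `should_start` are hypothetical helper names.

```python
import subprocess


def cluster_is_running(obd_output: str, cluster_name: str) -> bool:
    """True if the table row whose first cell is cluster_name has a cell equal to 'running'."""
    for line in obd_output.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if cells and cells[0] == cluster_name:
            return any(c.lower() == "running" for c in cells[1:])
    return False  # cluster not listed at all


def should_start(cluster_name: str) -> bool:
    """Run `obd cluster list` and decide whether a (re)start is needed."""
    result = subprocess.run(["obd", "cluster", "list"],
                            capture_output=True, text=True, check=True)
    return not cluster_is_running(result.stdout, cluster_name)


# Hypothetical table, for illustration only.
sample = """\
| Name | Configuration Path | Status (Cached) |
| ob   | /path/to/config    | running         |
"""

print(cluster_is_running(sample, "ob"))  # True
```

If the cluster shows running, no further start is needed; otherwise follow the SOP linked in step 2.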