OB fails on start

v4.0.0
Problem: after deploy, start fails. I tried destroy and then redeploy, adding __min_full_resource_pool_memory=4831838208 via edit-config, and adjusting system_memory; none of these helped. The errors are as follows:

[2022-11-22 17:51:43.441624] ERROR [SHARE] operator() (ob_common_config.cpp:124) [103988][][T0][Y0-0000000000000000-0-0] [lt=17] Invalid config, value out of range(name=“min_full_resource_pool_memory", value=“268435456”, ret=-4147) BACKTRACE:0xb553efb 0xb5459d6 0x3c3bfda 0x3c3bcf9 0x3c3bb00 0x3c3b952 0x9a21ee4 0x99fa59f 0x99f9d88 0x5bd2d95 0x5bd1386 0x3c173fc 0x2b0e3afce3d5 0x3c16184
[2022-11-22 17:51:43.442195] ERROR [SERVER] init_config (ob_server.cpp:1275) [103988][][T0][Y0-0000000000000000-0-0] [lt=554] invalid config from cmdline options(opts.optstr=”__min_full_resource_pool_memory=268435456,memory_limit=24G,datafile_disk_percentage=20,enable_syslog_wf=False,enable_syslog_recycle=True,max_syslog_file_count=4", ret=-4147, ret=“OB_INVALID_CONFIG”) BACKTRACE:0xb553efb 0xb5459d6 0x3c3bfda 0x3c3bcf9 0x3c3bb00 0x3c3b952 0x5bf1552 0x5bd39cf 0x5bd1386 0x3c173fc 0x2b0e3afce3d5 0x3c16184
[2022-11-22 17:51:43.443301] ERROR [SERVER] init (ob_server.cpp:178) [103988][][T0][Y0-0000000000000000-0-0] [lt=5] init config failed(ret=-4147, ret=“OB_INVALID_CONFIG”) BACKTRACE:0xb553efb 0xb5459d6 0x3c42f9b 0x3c42cb4 0x3c42ac9 0x3c2137b 0x5bd437d 0x5bd1ebb 0x3c173fc 0x2b0e3afce3d5 0x3c16184

[2022-11-22 17:51:43.445905] ERROR [SERVER] init (ob_server.cpp:374) [103988][][T0][Y0-0000000000000000-0-0] [lt=4] [OBSERVER_NOTICE] fail to init observer(ret=-4147, ret=“OB_INVALID_CONFIG”) BACKTRACE:0xb553efb 0xb5459d6 0x3c42f9b 0x3c42cb4 0x3c42ac9 0x3c2137b 0x5bdd615 0x5bd1f27 0x3c173fc 0x2b0e3afce3d5 0x3c16184
[2022-11-22 17:51:43.445938] ERROR [SERVER] main (main.cpp:529) [103988][][T0][Y0-0000000000000000-0-0] [lt=32] observer init fail(ret=-4147) BACKTRACE:0xb553efb 0xb5459d6 0x3c1bfe0 0x3c1bd09 0x3c1bb28 0x3c163dc 0x3c1840d 0x3c17645 0x2b0e3afce3d5 0x3c16184
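
To see what was actually passed to the observer, the option string in the init_config error above can be split into key/value pairs. A minimal Python sketch; the helper names `parse_optstr` and `to_mib` are illustrative only and are not part of obd or observer:

```python
def parse_optstr(optstr: str) -> dict:
    """Split a 'k=v,k=v' option string into a dict of raw string values."""
    opts = {}
    for item in optstr.split(","):
        key, _, value = item.partition("=")
        opts[key.strip()] = value.strip()
    return opts


def to_mib(num_bytes: int) -> float:
    """Convert a byte count to MiB."""
    return num_bytes / (1024 ** 2)


# Option string copied from the init_config error in the log above.
optstr = (
    "__min_full_resource_pool_memory=268435456,memory_limit=24G,"
    "datafile_disk_percentage=20,enable_syslog_wf=False,"
    "enable_syslog_recycle=True,max_syslog_file_count=4"
)

opts = parse_optstr(optstr)
passed = int(opts["__min_full_resource_pool_memory"])

print(to_mib(passed))           # 256.0  (value rejected as out of range)
print(to_mib(4831838208))       # 4608.0 (value set through edit-config)
print("system_memory" in opts)  # False
```

Read this way, the rejected value is 256 MiB rather than the 4831838208 set through edit-config, and system_memory does not appear in the string at all, which suggests the edited configuration had not reached the start command.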

Partial configuration:
memory_limit: 24G
system_memory: 5G
datafile_disk_percentage: 10
syslog_level: INFO
enable_syslog_wf: false
enable_syslog_recycle: true
max_syslog_file_count: 4
appname: obcluster1

Hello, could you upload the complete log? An attachment is fine.

observer.log (101.8 KB)
observer.log.rar (1.5 KB)

What is your obd version? If it is not 1.6 or later, try upgrading obd first:
sudo obd update

How do I upgrade offline?

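Download the latest RPM package in an environment that has network access, upload it to your own environment, remove the old obd, and then install the new package.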

[admin@crmhist1 ~]$ obd cluster start obmysql1
Get local repositories ok
Search plugins ok
Open ssh connection ok
Load cluster param plugin ok
Check before start observer ok
[WARN] OBD-1007: (10.161.69.183) The recommended number of max user processes is 20480 (Current value: 4096)
[WARN] (10.161.69.183) clog and data use the same disk (/oradata)
[WARN] OBD-1007: (10.161.69.184) The recommended number of max user processes is 20480 (Current value: 4096)
[WARN] (10.161.69.184) clog and data use the same disk (/oradata)
[WARN] OBD-1007: (10.161.69.185) The recommended number of max user processes is 20480 (Current value: 4096)
[WARN] (10.161.69.185) clog and data use the same disk (/oradata)
[WARN] OBD-1007: (10.161.69.186) The recommended number of max user processes is 20480 (Current value: 4096)
[WARN] (10.161.69.186) clog and data use the same disk (/oradata)
[WARN] OBD-1007: (10.161.69.187) The recommended number of max user processes is 20480 (Current value: 4096)
[WARN] (10.161.69.187) clog and data use the same disk (/oradata)

Check before start obproxy ok
Start observer ok
observer program health check ok
Connect to observer x
[ERROR] OBD-1006: Failed to connect to oceanbase-ce
See https://www.oceanbase.com/product/ob-deployer/error-codes .
It starts now, but the final connection step fails.

[2022-11-22 20:08:45.573015] INFO [RPC] ~ObBatchRpcBase (ob_batch_rpc.h:386) [167492][][T0][Y0-0000000000000000-0-0] [lt=5] ObBatchRpcBase destroy finished
[2022-11-22 20:08:45.579587] INFO unregister_pm (ob_page_manager.cpp:48) [167551][EvtHisUpdTask][T0][Y0-0000000000000000-0-0] [lt=6] unregister pm finish(&pm=0x2b4799a597c0, pm.get_tid()=167551)
[2022-11-22 20:08:45.583050] ERROR has_unfree_callback (object_set.cpp:34) [167492][][T0][Y0-0000000000000000-0-0] [lt=4] HAS UNFREE PTR!!! context: 0x2b47773fc6b0, label: SchemaSysCache, static_id: 0x1170ab98, static_info:{filename: ob_schema_cache.cpp, line: 467, function: init}, dynamic_info:{tid: 167492, cid: 0, create_time: 1669118850141618} BACKTRACE:0xb553efb 0xb5459d6 0x3c18d6f 0x3c18a57 0x3c18841 0x3c2ae18 0x3c2ad8d 0x3972dfe 0x392a063 0x3c1fbeb 0x392a578 0x985027b 0x97a71e2 0x2b4763d0eb69 0x2b4763d0ebb7 0x2b4763cf73dc 0x3c16184
[2022-11-22 20:08:45.583235] WARN [COMMON] deregister_cache (ob_kv_storecache.cpp:594) [167492][][T0][Y0-0000000000000000-0-0] [lt=178] The ObKVGlobalCache has not been inited, (ret=-4006)
[2022-11-22 20:08:45.583256] WARN [COMMON] deregister_cache (ob_kv_storecache.cpp:594) [167492][][T0][Y0-0000000000000000-0-0] [lt=9] The ObKVGlobalCache has not been inited, (ret=-4006)
[2022-11-22 20:08:45.595291] INFO unregister_pm (ob_page_manager.cpp:48) [167620][ClockGenerator][T0][Y0-0000000000000000-0-0] [lt=5] unregister pm finish(&pm=0x2b47b5a597c0, pm.get_tid()=167620)

I get the same error after destroy and redeploy.

Please post your yaml configuration file.

yaml.txt (1.6 KB)

The log does contain errors, but the useful error message has not been posted; please look for it.
Also, appname: obmysql1 in the configuration file must match cluster_name: obcluster1 in obproxy.
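
One way to catch this mismatch before starting is to compare the two values directly. The sketch below works on a plain dict that mirrors a deployment file; the component keys (`oceanbase-ce`, `obproxy`) and the `global` section layout are assumptions, since the attached yaml is not reproduced in this thread, and `names_match` is a hypothetical helper:

```python
def names_match(config: dict, ob: str = "oceanbase-ce", proxy: str = "obproxy") -> bool:
    """Return True only if appname and cluster_name are both set and equal.

    The key layout (component -> 'global' -> parameter) is an assumption.
    """
    appname = config.get(ob, {}).get("global", {}).get("appname")
    cluster_name = config.get(proxy, {}).get("global", {}).get("cluster_name")
    return appname is not None and appname == cluster_name


# Values quoted in the reply above.
config = {
    "oceanbase-ce": {"global": {"appname": "obmysql1"}},
    "obproxy": {"global": {"cluster_name": "obcluster1"}},
}
print(names_match(config))  # False

# After aligning the two names.
config["obproxy"]["global"]["cluster_name"] = "obmysql1"
print(names_match(config))  # True
```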

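Those are all the logs produced when running start. I will change cluster_name and try again shortly. I also have a few questions:
1. I also downloaded version 3.1.4, and deploy reports insufficient disk space. On checking, the OB clog and sst files have already grown to 8T, yet the only deployment is the 4.0 one from last night, which has never started successfully.
2. After deploy, saving in obd edit-config seems to have no effect. Last night I found that several disk_percentage parameters reverted after I changed them.
3. How do I clear the clusters that have already been destroyed so that obd list no longer shows them?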

log.rar (714.7 KB)

This is the log from deploying and starting 3.1.4; it reports the same error.

Is nobody looking at this anymore?

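Deploying the 3.1.4 code is inconvenient in many ways, so I recommend starting with 4.0, where you can configure the size of the clog and the data files. In 3.x, storage is controlled by a percentage of the disk, and by default it will fill the disk.
As for the original 4.0 error log, please reproduce the problem and upload observer.log as an attachment. The log you pasted is certainly incomplete; otherwise this would be a serious bug.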

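What do you mean by "deploying the 3.1.4 code is inconvenient in many ways"? I can destroy everything and collect all the logs again. If the same error occurs, which logs from which nodes do you need?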

I redeployed 4.0 and the error is the same. The parameter file is the one uploaded earlier. I have uploaded the log files from all nodes:
log.tar.gz (2.7 MB)

It fails to start because 'memory_limit' or 'system_memory' is not being picked up correctly. Could you send the yaml file?
[2022-11-24 17:32:47.425104] ERROR [SHARE] get_sys_tenant_default_memory (ob_unit_resource.cpp:702) [93313][][T0][Y0-0000000000000000-0-0] [lt=6] server available memory is little than unit min memory, can not create sys tenant. try adjust config ‘memory_limit’ or ‘system_memory’.(ret=-4147, ret=“OB_INVALID_CONFIG”, unit_min_memory=1073741824, server_avail_memory=-6442450944, system_memory=32212254720, server_memory_limit=25769803776) BACKTRACE:0xb553efb 0xb5459d6 0x3c820ac 0x3c81daf 0x3c81bad 0x3c819f7 0xa895a9b 0xa8954a4 0xa88a140 0xa88a048 0x45b5b5b 0x45b6752 0x5be2043 0x3c17413 0x2b065d7693d5 0x3c16184

I found the yaml file. This looks like a configuration problem: only memory_limit is configured, and system_memory is not.
memory_limit: 24G

Why it fails to start: only memory_limit (24G) is configured and system_memory is not, so system_memory falls back to its default of 30G. With system_memory larger than memory_limit, the server cannot start.

I suggest adding system_memory to the configuration, setting it to 8G, and restarting.
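
The numbers in the get_sys_tenant_default_memory error are consistent with server_avail_memory being memory_limit minus system_memory (25769803776 - 32212254720 = -6442450944). A rough Python check of that arithmetic, assuming the startup condition is simply server_avail_memory >= unit_min_memory; the function name and this simplified rule are assumptions, not the observer's actual code:

```python
GIB = 1024 ** 3


def sys_tenant_fits(memory_limit: int, system_memory: int,
                    unit_min_memory: int = 1 * GIB):
    """Return (available bytes, whether they cover unit_min_memory).

    Simplified rule inferred from the error message; an assumption.
    """
    avail = memory_limit - system_memory
    return avail, avail >= unit_min_memory


# Values taken from the error log: memory_limit 24G, system_memory 30G.
avail, ok = sys_tenant_fits(25769803776, 32212254720, 1073741824)
print(avail, avail // GIB, ok)  # -6442450944 -6 False

# Suggested setting: memory_limit 24G with system_memory 8G.
avail, ok = sys_tenant_fits(24 * GIB, 8 * GIB)
print(avail // GIB, ok)         # 16 True
```

With system_memory at 8G, 16G remains under the 24G limit, well above the 1G unit_min_memory reported in the log.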

image

It has been less than an hour and it is already at 8T. This has to be a bug.

I redeployed and started again and it still fails, and usage is at 8.3T right from the start.

image