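Memory on the leader replica's machine suddenly spiked and the system automatically killed the observer process. After the restart, SQL Diagnostics in OCP returns no data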

[Environment] Production
[OB or other components] OceanBase, OCP
[Versions] OceanBase: 4.2.1-8BP, OCP: 4.2.2.0
[Problem description]
The cluster uses a 1-1-1 deployment.

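Problem tenant: around 1 a.m., memory suddenly spiked and the leader replica went down. After it was restarted, its data was reset to zero and had to be resynchronized.

At 2 a.m., OCP reported insufficient memory and other errors, so we upgraded the OCP server's configuration and restarted it.

After that, SQL diagnostic information could no longer be queried for this tenant. Only diagnostic information from before 4 a.m. is available.

My guess is that the leader replica went down because a full-table UPDATE was run on a large table. The machine has 512G of memory, of which 470G is allocated to OB.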



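1. Problem tenant: around 1 a.m., memory suddenly spiked and the leader replica went down. After it was restarted, its data was reset to zero and had to be resynchronized.

What does "data was reset to zero" mean here? Does "memory suddenly spiked" refer to the business cluster or to OCP's database cluster?

2. At 2 a.m., OCP reported insufficient memory and other errors, so we upgraded the OCP server's configuration and restarted it.
After that, SQL diagnostic information could no longer be queried for this tenant. Only diagnostic information from before 4 a.m. is available.

Was it OCP's database cluster that ran out of memory? Which tenant does "this tenant" refer to? After the OCP server was upgraded and restarted, has SQL diagnostic information after 4 a.m. for this tenant remained unavailable ever since?

3. Does OCP's database cluster share the same cluster as the business cluster?

4. Please send all observer.log files covering the abnormal period.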


OCP SQL Diagnostics has no data - #11, from 洪波

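I followed the steps in this post, and SQL Diagnostics is back to normal.

But the question of why observer exceeded its memory limit, causing the cloud server to kill it automatically, is still unresolved.

Also, after observer was restarted, all of its data was wiped and a very long data resynchronization started. I don't know whether that is normal.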


We will analyze this and reply once there is progress.


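Before the business tenant's leader replica went down, a full-table UPDATE was run on a large table to initialize the data of one column.
In MySQL, this table holds roughly 9G of data.

Each replica of the business tenant is allocated 470G of memory.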


sql 3.zip (1.7 KB)

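Please provide the results of the queries in the attached SQL file.
If possible, please also provide the screenshot of the "OCP out of memory" alert mentioned above.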


Why is the MemStore of the ocp_monitor tenant never released?



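Found the problem. This value had been set to 72G, so dynamic expansion never took place. The disk space used by the data files filled up, so minor compactions could never run.

However, 72G is the value that was set automatically when OCP was deployed. I had never paid attention to it, so it turned out to be a small hidden pitfall.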


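The business cluster's observer.log abruptly stopped printing after 2024-09-14 01:13:18. observer started at 2024-09-14 02:29:04 and logging resumed. No abnormal messages were printed at the point where the log breaks off, and observer.log contains no OOM or OOPS errors, nor any output about modules using a large amount of memory. It looks as if the machine itself went down at 2024-09-14 01:13:18. Judging from the memory usage shown in OCP monitoring, memory did not rise much and was far from exhausted, so we suspect some other problem brought the machine down.

Please send the messages log from that time. Also, is there a core file?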


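How was OCP deployed? A normal OBD deployment comes with the datafile_size, datafile_maxsize, and datafile_next parameters.
That is, as long as there is enough disk space and datafile_size < datafile_maxsize, the data file grows automatically in steps of datafile_next, provided that datafile_next is not left at its default value of 0.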
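The expansion rule described above can be sketched as a small decision function. This is only an illustration of the stated conditions, not observer's actual implementation: the `DataFileConfig` type, the `next_datafile_size` helper, the free-space check, and the choice to cap the last step at datafile_maxsize are all assumptions, and the numbers in the usage lines are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DataFileConfig:
    """Hypothetical container for the three parameters named in the thread (sizes in GB)."""
    datafile_size_gb: int     # current size of the data file
    datafile_maxsize_gb: int  # upper bound for automatic expansion
    datafile_next_gb: int     # expansion step; the default 0 disables expansion
    disk_free_gb: int         # free space left on the data disk (assumed input)


def next_datafile_size(cfg: DataFileConfig) -> int:
    """Return the data file size after one expansion attempt."""
    if cfg.datafile_next_gb <= 0:
        return cfg.datafile_size_gb  # step left at 0: never expands
    if cfg.datafile_size_gb >= cfg.datafile_maxsize_gb:
        return cfg.datafile_size_gb  # already at the configured maximum
    # Assumption: the final step is capped so the file never exceeds the maximum.
    step = min(cfg.datafile_next_gb, cfg.datafile_maxsize_gb - cfg.datafile_size_gb)
    if step > cfg.disk_free_gb:
        return cfg.datafile_size_gb  # not enough disk space for this step
    return cfg.datafile_size_gb + step


# Hypothetical values: with the step at 0 the file stays at 72 GB however full it gets.
print(next_datafile_size(DataFileConfig(72, 200, 0, 500)))
print(next_datafile_size(DataFileConfig(72, 200, 20, 500)))
```

The first call models a deployment where the step was never configured, which is the situation the question above is trying to rule out.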

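OCP was deployed through the graphical (web UI) installer, which automatically set log_disk_size to 72G.

Judging from this parameter's modification history, I did correct it to 110G at some point. For some reason it was later changed back to 72G automatically.

Was any operation performed at 16:16 on 9.19? For example, restarting observer?

OCP was restarted. The OCP server's memory was upgraded at that time.

Please send a screenshot of the current value.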

messages-20240915.zip (47.1 KB)

There is no core file.

The problem happened again early this morning. Two zones went down.

messages log
messages.zip (11.2 KB)

Core file
core-argusagent-1869-1726775176.zip (2.6 MB)

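Only one of the zone servers generated a core file.


This server did not go down; it looks like only the observer process was killed.

The messages log generated on this server is as follows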
messages_212.log (201.6 KB)

Sep 13 23:20:01 iZbp1j5usxm0z3abrvriquZ systemd: Removed slice User Slice of root.
Sep 13 23:24:17 iZbp1j5usxm0z3abrvriquZ kernel: ocp_monagent invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0

-- ocp_monagent triggered the OOM killer

Sep 13 23:24:17 iZbp1j5usxm0z3abrvriquZ kernel: ocp_monagent cpuset=ocp_monagent mems_allowed=0
Sep 13 23:24:17 iZbp1j5usxm0z3abrvriquZ kernel: CPU: 0 PID: 14472 Comm: ocp_monagent Tainted: G OE ------------ T 3.10.0-1160.119.1.el7.x86_64 #1
Sep 13 23:24:17 iZbp1j5usxm0z3abrvriquZ kernel: Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 2221b89 04/01/2014
Sep 13 23:24:17 iZbp1j5usxm0z3abrvriquZ kernel: Call Trace:
Sep 13 23:24:17 iZbp1j5usxm0z3abrvriquZ kernel: [] dump_stack+0x19/0x1f
Sep 13 23:24:17 iZbp1j5usxm0z3abrvriquZ kernel: [] dump_header+0x90/0x22d
Sep 13 23:24:17 iZbp1j5usxm0z3abrvriquZ kernel: [] ? find_lock_task_mm+0x56/0xd0
Sep 13 23:24:17 iZbp1j5usxm0z3abrvriquZ kernel: [] ? try_get_mem_cgroup_from_mm+0x28/0x70
Sep 13 23:24:17 iZbp1j5usxm0z3abrvriquZ kernel: [] oom_kill_process+0x2d5/0x4a0
Sep 13 23:24:17 iZbp1j5usxm0z3abrvriquZ kernel: [] mem_cgroup_oom_synchronize+0x55c/0x590
Sep 13 23:24:17 iZbp1j5usxm0z3abrvriquZ kernel: [] ? mem_cgroup_charge_common+0xc0/0xc0
Sep 13 23:24:17 iZbp1j5usxm0z3abrvriquZ kernel: [] pagefault_out_of_memory+0x14/0x90
Sep 13 23:24:17 iZbp1j5usxm0z3abrvriquZ kernel: [] mm_fault_error+0x6a/0x15b
Sep 13 23:24:17 iZbp1j5usxm0z3abrvriquZ kernel: [] __do_page_fault+0x4a1/0x510
Sep 13 23:24:17 iZbp1j5usxm0z3abrvriquZ kernel: [] trace_do_page_fault+0x56/0x150
Sep 13 23:24:17 iZbp1j5usxm0z3abrvriquZ kernel: [] do_async_page_fault+0x22/0x100
Sep 13 23:24:17 iZbp1j5usxm0z3abrvriquZ kernel: [] async_page_fault+0x28/0x30
Sep 13 23:24:17 iZbp1j5usxm0z3abrvriquZ kernel: Task in /ocp_agent/ocp_monagent killed as a result of limit of /ocp_agent/ocp_monagent
Sep 13 23:24:17 iZbp1j5usxm0z3abrvriquZ kernel: memory: usage 2097152kB, limit 2097152kB, failcnt 19006
Sep 13 23:24:17 iZbp1j5usxm0z3abrvriquZ kernel: memory+swap: usage 2097152kB, limit 9007199254740988kB, failcnt 0
Sep 13 23:24:17 iZbp1j5usxm0z3abrvriquZ kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Sep 13 23:24:17 iZbp1j5usxm0z3abrvriquZ kernel: Memory cgroup stats for /ocp_agent/ocp_monagent: cache:72KB rss:2097080KB rss_huge:589824KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:2096884KB inactive_file:12KB active_file:4KB unevictable:0KB
Sep 13 23:24:17 iZbp1j5usxm0z3abrvriquZ kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Sep 13 23:24:17 iZbp1j5usxm0z3abrvriquZ kernel: [14455] 0 14455 778936 528189 1074 0 0 ocp_monagent
Sep 13 23:24:17 iZbp1j5usxm0z3abrvriquZ kernel: Memory cgroup out of memory: Kill process 14479 (ocp_monagent) score 979 or sacrifice child
Sep 13 23:24:17 iZbp1j5usxm0z3abrvriquZ kernel: Killed process 14455 (ocp_monagent), UID 0, total-vm:3115744kB, anon-rss:2098616kB, file-rss:14140kB, shmem-rss:0kB
Sep 13 23:30:01 iZbp1j5usxm0z3abrvriquZ systemd: Created slice User Slice of root.

……

Sep 14 01:10:01 iZbp1j5usxm0z3abrvriquZ systemd: Removed slice User Slice of root.
Sep 14 01:13:41 iZbp1j5usxm0z3abrvriquZ kernel: T1004_L0_G0 invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0

-- T1004_L0_G0 triggered the OOM killer

Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: T1004_L0_G0 cpuset=/ mems_allowed=0
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: CPU: 1 PID: 19483 Comm: T1004_L0_G0 Tainted: G OE ------------ T 3.10.0-1160.119.1.el7.x86_64 #1
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 2221b89 04/01/2014
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: Call Trace:
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: [] dump_stack+0x19/0x1f
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: [] dump_header+0x90/0x22d
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: [] ? ktime_get_ts64+0x52/0xf0
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: [] ? delayacct_end+0x8f/0xc0
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: [] ? virtballoon_oom_notify+0x2a/0x80 [virtio_balloon]
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: [] oom_kill_process+0x2d5/0x4a0
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: [] ? oom_unkillable_task+0xcd/0x120
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: [] out_of_memory+0x31a/0x500
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: [] __alloc_pages_nodemask+0xae4/0xbf0
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: [] alloc_pages_current+0x98/0x110
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: [] __page_cache_alloc+0x97/0xb0
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: [] filemap_fault+0x270/0x420
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: [] ext4_filemap_fault+0x36/0x50 [ext4]
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: [] __do_fault.isra.61+0x8a/0x100
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: [] do_read_fault.isra.63+0x4c/0x1b0
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: [] handle_mm_fault+0xa33/0x1190
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: [] __do_page_fault+0x213/0x510
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: [] trace_do_page_fault+0x56/0x150
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: [] do_async_page_fault+0x22/0x100
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: [] async_page_fault+0x28/0x30
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: Mem-Info:
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: active_anon:126930586 inactive_anon:115 isolated_anon:0#012 active_file:6232 inactive_file:5513 isolated_file:223#012 unevictable:0 dirty:0 writeback:0 unstable:0#012 slab_reclaimable:19125 slab_unreclaimable:51171#012 mapped:6518 shmem:182 pagetables:249896 bounce:0#012 free:1032166 free_pcp:3918 free_cma:0
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: Node 0 DMA free:15908kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: lowmem_reserve[]: 0 2782 507713 507713
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: Node 0 DMA32 free:2027792kB min:11492kB low:14364kB high:17236kB active_anon:765776kB inactive_anon:36kB active_file:264kB inactive_file:152kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3111612kB managed:2849436kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:40kB slab_reclaimable:680kB slab_unreclaimable:1288kB kernel_stack:368kB pagetables:1784kB unstable:0kB bounce:0kB free_pcp:5140kB local_pcp:112kB free_cma:0kB writeback_tmp:0kB pages_scanned:2997 all_unreclaimable? yes
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: lowmem_reserve[]: 0 0 504930 504930
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: Node 0 Normal free:2084964kB min:2085592kB low:2606988kB high:3128388kB active_anon:506956568kB inactive_anon:424kB active_file:24664kB inactive_file:21900kB unevictable:0kB isolated(anon):0kB isolated(file):892kB present:525336576kB managed:517051592kB mlocked:0kB dirty:0kB writeback:0kB mapped:26072kB shmem:688kB slab_reclaimable:75820kB slab_unreclaimable:203396kB kernel_stack:45168kB pagetables:997800kB unstable:0kB bounce:0kB free_pcp:10532kB local_pcp:212kB free_cma:0kB writeback_tmp:0kB pages_scanned:69711 all_unreclaimable? no
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: lowmem_reserve[]: 0 0 0 0
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: Node 0 DMA: 1*4kB (U) 0*8kB 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15908kB
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: Node 0 DMA32: 631*4kB (UE) 599*8kB (UE) 504*16kB (UE) 366*32kB (UE) 187*64kB (UEM) 72*128kB (UEM) 14*256kB (UEM) 2*512kB (UE) 1*1024kB (U) 0*2048kB 482*4096kB (UM) = 2028180kB
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: Node 0 Normal: 2861*4kB (UEM) 1311*8kB (UE) 1037*16kB (UE) 1302*32kB (E) 950*64kB (EM) 852*128kB (EM) 2980*256kB (UEM) 1349*512kB (UEM) 373*1024kB (EM) 0*2048kB 0*4096kB = 2085564kB
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: 13473 total pagecache pages
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: 0 pages in swap cache
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: Swap cache stats: add 0, delete 0, find 0/0
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: Free swap = 0kB
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: Total swap = 0kB
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: 132116045 pages RAM
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: 0 pages HighMem/MovableOnly
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: 2136811 pages reserved
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: [ 802] 0 802 12093 121 28 0 0 systemd-journal
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: [ 832] 0 832 11351 164 23 0 -1000 systemd-udevd
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: [ 1169] 0 1169 13883 612 28 0 -1000 auditd
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: [ 1219] 999 1219 153085 2139 62 0 0 polkitd
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: [ 1223] 32 1223 17314 135 37 0 0 rpcbind
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: [ 1227] 81 1227 14530 179 33 0 -900 dbus-daemon
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: [ 1258] 0 1258 6596 100 19 0 0 systemd-logind
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: [ 1271] 0 1271 48802 118 35 0 0 gssproxy
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: [ 1591] 0 1591 25753 515 48 0 0 dhclient
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: [ 1654] 0 1654 143572 2838 96 0 0 tuned
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: [ 1814] 0 1814 52300 3025 65 0 0 argusagent
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: [ 1856] 0 1856 22452 285 42 0 0 master
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: [ 1860] 89 1860 22495 270 44 0 0 qmgr
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: [ 1869] 0 1869 596003 4938 126 0 0 argusagent
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: [ 1901] 0 1901 58729 201 46 0 0 rsyslogd
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: [ 1902] 0 1902 28251 285 58 0 -1000 sshd
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: [ 1904] 0 1904 31598 181 17 0 0 crond
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: [ 1907] 0 1907 6477 52 18 0 0 atd
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: [ 1912] 0 1912 27552 50 11 0 0 agetty
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: [ 1913] 0 1913 27552 50 9 0 0 agetty
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: [ 2103] 0 2103 172201 2367 19 0 0 aliyun-service
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: [ 2241] 0 2241 4852 252 12 0 0 assist_daemon
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: [12890] 38 12890 6433 150 17 0 0 ntpd
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: [14439] 0 14439 256267 5324 48 0 0 ocp_agentd
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: [14456] 0 14456 274285 9229 60 0 0 ocp_mgragent
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: [15987] 1000 15987 127256808 126379346 247592 0 0 observer
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: [32873] 0 32873 12973 383 20 0 0 AliYunDunUpdate
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: [32971] 0 32971 28860 519 52 0 0 AliYunDun
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: [32982] 0 32982 47580 7851 88 0 0 AliYunDunMonito
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: [18095] 0 18095 846411 486369 1071 0 0 ocp_monagent
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: [20847] 89 20847 22478 271 43 0 0 pickup
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: Out of memory: Kill process 15987 (observer) score 974 or sacrifice child
Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: Killed process 15987 (observer), UID 1000, total-vm:509027232kB, anon-rss:505517384kB, file-rss:0kB, shmem-rss:0kB
Sep 14 01:16:04 iZbp1j5usxm0z3abrvriquZ systemd: Stopping Aliyun Assist…
Sep 14 01:16:04 iZbp1j5usxm0z3abrvriquZ systemd: Stopped Aliyun Assist.
Sep 14 01:16:04 iZbp1j5usxm0z3abrvriquZ systemd: Reloading.
Sep 14 01:16:04 iZbp1j5usxm0z3abrvriquZ systemd: Configuration file /etc/systemd/system/cloudmonitor.service is marked executable. Please remove executable permission bits. Proceeding anyway.
Sep 14 01:16:04 iZbp1j5usxm0z3abrvriquZ systemd: Reloading.
Sep 14 01:16:04 iZbp1j5usxm0z3abrvriquZ systemd: Configuration file /etc/systemd/system/cloudmonitor.service is marked executable. Please remove executable permission bits. Proceeding anyway.
Sep 14 01:16:04 iZbp1j5usxm0z3abrvriquZ systemd: Reloading.
Sep 14 01:16:04 iZbp1j5usxm0z3abrvriquZ systemd: Configuration file /etc/systemd/system/cloudmonitor.service is marked executable. Please remove executable permission bits. Proceeding anyway.
Sep 14 01:16:04 iZbp1j5usxm0z3abrvriquZ systemd: Started Aliyun Assist.
Sep 14 01:16:04 iZbp1j5usxm0z3abrvriquZ systemd: Stopping AssistDaemon…
Sep 14 01:16:04 iZbp1j5usxm0z3abrvriquZ systemd: Stopped AssistDaemon.
Sep 14 01:16:04 iZbp1j5usxm0z3abrvriquZ systemd: Reloading.
Sep 14 01:16:04 iZbp1j5usxm0z3abrvriquZ systemd: Configuration file /etc/systemd/system/cloudmonitor.service is marked executable. Please remove executable permission bits. Proceeding anyway.
Sep 14 01:16:04 iZbp1j5usxm0z3abrvriquZ systemd: Reloading.
Sep 14 01:16:04 iZbp1j5usxm0z3abrvriquZ systemd: Configuration file /etc/systemd/system/cloudmonitor.service is marked executable. Please remove executable permission bits. Proceeding anyway.
Sep 14 01:16:04 iZbp1j5usxm0z3abrvriquZ systemd: Reloading.
Sep 14 01:16:04 iZbp1j5usxm0z3abrvriquZ systemd: Configuration file /etc/systemd/system/cloudmonitor.service is marked executable. Please remove executable permission bits. Proceeding anyway.
Sep 14 01:16:04 iZbp1j5usxm0z3abrvriquZ systemd: Reloading.
Sep 14 01:16:04 iZbp1j5usxm0z3abrvriquZ systemd: Configuration file /etc/systemd/system/cloudmonitor.service is marked executable. Please remove executable permission bits. Proceeding anyway.
Sep 14 01:16:04 iZbp1j5usxm0z3abrvriquZ systemd: Started AssistDaemon.
Sep 14 01:16:04 iZbp1j5usxm0z3abrvriquZ aliyun-service: Agent check point
Sep 14 01:16:05 iZbp1j5usxm0z3abrvriquZ aliyun-service: Started successfully

It is indeed a memory problem: ocp_monagent and then observer were killed, one after the other.
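To put numbers on the two kills, the "Killed process" lines from the log above can be parsed and converted to GiB. This is a minimal sketch: the regular expression and helper names are illustrative, and the 4 KiB page size is an assumption (the usual x86_64 default) used to convert the kernel's "pages RAM" figure.

```python
import re

PAGE_KB = 4  # assumption: 4 KiB pages, the usual x86_64 default

# Matches the kernel's "Killed process <pid> (<name>) ... anon-rss:<n>kB" line.
KILLED_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\).*?anon-rss:(?P<anon_kb>\d+)kB"
)


def parse_killed(line: str):
    """Return (name, pid, anon_rss_kb) for a 'Killed process' line, else None."""
    m = KILLED_RE.search(line)
    if m is None:
        return None
    return m.group("name"), int(m.group("pid")), int(m.group("anon_kb"))


def kb_to_gib(kb: int) -> float:
    return kb / (1024 * 1024)


# The two kill records copied from the messages log above.
lines = [
    "Sep 13 23:24:17 iZbp1j5usxm0z3abrvriquZ kernel: Killed process 14455 (ocp_monagent), UID 0, total-vm:3115744kB, anon-rss:2098616kB, file-rss:14140kB, shmem-rss:0kB",
    "Sep 14 01:13:42 iZbp1j5usxm0z3abrvriquZ kernel: Killed process 15987 (observer), UID 1000, total-vm:509027232kB, anon-rss:505517384kB, file-rss:0kB, shmem-rss:0kB",
]

for line in lines:
    name, pid, anon_kb = parse_killed(line)
    print(f"{name} (pid {pid}): anon-rss {kb_to_gib(anon_kb):.1f} GiB")

# "132116045 pages RAM" from the Mem-Info section of the same log.
total_ram_kb = 132116045 * PAGE_KB
print(f"total RAM: {kb_to_gib(total_ram_kb):.1f} GiB")
```

Read this way, ocp_monagent was killed at its 2 GiB cgroup limit (usage 2097152kB, limit 2097152kB), while observer held about 482 GiB of anonymous memory on a machine the kernel reports as about 504 GiB. That is above the 470G the original post says was allocated to OB. Under the 4 KiB page assumption, the rss column of the process table for observer (126379346 pages) gives the same 505517384 kB.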

Sep 20 01:10:01 ep-ob-server03 systemd: Removed slice User Slice of root.
Sep 20 01:23:55 ep-ob-server03 aliyun-service: fatal error: runtime: cannot allocate memory

-- memory allocation failed

Sep 20 01:25:03 ep-ob-server03 aliyun-service: runtime stack:
Sep 20 01:27:06 ep-ob-server03 systemd: aliyun.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Sep 20 01:28:21 ep-ob-server03 aliyun-service: runtime.throw({0x89fc35c, 0x1f})
Sep 20 01:29:19 ep-ob-server03 aliyun-service: C:/Program Files/Go/src/runtime/panic.go:1047 +0x4d fp=0xa5d5c5c sp=0xa5d5c48 pc=0x807f73d
Sep 20 01:30:29 ep-ob-server03 aliyun-service: runtime.persistentalloc1(0x3ff8, 0x0, 0x91c9f78)
Sep 20 01:32:02 ep-ob-server03 aliyun-service: C:/Program Files/Go/src/runtime/malloc.go:1440 +0x251 fp=0xa5d5c90 sp=0xa5d5c5c pc=0x8054a91
Sep 20 01:33:21 ep-ob-server03 aliyun-service: runtime.persistentalloc.func1()
Sep 20 01:33:45 ep-ob-server03 aliyun-service: C:/Program Files/Go/src/runtime/malloc.go:1393 +0x35 fp=0xa5d5ca8 sp=0xa5d5c90 pc=0x8054825
Sep 20 01:35:19 ep-ob-server03 aliyun-service: runtime.persistentalloc(0x3ff8, 0x0, 0x91c9f78)
Sep 20 01:36:10 ep-ob-server03 aliyun-service: C:/Program Files/Go/src/runtime/malloc.go:1392 +0x5e fp=0xa5d5cc8 sp=0xa5d5ca8 pc=0x80547ce
Sep 20 01:36:39 ep-ob-server03 aliyun-service: runtime.(*fixalloc).alloc(0x91bc930)
Sep 20 01:37:53 ep-ob-server03 aliyun-service: C:/Program Files/Go/src/runtime/mfixalloc.go:92 +0x7d fp=0xa5d5ce4 sp=0xa5d5cc8 pc=0x805e98d
Sep 20 01:38:42 ep-ob-server03 aliyun-service: runtime.(*mheap).allocMSpanLocked(0x91b8400)
Sep 20 01:39:04 ep-ob-server03 aliyun-service: C:/Program Files/Go/src/runtime/mheap.go:1116 +0xba fp=0xa5d5cf8 sp=0xa5d5ce4 pc=0x806fc7a
Sep 20 01:40:23 ep-ob-server03 aliyun-service: runtime.(*mheap).allocSpan(0x91b8400, 0x1, 0x0, 0x22)
Sep 20 01:41:19 ep-ob-server03 aliyun-service: C:/Program Files/Go/src/runtime/mheap.go:1257 +0x241 fp=0xa5d5d80 sp=0xa5d5cf8 pc=0x806fef1
Sep 20 01:42:00 ep-ob-server03 aliyun-service: runtime.(*mheap).alloc.func1()
Sep 20 01:44:09 ep-ob-server03 aliyun-service: C:/Program Files/Go/src/runtime/mheap.go:961 +0x82 fp=0xa5d5da4 sp=0xa5d5d80 pc=0x806f942
Sep 20 01:45:34 ep-ob-server03 aliyun-service: runtime.(*mheap).alloc(0x91b8400, 0x1, 0x22)
Sep 20 01:48:32 ep-ob-server03 aliyun-service: C:/Program Files/Go/src/runtime/mheap.go:955 +0x5f fp=0xa5d5dc4 sp=0xa5d5da4 pc=0x806f89f
Sep 20 01:49:53 ep-ob-server03 aliyun-service: runtime.(*mcentral).grow(0x91b9630)
Sep 20 01:52:50 ep-ob-server03 aliyun-service: C:/Program Files/Go/src/runtime/mcentral.go:246 +0x62 fp=0xa5d5de4 sp=0xa5d5dc4 pc=0x805cdb2
Sep 20 01:54:04 ep-ob-server03 aliyun-service: runtime.(*mcentral).cacheSpan(0x91b9630)
Sep 20 01:56:51 ep-ob-server03 aliyun-service: C:/Program Files/Go/src/runtime/mcentral.go:166 +0xef fp=0xa5d5e20 sp=0xa5d5de4 pc=0x805c96f
Sep 20 01:58:36 ep-ob-server03 aliyun-service: runtime.(*mcache).refill(0xf7782200, 0x22)
Sep 20 02:00:37 ep-ob-server03 aliyun-service: C:/Program Files/Go/src/runtime/mcache.go:182 +0x17c fp=0xa5d5e4c sp=0xa5d5e20 pc=0x805c22c
Sep 20 02:03:04 ep-ob-server03 aliyun-service: runtime.(*mcache).nextFree(0xf7782200, 0x22)
Sep 20 02:05:10 ep-ob-server03 aliyun-service: C:/Program Files/Go/src/runtime/malloc.go:855 +0x81 fp=0xa5d5e6c sp=0xa5d5e4c pc=0x80539d1
Sep 20 02:07:02 ep-ob-server03 aliyun-service: runtime.mallocgc(0xf0, 0x89b74a0, 0x1)
Sep 20 02:08:22 ep-ob-server03 aliyun-service: C:/Program Files/Go/src/runtime/malloc.go:1042 +0x38f fp=0xa5d5ea4 sp=0xa5d5e6c pc=0x8053eff
Sep 20 02:09:46 ep-ob-server03 aliyun-service: runtime.newobject(0x89b74a0)
Sep 20 02:11:12 ep-ob-server03 aliyun-service: C:/Program Files/Go/src/runtime/malloc.go:1254 +0x2c fp=0xa5d5eb8 sp=0xa5d5ea4 pc=0x805444c
Sep 20 02:13:35 ep-ob-server03 aliyun-service: runtime.malg(0x8000)
Sep 20 02:16:23 ep-ob-server03 systemd: Unit aliyun.service entered failed state.
Sep 20 02:25:15 ep-ob-server03 aliyun-service: C:/Program Files/Go/src/runtime/proc.go:4241 +0x27 fp=0xa5d5ed4 sp=0xa5d5eb8 pc=0x808adc7
Sep 20 02:27:45 ep-ob-server03 aliyun-service: runtime.mpreinit(0xa600900)
Sep 20 02:31:02 ep-ob-server03 aliyun-service: C:/Program Files/Go/src/runtime/os_linux.go:396 +0x21 fp=0xa5d5ee0 sp=0xa5d5ed4 pc=0x807c4c1
Sep 20 02:33:33 ep-ob-server03 aliyun-service: runtime.mcommoninit(0xa600900, 0x22)
Sep 20 02:35:57 ep-ob-server03 aliyun-service: C:/Program Files/Go/src/runtime/proc.go:840 +0x152 fp=0xa5d5f10 sp=0xa5d5ee0 pc=0x8083cd2
Sep 20 02:38:34 ep-ob-server03 aliyun-service: runtime.allocm(0xa052000, 0x8a41e7c, 0x22)
Sep 20 02:39:54 ep-ob-server03 aliyun-service: C:/Program Files/Go/src/runtime/proc.go:1834 +0xe6 fp=0xa5d5f38 sp=0xa5d5f10 pc=0x80858b6
Sep 20 02:42:29 ep-ob-server03 aliyun-service: runtime.newm(0x8a41e7c, 0xa052000, 0x22)
Sep 20 02:43:52 ep-ob-server03 aliyun-service: C:/Program Files/Go/src/runtime/proc.go:2197 +0x56 fp=0xa5d5f54 sp=0xa5d5f38 pc=0x80860f6
Sep 20 02:45:17 ep-ob-server03 aliyun-service: runtime.startm(0xa052000, 0x1, 0x0)
Sep 20 02:48:06 ep-ob-server03 aliyun-service: C:/Program Files/Go/src/runtime/proc.go:2423 +0x1cf fp=0xa5d5f80 sp=0xa5d5f54 pc=0x808687f
Sep 20 02:50:28 ep-ob-server03 aliyun-service: runtime.wakep()
Sep 20 02:52:33 ep-ob-server03 aliyun-service: C:/Program Files/Go/src/runtime/proc.go:2559 +0xca fp=0xa5d5fa0 sp=0xa5d5f80 pc=0x8086dba
Sep 20 02:54:06 ep-ob-server03 aliyun-service: runtime.resetspinning()
Sep 20 02:55:32 ep-ob-server03 aliyun-service: C:/Program Files/Go/src/runtime/proc.go:3262 +0x5b fp=0xa5d5fb0 sp=0xa5d5fa0 pc=0x8088c7b
Sep 20 02:58:26 ep-ob-server03 aliyun-service: runtime.schedule()
Sep 20 03:02:05 ep-ob-server03 aliyun-service: C:/Program Files/Go/src/runtime/proc.go:3384 +0xec fp=0xa5d5fcc sp=0xa5d5fb0 pc=0x808905c
Sep 20 03:04:27 ep-ob-server03 aliyun-service: runtime.mstart1()
Sep 20 03:06:12 ep-ob-server03 aliyun-service: C:/Program Files/Go/src/runtime/proc.go:1506 +0xad fp=0xa5d5fdc sp=0xa5d5fcc pc=0x808501d
Sep 20 03:07:55 ep-ob-server03 aliyun-service: runtime.mstart0()
Sep 20 03:09:59 ep-ob-server03 aliyun-service: C:/Program Files/Go/src/runtime/proc.go:1456 +0x5a fp=0xa5d5fec sp=0xa5d5fdc pc=0x8084f5a
Sep 20 03:11:46 ep-ob-server03 aliyun-service: runtime.mstart()
Sep 20 03:13:26 ep-ob-server03 aliyun-service: C:/Program Files/Go/src/runtime/asm_386.s:271 +0x5 fp=0xa5d5ff0 sp=0xa5d5fec pc=0x80b0485
Sep 20 03:15:50 ep-ob-server03 aliyun-service: goroutine 1 [chan receive (scan), 8647 minutes]:
Sep 20 03:18:40 ep-ob-server03 aliyun-service: runtime.gopark(0x8a41dbc, 0xa29a030, 0xe, 0x17, 0x2)
Sep 20 03:19:37 ep-ob-server03 aliyun-service: C:/Program Files/Go/src/runtime/proc.go:381 +0x108 fp=0xa2b9cb8 sp=0xa2b9ca4 pc=0x80828a8
Sep 20 03:22:07 ep-ob-server03 aliyun-service: runtime.chanrecv(0xa29a000, 0x0, 0x1)
Sep 20 03:24:36 ep-ob-server03 aliyun-service: C:/Program Files/Go/src/runtime/chan.go:583 +0x3f4 fp=0xa2b9d00 sp=0xa2b9cb8 pc=0x804dc14
Sep 20 03:27:46 ep-ob-server03 aliyun-service: runtime.chanrecv1(0xa29a000, 0x0)
Sep 20 03:29:59 ep-ob-server03 aliyun-service: C:/Program Files/Go/src/runtime/chan.go:442 +0x1c fp=0xa2b9d14 sp=0xa2b9d00 pc=0x804d7ec
Sep 20 03:30:53 ep-ob-server03 aliyun-service: github.com/aliyun/aliyun_assist_client/thirdparty/service.(*systemd).Run.func1()
Sep 20 03:33:28 ep-ob-server03 systemd: aliyun.service failed.
Sep 20 03:39:20 ep-ob-server03 aliyun-service: D:/ciseagent/space/318129881/source/thirdparty/service/service_systemd_linux.go:245 +0xb3 fp=0xa2b9d3c sp=0xa2b9d14 pc=0x8774d93
Sep 20 03:40:54 ep-ob-server03 aliyun-service: github.com/aliyun/aliyun_assist_client/thirdparty/service.(*systemd).Run(0xa28e030)
Sep 20 03:42:46 ep-ob-server03 aliyun-service: D:/ciseagent/space/318129881/source/thirdparty/service/service_systemd_linux.go:246 +0xa9 fp=0xa2b9d58 sp=0xa2b9d3c pc=0x8770fa9
Sep 20 03:44:22 ep-ob-server03 aliyun-service: main.runRootCommand(0xa1e5120, {0x91c4cf8, 0x0, 0x0})
Sep 20 03:46:14 ep-ob-server03 aliyun-service: D:/ciseagent/space/318129881/source/rootcmd.go:625 +0x5b6 fp=0xa2b9eb0 sp=0xa2b9d58 pc=0x8859026
Sep 20 03:47:09 ep-ob-server03 aliyun-service: github.com/aliyun/aliyun_assist_client/thirdparty/aliyun-cli/cli.(*Command).executeInner(0x91a7de0, 0xa1e5120, {0xa11e120, 0x0, 0x0})
Sep 20 03:48:09 ep-ob-server03 aliyun-service: D:/ciseagent/space/318129881/source/thirdparty/aliyun-cli/cli/command.go:246 +0x57d fp=0xa2b9f64 sp=0xa2b9eb0 pc=0x83daf5d