obloader导入较慢

问题:一张42个字段,1300多万的csv格式数据,通过obloader导入需要3小时左右;
部署方式:社区版单机单节点部署;
机器、部署配置如下:
系统信息:
Linux version 4.19.90-23.8.v2101.ky10.x86_64 (KYLINSOFT@localhost.localdomain) (gcc version 7.3.0 (GCC)) #1 SMP Mon May 17 17:08:34 CST 2021
内存信息:
total used free shared buff/cache available
Mem: 31622880 24311780 512540 1574824 6798560 5208048
Swap: 2166780 2166780 0
磁盘空间:
Filesystem Size Used Avail Use% Mounted on
devtmpfs 16G 0 16G 0% /dev
tmpfs 16G 12K 16G 1% /dev/shm
tmpfs 16G 1.6G 14G 11% /run
tmpfs 16G 0 16G 0% /sys/fs/cgroup
/dev/mapper/klas-root 146G 112G 35G 77% /
tmpfs 16G 288K 16G 1% /tmp
/dev/vda1 1014M 211M 804M 21% /boot
tmpfs 3.1G 0 3.1G 0% /run/user/0
CPU信息:8核
processor : 0
vendor_id : HygonGenuine
cpu family : 23
model : 1
model name : AMD EPYC Processor (with IBPB)
stepping : 2
microcode : 0x1000065
cpu MHz : 2199.998
cache size : 512 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm rep_good nopl xtopology cpuid extd_apicid tsc_known_freq pni ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext ssbd ibpb vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 virt_ssbd arat
bugs : fxsave_leak sysret_ss_attrs null_seg spectre_v1 spectre_v2 spec_store_bypass
bogomips : 4399.99
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management:

observer配置 single.yaml

only need to configure when remote login is required

user:
username: root
password: xxxxx

key_file: your ssh-key file path if need

port: your ssh port, default 22

timeout: ssh connection timeout (second), default 30

oceanbase-ce:
servers:
# Please don’t use hostname, only IP can be supported
- 127.0.0.1
global:
# The working directory for OceanBase Database. OceanBase Database is started under this directory. This is a required field.
home_path: /root/observer
# The directory for data storage. The default value is $home_path/store.
data_dir: /data
# The directory for clog, ilog, and slog. The default value is the same as the data_dir value.
redo_dir: /redo
# Please set devname as the network adaptor’s name whose ip is in the setting of severs.
# if set severs as “127.0.0.1”, please set devname as “lo”
# if current ip is 192.168.1.10, and the ip’s network adaptor’s name is “eth0”, please use “eth0”
devname: lo
# External port for OceanBase Database. The default value is 2881. DO NOT change this value after the cluster is started.
mysql_port: 2881
# Internal port for OceanBase Database. The default value is 2882. DO NOT change this value after the cluster is started.
rpc_port: 2882
# Defines the zone for an observer. The default value is zone1.
zone: zone1
# The maximum running memory for an observer. When ignored, autodeploy calculates this value based on the current server available resource.
memory_limit: 8G
# The percentage of the maximum available memory to the total memory. This value takes effect only when memory_limit is 0. The default value is 80.
memory_limit_percentage: 80
# The reserved system memory. system_memory is reserved for general tenants. The default value is 30G. Autodeploy calculates this value based on the current server available resource.
system_memory: 30G
# The size of a data file. When ignored, autodeploy calculates this value based on the current server available resource.
datafile_size: 30G
# The percentage of the data_dir space to the total disk space. This value takes effect only when datafile_size is 0. The default value is 90.
datafile_disk_percentage: 90
# System log level. The default value is INFO.
syslog_level: INFO
# Print system logs whose levels are higher than WARNING to a separate log file. The default value is true. The default value for autodeploy mode is false.
enable_syslog_wf: false
# Enable auto system log recycling or not. The default value is false. The default value for autodeploy mode is on.
enable_syslog_recycle: true
# The maximum number of reserved log files before enabling auto recycling. When set to 0, no logs are deleted. The default value for autodeploy mode is 4.
max_syslog_file_count: 4
# Cluster name for OceanBase Database. The default value is obcluster. When you deploy OceanBase Database and obproxy, this value must be the same as the cluster_name for obproxy.
appname: obcluster
# Password for root. The default value is empty.
root_password: xxxx
# Password for proxyro. proxyro_password must be the same as observer_sys_password. The default value is empty.
proxyro_password: xxxx

cluster配置 config.yaml

Only need to configure when remote login is required

user:
username: root
password: xxxxx

key_file: your ssh-key file path if need

port: 22

timeout: ssh connection timeout (second), default 30

oceanbase-ce:
servers:
# Please don’t use hostname, only IP can be supported

  • 127.0.0.1
    global:

    The working directory for OceanBase Database. OceanBase Database is started under this directory. This is a required field.

    home_path: /root/observer

    The directory for data storage. The default value is $home_path/store.

    data_dir: /data

    The directory for clog, ilog, and slog. The default value is the same as the data_dir value.

    redo_dir: /redo

    Please set devname as the network adaptor’s name whose ip is in the setting of severs.

    if set severs as “127.0.0.1”, please set devname as “lo”

    if current ip is 192.168.1.10, and the ip’s network adaptor’s name is “eth0”, please use “eth0”

    devname: lo
    mysql_port: 2881 # External port for OceanBase Database. The default value is 2881. DO NOT change this value after the cluster is started.
    rpc_port: 2882 # Internal port for OceanBase Database. The default value is 2882. DO NOT change this value after the cluster is started.
    zone: zone1
    cluster_id: 1

    please set memory limit to a suitable value which is matching resource.

    memory_limit: 8G # The maximum running memory for an observer
    system_memory: 4G # The reserved system memory. system_memory is reserved for general tenants. The default value is 30G.
    stack_size: 512K
    cpu_count: 16
    cache_wash_threshold: 1G
    __min_full_resource_pool_memory: 268435456
    workers_per_cpu_quota: 10
    schema_history_expire_time: 1d

    The value of net_thread_count had better be same as cpu’s core number.

    net_thread_count: 4
    major_freeze_duty_time: Disable
    minor_freeze_times: 10
    enable_separate_sys_clog: 0
    enable_merge_by_turn: false
    datafile_disk_percentage: 20 # The percentage of the data_dir space to the total disk space. This value takes effect only when datafile_size is 0. The default value is 90.
    syslog_level: INFO # System log level. The default value is INFO.
    enable_syslog_wf: false # Print system logs whose levels are higher than WARNING to a separate log file. The default value is true.
    enable_syslog_recycle: true # Enable auto system log recycling or not. The default value is false.
    max_syslog_file_count: 4 # The maximum number of reserved log files before enabling auto recycling. The default value is 0.
    root_password: xxxx

    root_password: # root user password, can be empty

auto_create_tenant: true

xms,xmx分别是多大?
thread线程数设置的呢?
看看手册,进行一下调优。

https://open.oceanbase.com/docs/obloaderdumper-cn/V3.0.0/0000000000031748

xms,xmx默认4G,thread并发数未指定;查了可用内存剩5g左右,我改成xms5G,xmx5G,加–thread=8,用24w数据测了下 跟以前差不多要一分半钟左右。

检查一下租户的规格,租户的 CPU 配额直接影响导入的性能,租户的内存足够大,避免触发导入限流,也可以间接地提升性能。注意:是租户的规格配置,不是机器的规格配置。有时候机器的规格很大,但是划分给租户却很小。“一张42个字段,1300多万的csv格式数据,通过obloader导入需要3小时左右;” —— 建议提升一下配置试试。