如何准确导出每个分区中的数据

【 使用环境 】生产环境 or 测试环境
【 OB or 其他组件 】ob及obdumper
【 使用版本 】ob4.1社区版 obdumper4.2.1
【问题描述】清晰明确描述问题
我在一个3节点集群中存储了一个数据集,大概200w+条数据,在一列上使用key分区:
ALTER TABLE power PARTITION BY key(Global_active_power) PARTITIONS 3;
我打算导出每个分区中的数据,尝试了obdumper的命令导出3个csv文件,但似乎存在问题。我该如何准确导出每个分区中的数据?
【复现路径】问题出现前后相关操作
【问题现象及影响】
导出数据后,确实是3个csv文件,但这每一个文件都有200w+的数据,但我期望的结果应该是这3个文件加一起才是200w+条数据。
下面是obdumper调用的日志:
(venv) user1@pai-master:bin$ ./obdumper -h 202.199.6.84 -P 2881 -u user1 -p linux123 -t sys -D power --csv --table '’ --partition p0,p1,p2 -f ~/oblab/obtest/power-3-partitionn/
2023-06-29 23:48:10 [INFO] Parsed args:
[–csv] true
[–file-path] /home/hdd/user1/oblab/obtest/power-3-partitionn/
[–host] 202.199.6.84
[–port] 2881
[–user] user1
[–tenant] sys
[–password] ******
[–database] power
[–sys-user] root
[–table] [
]
[–partition] [p0, p1, p2]

2023-06-29 23:48:10 [INFO] Load jdbc driver class: “com.oceanbase.jdbc.Driver” finished
2023-06-29 23:48:11 [INFO] Manifest: “/home/hdd/user1/oblab/obtest/power-3-partitionn/data/MANIFEST.bin” has been saved
2023-06-29 23:48:11 [INFO] Query partition names for table: “power” success. Partitions: 3
2023-06-29 23:48:13 [INFO] Try to query unique keys for table: “power”… (But no-primary constraints).
2023-06-29 23:48:16 [INFO] Query unique keys for table: “power” success. (No-unique constraints). Elapsed: 3.338 s
2023-06-29 23:48:16 [INFO] Query table entry for table: “power” success. Remain: 0. Elapsed: 9.839 ms
2023-06-29 23:48:16 [INFO] Query all table entries success. Total: 1. Elapsed: 4.724 s
2023-06-29 23:48:16 [INFO] Split rows for partitioned table(without primary key): “power” partition: “p1” success. Ranges: 1
2023-06-29 23:48:16 [INFO] Split rows for partitioned table(without primary key): “power” partition: “p2” success. Ranges: 1
2023-06-29 23:48:16 [INFO] Split rows for partitioned table(without primary key): “power” partition: “p0” success. Ranges: 1
2023-06-29 23:48:16 [INFO] Generate 3 dump tasks finished. Total Elapsed: 16.32 ms
2023-06-29 23:48:16 [INFO] Start 64 record dump threads for 3 dump tasks finished
2023-06-29 23:48:21 [INFO]

Dump Performance Monitor:

Dimension \ Metric | Tps | Throughput | Buffer

 1.sec.avg      |    202265.49 Records/sec     |         11.29 MB/sec         |      1 Slots       
 1.min.avg      |     202222.4 Records/min     |         11.29 MB/min         |      1 Slots       
   Total        |       1014009 Records        |           56.59 MB           |      1 Slots       

2023-06-29 23:48:26 [INFO]

Dump Performance Monitor:

Dimension \ Metric | Tps | Throughput | Buffer

 1.sec.avg      |    209467.27 Records/sec     |         11.69 MB/sec         |      1 Slots       
 1.min.avg      |    203380.56 Records/min     |         11.35 MB/min         |      1 Slots       
   Total        |       2095532 Records        |          116.99 MB           |      1 Slots       

2023-06-29 23:48:31 [INFO]

Dump Performance Monitor:

Dimension \ Metric | Tps | Throughput | Buffer

 1.sec.avg      |    213367.74 Records/sec     |         11.91 MB/sec         |      1 Slots       
 1.min.avg      |    204802.25 Records/min     |         11.43 MB/min         |      1 Slots       
   Total        |       3201558 Records        |          178.75 MB           |      1 Slots       

2023-06-29 23:48:36 [INFO]

Dump Performance Monitor:

Dimension \ Metric | Tps | Throughput | Buffer

 1.sec.avg      |    215296.91 Records/sec     |         12.02 MB/sec         |      1 Slots       
 1.min.avg      |    206105.09 Records/min     |         11.5 MB/min          |      1 Slots       
   Total        |       4306970 Records        |          240.46 MB           |      1 Slots       

2023-06-29 23:48:41 [INFO]

Dump Performance Monitor:

Dimension \ Metric | Tps | Throughput | Buffer

 1.sec.avg      |    216377.66 Records/sec     |         12.08 MB/sec         |      1 Slots       
 1.min.avg      |    207272.12 Records/min     |         11.57 MB/min         |      1 Slots       
   Total        |       5410471 Records        |          302.08 MB           |      1 Slots       

2023-06-29 23:48:45 [INFO] No Server Status
2023-06-29 23:48:45 [INFO] Dump 2499874 rows power.power to “/home/hdd/user1/oblab/obtest/power-3-partitionn/data/power/TABLE/power.3.*.csv” finished
2023-06-29 23:48:46 [INFO]

Dump Performance Monitor:

Dimension \ Metric | Tps | Throughput | Buffer

 1.sec.avg      |    217145.15 Records/sec     |         12.12 MB/sec         |      1 Slots       
 1.min.avg      |    208368.76 Records/min     |         11.63 MB/min         |      1 Slots       
   Total        |       6515400 Records        |          363.77 MB           |      1 Slots       

2023-06-29 23:48:47 [INFO] Dump 2206910 rows power.power to “/home/hdd/user1/oblab/obtest/power-3-partitionn/data/power/TABLE/power.2..csv" finished
2023-06-29 23:48:49 [INFO] Dump 2190796 rows power.power to "/home/hdd/user1/oblab/obtest/power-3-partitionn/data/power/TABLE/power.1.
.csv” finished
2023-06-29 23:48:50 [INFO] Close connection count: 7 of the DataSource. Key: BIZ
2023-06-29 23:48:50 [INFO] Shutdown task context finished
2023-06-29 23:48:50 [INFO] ---------- Finished Tasks: 3 Running Tasks: 0 Progress: 100.00% ----------
2023-06-29 23:48:50 [INFO]

All Dump Tasks Finished:


    No.#        |        Type        |             Name             |            Count             |       Status       

     1          |       TABLE        |            power             |           6897580            |      SUCCESS       

Total Count: 6897580 End Time: 2023-06-29 23:48:50

2023-06-29 23:48:50 [INFO] Dump record finished. Total Elapsed: 38.28 s
2023-06-29 23:48:50 [INFO] System exit 0
【附件】

create view as select * from table partition(分区名),建个视图 然后导出视图里的数据不知道行不行,可以试一下

我测试版本observer 3.2.3.3
通过导出没有问题, ./obdumper -h 202.199.6.84 -P 2881 -u user1 -p linux123 -t sys -D power --csv --table '’ --partition p0,p1,p2 -f ~/oblab/obtest/power-3-partitionn/ 语句CSV 改成SQL试一试

加一下这个参数就可以

–partition ’ partition_name [, partition_name…] ’

用于标识导出指定分区的数据。该选项的参数值是分区名,多个分区名用逗号分隔。该选项仅与任意一种数据格式选项搭配使用,不可搭配 --query-sql 选项使用。

注意
指定分区导出时,对于二级分区表需要指定二级分区名,无法按照一级分区名导出。如果是模版二级分区,对应的分区名为:一级分区名+s+二级分区名。

https://www.oceanbase.com/docs/community-obloaderdumper-cn-10000000002043206