Why doesn't Iceberg rewriteDataFiles rewrite the files into one file?

I have an Iceberg table in S3 with 2 Parquet files that store 4 rows in total. I tried the following command:

import org.apache.iceberg.hadoop.HadoopTables
import org.apache.iceberg.spark.actions.SparkActions

val tables = new HadoopTables(conf)
val table = tables.load("s3://iceberg-tests-storage/data/db/test5")
SparkActions.get(spark).rewriteDataFiles(table).option("target-file-size-bytes", "52428800").execute()

But nothing changed. What am I doing wrong?

CodePudding user response：

A few notes:

By default, Iceberg won't compact files unless a file group (within a single partition) contains at least a minimum number of input files. The default minimum is 5.
- This can be configured by passing min-input-files as an option.
Iceberg won't compact files across partitions, because each data file belongs to exactly one tuple of partition values.
- As an example: for a table partitioned by col1 and col2, files with col1=A and col2=1 cannot be compacted with files with col1=A and col2=4.

In your case, if you set min-input-files to 2, the two files should be compacted together, provided they belong to the same partition or the table is not partitioned.
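
As a rough sketch of that suggestion, the call from the question could be extended as shown here. It assumes an existing SparkSession with the Iceberg runtime and S3 access already configured. Building the Hadoop configuration from the session is an assumption, since the question does not show where conf comes from. The final println is only there to check whether anything was rewritten.

```scala
import org.apache.iceberg.hadoop.HadoopTables
import org.apache.iceberg.spark.actions.SparkActions
import org.apache.spark.sql.SparkSession

// Assumption: a session is already configured for Iceberg and S3.
val spark = SparkSession.builder().getOrCreate()

// Assumption: reuse the session's Hadoop configuration as `conf`.
val tables = new HadoopTables(spark.sparkContext.hadoopConfiguration)
val table = tables.load("s3://iceberg-tests-storage/data/db/test5")

val result = SparkActions.get(spark)
  .rewriteDataFiles(table)
  // Lower the per-group threshold from the default of 5 to 2,
  // so a group of two small files qualifies for compaction.
  .option("min-input-files", "2")
  .option("target-file-size-bytes", "52428800")
  .execute()

// Report how many data files were replaced and how many were written.
println(s"rewritten: ${result.rewrittenDataFilesCount()}, added: ${result.addedDataFilesCount()}")
```

If the rewritten count is still 0, the two files are most likely in different partitions, which matches the second note above.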