I have an Iceberg table in S3 with 2 Parquet files storing 4 rows. I tried the following command:
val tables = new HadoopTables(conf);
val table = tables.load("s3://iceberg-tests-storage/data/db/test5");
SparkActions.get(spark).rewriteDataFiles(table).option("target-file-size-bytes", "52428800").execute();
but nothing changed. What am I doing wrong?
CodePudding user response:
A few notes:
- By default, Iceberg won't compact files unless a minimum number of small files is available to compact per file group and per partition. The default is 5.
- This can be configured via the min-input-files option.
- Iceberg won't compact files across partitions, as each file must map 1:1 to a single tuple of partition values.
- As an example: for a table partitioned by col1 and col2, files with col1=A and col2=1 cannot be compacted with files with col1=A and col2=4.
In your case, if you set min-input-files to 2, the files should be compacted together, provided they are part of the same partition or the table is not partitioned.
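A minimal sketch of the adjusted call, assuming your existing spark session and table objects (option keys min-input-files and target-file-size-bytes are the ones Iceberg's bin-pack rewrite strategy reads):

```scala
import org.apache.iceberg.spark.actions.SparkActions

// Lower the compaction threshold so that even 2 small files qualify
// as a file group to rewrite. Both files must belong to the same
// partition (or the table must be unpartitioned) for this to take effect.
val result = SparkActions.get(spark)
  .rewriteDataFiles(table)
  .option("min-input-files", "2")
  .option("target-file-size-bytes", "52428800") // 50 MiB target
  .execute()

// Inspect the outcome to confirm the rewrite actually ran
println(s"Rewritten data files: ${result.rewrittenDataFilesCount()}")
println(s"Added data files: ${result.addedDataFilesCount()}")
```

After the action completes, a new snapshot is committed; the old small files remain referenced by prior snapshots until you expire them (e.g. via expireSnapshots), so don't expect the objects in S3 to disappear immediately.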