aws_s3.query_export_to_s3 PostgreSQL RDS extension: exporting all multi-part CSV files to S3 with a header row

I'm using the aws_s3.query_export_to_s3 function to export data from an Amazon Aurora PostgreSQL database to S3 in CSV format with a header row.

This works.

However, when the export is large and outputs to multiple part files, the first part file has the CSV header row, and subsequent part files do not.

SELECT * FROM aws_s3.query_export_to_s3(
  'SELECT ...',
  aws_commons.create_s3_uri(...),
  options:='format csv, HEADER true'
);

How can I make this export add the header row to all CSV file parts?

I'm using Apache Spark to load this CSV data and it expects a header row in each individual part file.
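To see which part files Spark will misread, a small check can compare each part's first line with the header row of the first part. This is a sketch that assumes the parts have already been downloaded locally and are passed in export order; `parts_missing_header` is a hypothetical helper name.

```python
from pathlib import Path

def parts_missing_header(part_paths):
    """Return the part files whose first line differs from the first part's header row."""
    paths = [Path(p) for p in part_paths]
    with paths[0].open("rb") as f:
        header = f.readline()  # header row written by HEADER true, newline included
    missing = []
    for p in paths[1:]:
        with p.open("rb") as f:
            if f.readline() != header:
                missing.append(p)
    return missing
```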

Answer:

How can I make this export add the header row to all part files?

It's not possible, unfortunately.

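The aws_s3.query_export_to_s3 function uses the PostgreSQL COPY command under the hood and then splits the output into multiple part files based on size. COPY writes the header row only once, at the start of its output, so only the first part file receives it.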
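The effect can be reproduced with a simplified model. This is not the extension's code: it assumes the COPY output is a single stream (header first, because of `HEADER true`) that gets cut into parts at row boundaries once a part reaches a size limit. `split_copy_output`, `_csv_line` and `max_bytes` are hypothetical names for illustration.

```python
import csv
import io

def _csv_line(values):
    """Render one row exactly as the csv module would write it."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(values)
    return buf.getvalue()

def split_copy_output(header, rows, max_bytes):
    """Model COPY (FORMAT csv, HEADER true) output cut into size-limited parts."""
    # COPY emits the header once, at the very start of the output stream.
    lines = [_csv_line(header)] + [_csv_line(r) for r in rows]
    parts, current, size = [], [], 0
    for line in lines:
        n = len(line.encode("utf-8"))
        # Close the current part when this row would push it past the limit.
        if current and size + n > max_bytes:
            parts.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += n
    if current:
        parts.append("".join(current))
    return parts
```

For example, `split_copy_output(["id", "name"], [[1, "a"], [2, "b"], [3, "c"]], max_bytes=16)` returns `["id,name\n1,a\n2,b\n", "3,c\n"]`: the second part starts directly with data, which is exactly what Spark trips over.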

Unless the extension were to recognize the HEADER true option, cache the header row, and offer an option to write it to every generated CSV file, you're out of luck.
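Since the extension doesn't do this, the same idea can be applied after the export: download the parts, take the header row from the first part, and prepend it to every other part before handing them to Spark. A minimal sketch, assuming the parts are local files passed in export order and that each part begins on a row boundary; `add_header_to_parts` is a hypothetical helper.

```python
import shutil
from pathlib import Path

def add_header_to_parts(part_paths, out_dir):
    """Copy each part into out_dir, prepending the first part's header row to the others."""
    paths = [Path(p) for p in part_paths]
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # The first part already starts with the header written by COPY ... HEADER true.
    with paths[0].open("rb") as first:
        header = first.readline()  # includes the trailing newline
    written = []
    for i, src in enumerate(paths):
        dest = out_dir / src.name
        if dest.resolve() == src.resolve():
            # Opening the same file for reading and writing would truncate it.
            raise ValueError("out_dir must differ from the source directory")
        with src.open("rb") as fin, dest.open("wb") as fout:
            if i > 0:
                fout.write(header)  # parts 2..N lack the header, so add it
            shutil.copyfileobj(fin, fout)  # stream the body; parts can be very large
        written.append(dest)
    return written
```

With the header present in every part, Spark's CSV reader can be pointed at the output directory with its `header` option set to `true`.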

The expectation is that either the files are combined at the destination after download, the file processor has some mechanism for reading files in parts, or the file processor only needs the header once.
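For the first option, combining the parts into one file at the destination, a sketch that streams each part into a single output file in part order. It assumes the additional files carry a numeric `_partN` suffix after the base name (check the names your export actually produces), and that every part ends on a complete row. Sorting numerically matters because a plain string sort puts `_part10` before `_part2`. `part_number` and `combine_parts` are hypothetical helpers.

```python
import re
import shutil
from pathlib import Path

def part_number(path):
    """Order key under the assumed naming: no suffix is part 1, '..._partN' is part N."""
    matches = re.findall(r"_part(\d+)", Path(path).name)
    return int(matches[-1]) if matches else 1

def combine_parts(part_paths, dest):
    """Concatenate CSV parts into one file; only the first part carries the header."""
    with open(dest, "wb") as out:
        for src in sorted(part_paths, key=part_number):
            with open(src, "rb") as f:
                shutil.copyfileobj(f, out)  # streams, so large parts are not loaded into memory
    return Path(dest)
```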