Home > front end >  how to analyze data in aws athena from postgresql pg dump in s3?
how to analyze data in aws athena from postgresql pg dump in s3?

Time:08-02

I have taken pg dump of large tables (>50 GB) in AWS S3 in csv.gz format from postgresql RDS. I want to store this data into AWS athena in the form of actual tables as in the RDS, so that when requirement is there i can directly use the data without restoring those tables in the RDS.

Any suggestions will be much appreciated.

CodePudding user response:

If each file contains data from one table in CSV format, you can create a table to query that data.

I find the easiest way is to use AWS Glue:

  • Put each file relating to a table in a separate subdirectory in S3
  • Create a crawler in AWS Glue, pointing it to the directory
  • It will automatically create a table in Amazon Athena that lets you query the data

If you wish to improve performance of the queries, I would recommend that you convert the data into Snappy-compressed Parquet format.

From Examples of CTAS queries - Amazon Athena:

CREATE TABLE new_table
WITH (
      format = 'Parquet',
      write_compression = 'SNAPPY')
AS SELECT *
FROM old_table;
  • Related