Home > Software design >  AWS Athena Partitioning
AWS Athena Partitioning

Time:09-27

I've read Partitioning data in Athena, however it is not clear how to make partitions for a table, when S3 has the following structure:

aws s3 ls s3://xxx-s3-zzz-datalake-prod/yyy/2022/09/
                       PRE 01/
                       PRE 02/
                       PRE 03/
                       PRE 04/
                       PRE 05/
                       PRE 06/
                      etc...

How could I create partition for such structure, is it possible? Or should I rename it to

aws s3 ls s3://xxx-s3-yyy-datalake-prod/zzz/2022/09/
                       PRE day=01/
                       PRE day=02/
                       PRE day=03/
                       etc...

and then add PARTITIONED BY (day int) ?

CodePudding user response:

The partitions can be added in both cases and you have to use different methods in these cases.If you have data in below format:

aws s3 ls s3://xxx-s3-zzz-datalake-prod/yyy/2022/09/
                       PRE 01/
                       PRE 02/
                       PRE 03/
                       PRE 04/
                       PRE 05/
                       PRE 06/
                      etc...

Then you can only add these partitions information to the table using below query below:

ALTER TABLE orders ADD
  PARTITION (day = '01') LOCATION 's3://xxx-s3-zzz-datalake-prod/yyy/2022/09/01'
  PARTITION (day = '02') LOCATION 's3://xxx-s3-zzz-datalake-prod/yyy/2022/09/02';

Refer to below link for more information.

https://docs.aws.amazon.com/athena/latest/ug/alter-table-add-partition.html

Also try to add more partitions to your table by adding year and month to PARTITIONED BY clause as only adding a day will not do any good.

In case of below structure it is straight forward and easy:

aws s3 ls s3://xxx-s3-yyy-datalake-prod/zzz/2022/09/
                       PRE day=01/
                       PRE day=02/
                       PRE day=03/
                       etc...

Here you can run MSCK REPAIR TABLE <table-name> which will automatically populates table with partitions information as the structure is in Hive key-value supported format.The same information also can be added by Glue crawler.

Below link has more explanation for hive style and non hive style partitioning formats.

https://aws.amazon.com/premiumsupport/knowledge-center/athena-create-use-partitioned-tables/

  • Related