I've read Partitioning data in Athena, however it is not clear how to make partitions for a table, when S3 has the following structure:
aws s3 ls s3://xxx-s3-zzz-datalake-prod/yyy/2022/09/
PRE 01/
PRE 02/
PRE 03/
PRE 04/
PRE 05/
PRE 06/
etc...
How could I create partition for such structure, is it possible? Or should I rename it to
aws s3 ls s3://xxx-s3-yyy-datalake-prod/zzz/2022/09/
PRE day=01/
PRE day=02/
PRE day=03/
etc...
and then add PARTITIONED BY (day int)
?
CodePudding user response:
The partitions can be added in both cases and you have to use different methods in these cases.If you have data in below format:
aws s3 ls s3://xxx-s3-zzz-datalake-prod/yyy/2022/09/
PRE 01/
PRE 02/
PRE 03/
PRE 04/
PRE 05/
PRE 06/
etc...
Then you can only add these partitions information to the table using below query below:
ALTER TABLE orders ADD
PARTITION (day = '01') LOCATION 's3://xxx-s3-zzz-datalake-prod/yyy/2022/09/01'
PARTITION (day = '02') LOCATION 's3://xxx-s3-zzz-datalake-prod/yyy/2022/09/02';
Refer to below link for more information.
https://docs.aws.amazon.com/athena/latest/ug/alter-table-add-partition.html
Also try to add more partitions to your table by adding year and month to PARTITIONED BY clause as only adding a day will not do any good.
In case of below structure it is straight forward and easy:
aws s3 ls s3://xxx-s3-yyy-datalake-prod/zzz/2022/09/
PRE day=01/
PRE day=02/
PRE day=03/
etc...
Here you can run MSCK REPAIR TABLE <table-name>
which will automatically populates table with partitions information as the structure is in Hive key-value supported format.The same information also can be added by Glue crawler.
Below link has more explanation for hive style and non hive style partitioning formats.
https://aws.amazon.com/premiumsupport/knowledge-center/athena-create-use-partitioned-tables/