I manage some data in AWS, and there are some Parquet files in an S3 bucket. Every day, new files are added to this bucket, and I would like to get the data in the latest file using Athena.
I want to know how to specify the path of the latest file in an Athena query. Is it possible to identify the latest file from the path of each Parquet file?
CodePudding user response:
Athena is built on Presto (the PrestoSQL fork is now called Trino). Support for querying a file's modification timestamp was added to the engine recently, but it will likely take a while, probably years, before it is available in Athena.
In the meantime, if your Parquet file names include a timestamp that sorts correctly as text (for example, a zero-padded date such as YYYY-MM-DD), you could select only the rows whose "$path" sorts last:
select * from mydb
where "$path" in
(
-- the path that sorts last, i.e. the newest file if names sort by time
select "$path"
from mydb
order by "$path" desc
limit 1
)
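The query above relies on the paths sorting as text in the same order as they sort in time. Here is a minimal Python sketch of that assumption. The bucket and file names are made up for illustration, and it assumes plain ASCII names, where Python's string comparison should match the ordering used for "$path".

```python
# Hypothetical S3 object paths; bucket and file names are invented for illustration.
paths = [
    "s3://my-bucket/data/export_2023-01-09.parquet",
    "s3://my-bucket/data/export_2023-01-10.parquet",
    "s3://my-bucket/data/export_2022-12-31.parquet",
]

# max() compares strings character by character, mirroring
# ORDER BY "$path" DESC LIMIT 1 for ASCII paths.
latest = max(paths)
print(latest)

# Counter-example: without zero padding, text order and time order disagree,
# because "9" sorts after "1" even though January 10 is later than January 9.
unpadded = [
    "s3://my-bucket/data/export_2023-1-9.parquet",
    "s3://my-bucket/data/export_2023-1-10.parquet",
]
wrong_latest = max(unpadded)
print(wrong_latest)
```

With zero-padded dates the newest file is picked. With unpadded dates the January 9 file wins over January 10, so the query would return stale data. If the file names cannot be changed, this approach is probably not safe to rely on.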