I have some data rows in AWS Athena table and I am trying to get the data from the last 1 hour. I am using awswrangler, I will post my snippet below. Basically, instead of querying all data and then filtering out only the last 1 hour with Python, I would like to do that in the Athena SQL query so that I get a faster response (and thus execution time of the program). My code is:
import awswrangler as wr
import boto3
session=boto3.Session()
df = wr.athena.read_sql_query(f"""SELECT *
FROM data_table""",
database="database",
keep_files = False,
boto3_session = session).sort_values('timestamp')
My progress:
I can get the current timestamp with "SELECT CURRENT_TIMESTAMP"
, but this will return the timestamp in a date format. In order to get the last 1 hour, my idea is to convert 1 hour to milliseconds as well, and subtract it from the milliseconds of current timestamp and apply it as a filter.
NOTE! timestamp
in the table is in milliseconds.
CodePudding user response:
An option would be to convert the timestamp and then filter with subtracting an hour from the current time.
Assuming the value in timestamp is milliseconds since epoch you can use from_unixtime:
Based on your sample value provided to see how that works:
select from_unixtime(1650578683860/1000e0)
Which then gives the result:
2022-04-21 22:04:43.860
Then you can use DATE_ADD and subtract an hour from CURRENT_STAMP, so the where clause would be something like:
WHERE from_unixtime("timestamp"/1000e0) >= DATE_ADD('hour', -1, CURRENT_TIMESTAMP)