I created a partition projection in Athena named 'dt', which is a STRING and contains date information in the format 2020/12/11/20
.
I'm running the following query in Athena
SELECT
DATE_FORMAT(dt, '%Y-%m') as dt,
count(*) as "total_visualization",
count(*)/cast(date_format(DATE '{END_DATE}', '%d') as integer) as "average_dia"
FROM
user.dashborad
WHERE
event = 'complete'
AND dt BETWEEN DATE '{START_DATE}' and DATE '{END_DATE}'
GROUP BY 1;
The resulting raw query received by Athena is:
DATE_FORMAT(dt, '%Y-%m') as dt,
count(*) as "total_visualization",
count(*)/cast(date_format(DATE '2022-08-08', '%d') as integer) as "average_day"
FROM user.dashborad
WHERE event = 'complete' AND dt BETWEEN DATE '2022-08-01' and DATE '2022-08-08'
GROUP BY 1;
However, I get the following error:
Error querying the database: SYNTAX_ERROR: line 2:62: Cannot check if varchar is BETWEEN date and date.
I've tried to find a workaround in an attempt to convert it into a date format using date_parse but it didn't work. And with str_to_date
I get this error:
SYNTAX_ERROR: line 2:2: Function str_to_date not registered
Is there any other way I can modify the query to convert 'dt' from a varchar into a format Athena understands?
CodePudding user response:
It is always a bad idea to store a date in a string instead of using the appropriate data type. You even call the column dt which suggests a datetime. This makes it harder to spot inappropriate handling.
Here
AND dt BETWEEN DATE '{START_DATE}' and DATE '{END_DATE}'
you compare a string with dates. Thus you rely on the DBMS guessing the string's date format correctly. Don't do this. Convert the string explicitely to a date, because you know the format. Or, as 'YYYY-MM-DD' is comparable, work with the strings right away:
AND dt BETWEEN '{START_DATE}' and '{END_DATE}'
Here
DATE_FORMAT(dt, '%Y-%m')
you invoke a date function on a string. This means the DBMS must again guess your format, convert your string into a date accordingly and then invoke the function to convert the date into a string. Instead, just use the appropriate string function on the string:
SUBSTR(dt, 1, 7)
The complete query:
SELECT
SUBSTR(dt, 1, 7) AS year_month,
COUNT(*) AS total_visualization,
COUNT(*) / CAST(SUBSTR('{END_DATE}', 9, 2)) AS INTEGER) AS average_dia
FROM
user.dashborad
WHERE
event = 'complete'
AND dt BETWEEN '{START_DATE}' and '{END_DATE}'
GROUP BY SUBSTR(dt, 1, 7)
ORDER BY SUBSTR(dt, 1, 7);