Home > Enterprise >  Cannot check if varchar is BETWEEN date and date
Cannot check if varchar is BETWEEN date and date

Time:08-10

I created a partition projection in Athena named 'dt', which is a STRING and contains date information in the format 2020/12/11/20.

I'm running the following query in Athena

SELECT
    DATE_FORMAT(dt, '%Y-%m') as dt,
    count(*) as "total_visualization",
    count(*)/cast(date_format(DATE '{END_DATE}', '%d') as integer) as "average_dia"
FROM 
    user.dashborad
WHERE 
    event = 'complete' 
    AND dt BETWEEN DATE '{START_DATE}' and DATE '{END_DATE}'
GROUP BY 1;

The resulting raw query received by Athena is:

    DATE_FORMAT(dt, '%Y-%m') as dt,
    count(*) as "total_visualization",
    count(*)/cast(date_format(DATE '2022-08-08', '%d') as integer) as "average_day"
 FROM user.dashborad
 WHERE event = 'complete' AND dt BETWEEN DATE '2022-08-01' and DATE '2022-08-08'
 GROUP BY 1;

However, I get the following error:

Error querying the database: SYNTAX_ERROR: line 2:62: Cannot check if varchar is BETWEEN date and date.

I've tried to find a workaround in an attempt to convert it into a date format using date_parse but it didn't work. And with str_to_date I get this error:

SYNTAX_ERROR: line 2:2: Function str_to_date not registered

Is there any other way I can modify the query to convert 'dt' from a varchar into a format Athena understands?

CodePudding user response:

It is always a bad idea to store a date in a string instead of using the appropriate data type. You even call the column dt which suggests a datetime. This makes it harder to spot inappropriate handling.

Here

AND dt BETWEEN DATE '{START_DATE}' and DATE '{END_DATE}'

you compare a string with dates. Thus you rely on the DBMS guessing the string's date format correctly. Don't do this. Convert the string explicitely to a date, because you know the format. Or, as 'YYYY-MM-DD' is comparable, work with the strings right away:

AND dt BETWEEN '{START_DATE}' and '{END_DATE}'

Here

DATE_FORMAT(dt, '%Y-%m')

you invoke a date function on a string. This means the DBMS must again guess your format, convert your string into a date accordingly and then invoke the function to convert the date into a string. Instead, just use the appropriate string function on the string:

SUBSTR(dt, 1, 7)

The complete query:

SELECT
    SUBSTR(dt, 1, 7) AS year_month,
    COUNT(*) AS total_visualization,
    COUNT(*) / CAST(SUBSTR('{END_DATE}', 9, 2)) AS INTEGER) AS average_dia
FROM 
    user.dashborad
WHERE 
    event = 'complete' 
    AND dt BETWEEN '{START_DATE}' and '{END_DATE}'
GROUP BY SUBSTR(dt, 1, 7)
ORDER BY SUBSTR(dt, 1, 7);
  • Related