Home > Mobile >  Error with Presto --> strings and parsing dates
Error with Presto --> strings and parsing dates

Time:06-28

dear community!

I'm a bit lost with my SQL query on PRESTO. Essentially, I have a view with the following columns:

branch | year_month (varchar) | sellid [... and others]

Germany | 2022-06 | 123q1239

What I need to do is fairly simple, I need to create a table like this:

branch | year_month | count(sellid) actual month | count(sellid) last month

I attempted the following among other things, but unfortunately I'm still getting errors.

SELECT branch, year_month, count(sellid)as active_locs,
CASE
    WHEN TO_DATE(year_month, '%Y-%m') = date_format(TO_DATE(year_month, '%Y-%m') - interval '1' month, '%Y-%m') THEN count(sellid) END as locs_LM
FROM table
GROUP BY branch, year_month
  

This is the exact error I'm getting:

An error occurred when executing the SQL command: SELECT branch, year_month, count(sellid)as active_locs, CASE WHEN TO_DATE(year_month, '%Y-%m') = date_format(TO_DATE(year_month, '%Y-%m') - interv...

Query failed (#20220628_093246_00650_e4kcc): line 3:39: '=' cannot be applied to date, varchar [DB Errorcode=1] 1 statement failed.

Could someone help me out??

Thanks in advance

CodePudding user response:

date_format returns varchar while TO_DATE returns date, so you have type mismatch. Also you need to use date_parse with provided format:

SELECT branch, year_month, count(sellid)as active_locs,
CASE
    WHEN year_month = date_format(date_parse(year_month, '%Y-%m') - interval '1' month, '%Y-%m') THEN count(sellid) END as locs_LM
FROM dataset
GROUP BY branch, year_month

This will make query executing, but will not make it right. To make it right you can use windows functions and subquery:

--sample data
WITH dataset(branch, year_month, sellid) AS (
    VALUES ('Germany', '2022-06', '123q1239'),
        ('Germany', '2022-05', '123q1239')
)

-- sample query
SELECT branch,
    date_format(year_month, '%Y-%m') year_month,
    active_locs,
    if( -- if previous record is for previous month use previous active_locs
        lag(year_month) over (partition  by branch order by year_month) 
            = year_month - interval '1' month, 
        lag(active_locs) over (partition  by branch order by year_month),
        0
    ) locs_LM
FROM (
        SELECT branch,
            date_parse(year_month, '%Y-%m') year_month,
            count(sellid) as active_locs
        FROM dataset
        GROUP BY 1,
            2
    ) 

Output:

branch year_month active_locs locs_LM
Germany 2022-05 1 0
Germany 2022-06 1 1
  • Related