Home > Software design >  Using Max() in where clause in Presto SQL
Using Max() in where clause in Presto SQL

Time:05-17

I have the below table.

ID Desc progress updated_time
1 abcd planned 2022-04-20 10:00AM
1 abcd planned 2022-04-25 12:00AM
1 abcd in progress 2022-04-26 4:00PM
1 abcd in progress 2022-05-04 11:00AM
1 abcd in progress 2022-05-06 12:00PM

I just want to return a row that has the latest updated_time regardless of what progress it is in, which is,

ID Desc progress updated_time
1 abcd in progress 2022-05-06 12:00PM

I know if I group by 'progress' (as shown below), I will get one for planned too which I do not need. I just need a single row for each ID with its latest updated time.

I wrote the following query,

select ID,desc,progress,updated_time 
from t1 
where updated_time IN (select ID, desc, progress, max(updated_time) 
from t1 group by 1,2,3)

I get the following error too, 'Multiple columns returned by subquery are not yet supported'

CodePudding user response:

Selecting multiple values in subquery won't work, you need to select single value with scalar subquery:

-- sample data
WITH dataset (ID, Desc, progress, updated_time) AS (
    VALUES 
(1, 'abcd', 'planned',  timestamp '2022-04-20 10:00'),
(1, 'abcd', 'planned',  timestamp '2022-04-25 12:00'),
(1, 'abcd', 'in progress',  timestamp '2022-04-26 16:00'),
(1, 'abcd', 'in progress',  timestamp '2022-05-04 11:00'),
(1, 'abcd', 'in progress',  timestamp '2022-05-06 12:00'),
(1, 'abcd', 'in progress',  timestamp '2022-05-07 12:00'),
(2, 'abcd', 'in progress',  timestamp '2022-05-04 11:00'),
(2, 'abcd', 'in progress',  timestamp '2022-05-06 12:00')
) 

--query
select  id, Desc, progress, updated_time
from dataset o
where updated_time = (select max(updated_time) from dataset i where i.id = o.id)

Or similar approach with max window function and subselect:

--query
select  id, Desc, progress, updated_time
from (
    select *,  max(updated_time) over (partition by id) max_time
    from dataset
)
where max_time = updated_time

Or just using row_number:

select  id, Desc, progress, updated_time
from 
(
    select *,  
        row_number() over(partition by id order by updated_time desc) rank
    from dataset
)
where rank  = 1

Output:

id Desc progress updated_time
1 abcd in progress 2022-05-07 12:00:00.000
2 abcd in progress 2022-05-06 12:00:00.000

CodePudding user response:

You are try for matching a single value with several columns and yhis raise an error ..

Looking to you code for you goal, instead o f an IN clasue based on a subquery you should use inner join

select ID,desc,progress,updated_time 
from t1 
INNER JOIN 
( select ID, desc, progress, max(updated_time) max_time 
from t1 group by 1,2,3) t on t.max_time = t1.updated_time

CodePudding user response:

I would probably use row_number or some other ranking function for this.

with t as (select a.*,
 row_number() over (partition by id order by updated_time desc as rn) 
select * from t where rn = 1
  • Related