Using Max() in where clause in Presto SQL-CodePudding

I have the below table.

ID	Desc	progress	updated_time
1	abcd	planned	2022-04-20 10:00AM
1	abcd	planned	2022-04-25 12:00AM
1	abcd	in progress	2022-04-26 4:00PM
1	abcd	in progress	2022-05-04 11:00AM
1	abcd	in progress	2022-05-06 12:00PM

I just want to return a row that has the latest updated_time regardless of what progress it is in, which is,

ID	Desc	progress	updated_time
1	abcd	in progress	2022-05-06 12:00PM

I know if I group by 'progress' (as shown below), I will get one for planned too which I do not need. I just need a single row for each ID with its latest updated time.

I wrote the following query,

select ID,desc,progress,updated_time 
from t1 
where updated_time IN (select ID, desc, progress, max(updated_time) 
from t1 group by 1,2,3)

I get the following error too, 'Multiple columns returned by subquery are not yet supported'

CodePudding user response：

Selecting multiple values in subquery won't work, you need to select single value with scalar subquery:

-- sample data
WITH dataset (ID, Desc, progress, updated_time) AS (
    VALUES 
(1, 'abcd', 'planned',  timestamp '2022-04-20 10:00'),
(1, 'abcd', 'planned',  timestamp '2022-04-25 12:00'),
(1, 'abcd', 'in progress',  timestamp '2022-04-26 16:00'),
(1, 'abcd', 'in progress',  timestamp '2022-05-04 11:00'),
(1, 'abcd', 'in progress',  timestamp '2022-05-06 12:00'),
(1, 'abcd', 'in progress',  timestamp '2022-05-07 12:00'),
(2, 'abcd', 'in progress',  timestamp '2022-05-04 11:00'),
(2, 'abcd', 'in progress',  timestamp '2022-05-06 12:00')
) 

--query
select  id, Desc, progress, updated_time
from dataset o
where updated_time = (select max(updated_time) from dataset i where i.id = o.id)

Or similar approach with max window function and subselect:

--query
select  id, Desc, progress, updated_time
from (
    select *,  max(updated_time) over (partition by id) max_time
    from dataset
)
where max_time = updated_time

Or just using row_number:

select  id, Desc, progress, updated_time
from 
(
    select *,  
        row_number() over(partition by id order by updated_time desc) rank
    from dataset
)
where rank  = 1

Output:

id	Desc	progress	updated_time
1	abcd	in progress	2022-05-07 12:00:00.000
2	abcd	in progress	2022-05-06 12:00:00.000

CodePudding user response：

You are try for matching a single value with several columns and yhis raise an error ..

Looking to you code for you goal, instead o f an IN clasue based on a subquery you should use inner join

select ID,desc,progress,updated_time 
from t1 
INNER JOIN 
( select ID, desc, progress, max(updated_time) max_time 
from t1 group by 1,2,3) t on t.max_time = t1.updated_time

CodePudding user response：

I would probably use row_number or some other ranking function for this.

with t as (select a.*,
 row_number() over (partition by id order by updated_time desc as rn) 
select * from t where rn = 1