I have the below table.
ID | Desc | progress | updated_time |
---|---|---|---|
1 | abcd | planned | 2022-04-20 10:00AM |
1 | abcd | planned | 2022-04-25 12:00AM |
1 | abcd | in progress | 2022-04-26 4:00PM |
1 | abcd | in progress | 2022-05-04 11:00AM |
1 | abcd | in progress | 2022-05-06 12:00PM |
I just want to return a row that has the latest updated_time regardless of what progress it is in, which is,
ID | Desc | progress | updated_time |
---|---|---|---|
1 | abcd | in progress | 2022-05-06 12:00PM |
I know if I group by 'progress' (as shown below), I will get one for planned too which I do not need. I just need a single row for each ID with its latest updated time.
I wrote the following query,
select ID,desc,progress,updated_time
from t1
where updated_time IN (select ID, desc, progress, max(updated_time)
from t1 group by 1,2,3)
I get the following error too, 'Multiple columns returned by subquery are not yet supported'
CodePudding user response:
Selecting multiple values in subquery won't work, you need to select single value with scalar subquery:
-- sample data
WITH dataset (ID, Desc, progress, updated_time) AS (
VALUES
(1, 'abcd', 'planned', timestamp '2022-04-20 10:00'),
(1, 'abcd', 'planned', timestamp '2022-04-25 12:00'),
(1, 'abcd', 'in progress', timestamp '2022-04-26 16:00'),
(1, 'abcd', 'in progress', timestamp '2022-05-04 11:00'),
(1, 'abcd', 'in progress', timestamp '2022-05-06 12:00'),
(1, 'abcd', 'in progress', timestamp '2022-05-07 12:00'),
(2, 'abcd', 'in progress', timestamp '2022-05-04 11:00'),
(2, 'abcd', 'in progress', timestamp '2022-05-06 12:00')
)
--query
select id, Desc, progress, updated_time
from dataset o
where updated_time = (select max(updated_time) from dataset i where i.id = o.id)
Or similar approach with max
window function and subselect:
--query
select id, Desc, progress, updated_time
from (
select *, max(updated_time) over (partition by id) max_time
from dataset
)
where max_time = updated_time
Or just using row_number
:
select id, Desc, progress, updated_time
from
(
select *,
row_number() over(partition by id order by updated_time desc) rank
from dataset
)
where rank = 1
Output:
id | Desc | progress | updated_time |
---|---|---|---|
1 | abcd | in progress | 2022-05-07 12:00:00.000 |
2 | abcd | in progress | 2022-05-06 12:00:00.000 |
CodePudding user response:
You are try for matching a single value with several columns and yhis raise an error ..
Looking to you code for you goal, instead o f an IN clasue based on a subquery you should use inner join
select ID,desc,progress,updated_time
from t1
INNER JOIN
( select ID, desc, progress, max(updated_time) max_time
from t1 group by 1,2,3) t on t.max_time = t1.updated_time
CodePudding user response:
I would probably use row_number or some other ranking function for this.
with t as (select a.*,
row_number() over (partition by id order by updated_time desc as rn)
select * from t where rn = 1