I've tried all afternoon to dedup a table that looks like this:
ID1 | ID2 | Date | Time |Status | Price
---- ----- ------------ ----------------- -------- -------
01 | A | 01/01/2022 | 10:41:47.000000 | DDD | 55
01 | B | 02/01/2022 | 16:22:31.000000 | DDD | 53
02 | C | 01/01/2022 | 08:54:03.000000 | AAA | 72
02 | D | 03/01/2022 | 11:12:35.000000 | DDD |
03 | E | 01/01/2022 | 17:15:41.000000 | DDD | 67
03 | F | 01/01/2022 | 19:27:22.000000 | DDD | 69
03 | G | 02/01/2022 | 06:45:52.000000 | DDD | 78
Basically, I need to dedup based on two conditions:
Status
: where AAA > BBB > CCC > DDD. So, pick the highest one.- When the
Status
is the same given the sameID1
, pick the latest one based onDate
andTime
.
The final table should look like:
ID1 | ID2 | Date | Time |Status | Price
---- ----- ------------ ----------------- -------- -------
01 | B | 02/01/2022 | 16:22:31.000000 | DDD | 53
02 | C | 01/01/2022 | 08:54:03.000000 | AAA | 72
03 | G | 02/01/2022 | 06:45:52.000000 | DDD | 78
Is there a way to do this in Redshift SQL / PostgreSQL?
I tried variations of this, but everytime it doesn't work because it demands that I add all columns to the group by, so then it defeats the purpose
select a.id1,
b.id2,
b.date,
b.time,
b.status,
b.price,
case when (status = 'AAA') then 4
when (status = 'BBB') then 3
when (status= 'CCC') then 2
when (status = 'DDD') then 1
when (status = 'EEE') then 0
else null end as row_order
from table1 a
left join table2 b
on a.id1=b.id1
group by id1
having row_order = max(row_order)
and date=max(date)
and time=max(time)
Any help at all is appreciated!
CodePudding user response:
Windowing functions are good at this:
SELECT ID1, ID2, Date, Time, Status, Price
FROM (
SELECT *,
row_number() OVER (PARTITION BY ID1 ORDER BY Status, Date DESC, Time DESC) rn
FROM MyTable
) t
WHERE rn = 1
See it work here:
CodePudding user response:
You can use ROW_NUMBER()
like so:
with cte as (
select a.id1,
b.id2,
b.date,
b.time,
b.status,
b.price,
ROW_NUMBER() OVER (PARTITION BY a.id1 ORDER BY b.status ASC, b.date DESC, b.time DESC) RN
from table1 a
left join table2 b on a.id1=b.id1
)
select * from cte where rn = 1
CodePudding user response:
This is a typical top-1-per-group problem. The canonical solution indeed involves window functions, as demonstrated by Joel Coehoorn and Aaron Dietz.
But Postgres has a specific extension, called distinct on
, which is built exactly for the purpose of solving top-1-per-group problems. The syntax is neater, and you benefit built-in optimizations:
select distinct on (id1) t.*
from mytable t
order by id1, status, "Date" desc, "Time" desc
Here is a demo on DB Fiddle based on that of Joel Coehoorn.