I'm sure this has been asked over and over, but I can't quite find a simple example that I can fully grok.
I'm trying to deduplicate (do a DISTINCT ON
) by one column and COUNT
records GROUPed By
columns that differ from the column used to deduplicate, but without introducing subqueries.
Let's say I have a table with the following information:
order_num | date | region | timestamp_updated |
---|---|---|---|
001 | 2021-09-01 | Murica | 2021-09-02T19:00:01Z |
001 | 2021-09-01 | Murica | 2021-09-03T19:00:01Z |
002 | 2021-09-01 | Yurop | 2021-09-02T19:00:01Z |
003 | 2021-09-01 | Yurop | 2021-09-03T19:00:01Z |
004 | 2021-09-02 | Yurop | 2021-09-03T19:00:01Z |
I would like to first get unique records with different order_num
(keeping the most recently updated) ones AND then count groups or orders by date
and region
.
De-duplicate (gets rid of the oldest
order_num='001A'
):order_num date region timestamp_updated 001 2021-09-01 Murica 2021-09-03T19:00:01Z 002 2021-09-01 Yurop 2021-09-02T19:00:01Z 003 2021-09-01 Yurop 2021-09-03T19:00:01Z 004 2021-09-02 Yurop 2021-09-03T19:00:01Z Then group by and count:
date region count 2021-09-01 Murica 1 2021-09-01 Yurop 2 2021-09-02 Yurop 1
I know how to do those two things separately (distinct on(order_num)
order by timestamp_updated desc
) to deduplicate and then select count(*)
group by date, region
) And even together with subqueries. But I'd like to try to avoid subqueries as much as possible and here's where window functions (seem) to come in handy and I don't know much anything about those.
The closest thing I've been able to get are groups, but they show one record by each order_num
. The records are correct, but they are duplicated:
select distinct on (order_num) date, region, count(1)over (
partition by order_num
)
from orders_table
order by order_num, timestamp_updated desc;
That query ^^ shows:
date | region | count | |
---|---|---|---|
2021-09-01 | Murica | 1 | I think this is the first 001 |
2021-09-01 | Murica | 1 | I think this is the second 001 |
2021-09-01 | Yurop | 2 | I think this is the first Yurop: 002 |
2021-09-01 | Yurop | 2 | I think this is the second Yurop: 003 |
2021-09-02 | Yurop | 1 |
CodePudding user response:
You could get the max timestamp_updatedper
per order_num, date, region
and then aggregate again to get the counts per date, region
using window function
select distinct
date,
region,
count(max(timestamp_updated)) over (partition by date, region) as counts
from t
group by order_num, date, region;