Home > Blockchain >  Count unique rows GROUP(ed) BY different columns than used in DISTINCT ON
Count unique rows GROUP(ed) BY different columns than used in DISTINCT ON

Time:09-28

I'm sure this has been asked over and over, but I can't quite find a simple example that I can fully grok.

I'm trying to deduplicate (do a DISTINCT ON) by one column and COUNT records GROUPed By columns that differ from the column used to deduplicate, but without introducing subqueries.

Let's say I have a table with the following information:

order_num date region timestamp_updated
001 2021-09-01 Murica 2021-09-02T19:00:01Z
001 2021-09-01 Murica 2021-09-03T19:00:01Z
002 2021-09-01 Yurop 2021-09-02T19:00:01Z
003 2021-09-01 Yurop 2021-09-03T19:00:01Z
004 2021-09-02 Yurop 2021-09-03T19:00:01Z

I would like to first get unique records with different order_num (keeping the most recently updated) ones AND then count groups or orders by date and region.

  • De-duplicate (gets rid of the oldest order_num='001A'):

    order_num date region timestamp_updated
    001 2021-09-01 Murica 2021-09-03T19:00:01Z
    002 2021-09-01 Yurop 2021-09-02T19:00:01Z
    003 2021-09-01 Yurop 2021-09-03T19:00:01Z
    004 2021-09-02 Yurop 2021-09-03T19:00:01Z
  • Then group by and count:

    date region count
    2021-09-01 Murica 1
    2021-09-01 Yurop 2
    2021-09-02 Yurop 1

I know how to do those two things separately (distinct on(order_num) order by timestamp_updated desc) to deduplicate and then select count(*) group by date, region ) And even together with subqueries. But I'd like to try to avoid subqueries as much as possible and here's where window functions (seem) to come in handy and I don't know much anything about those.

The closest thing I've been able to get are groups, but they show one record by each order_num. The records are correct, but they are duplicated:

select distinct on (order_num) date, region, count(1)over (
    partition by order_num
)
from orders_table
order by order_num, timestamp_updated desc;

That query ^^ shows:

date region count
2021-09-01 Murica 1 I think this is the first 001
2021-09-01 Murica 1 I think this is the second 001
2021-09-01 Yurop 2 I think this is the first Yurop: 002
2021-09-01 Yurop 2 I think this is the second Yurop: 003
2021-09-02 Yurop 1

CodePudding user response:

You could get the max timestamp_updatedper per order_num, date, region and then aggregate again to get the counts per date, region using window function

select distinct 
       date, 
       region, 
       count(max(timestamp_updated)) over (partition by date, region) as counts 
from t
group by order_num, date, region;

DEMO

  • Related