sql - filter a subset based on order of a column and group by

Time:11-25

I am looking for a SQL query that, given a table with this data:

 ---------- ----- ------
| category | id  | time |
 ---------- ----- ------
| A        | abc |    1 |
| A        | abc |    1 |
| B        | abc |    3 |
| C        | abc |    4 |
| A        | xyz |    4 |
| B        | xyz |    5 |
| C        | xyz |    7 |
| C        | xyz |    7 |
 ---------- ----- ------

produces this output:

 ---------- ----- -----
| category | id  | cnt |
 ---------- ----- -----
| A        | xyz |   1 |
| B        | xyz |   1 |
| C        | xyz |   2 |
 ---------- ----- -----

I got as far as this:

select category, id, count(*) as cnt
from table
group by category, id

Now I need to find which id is the most recent (the one with the greatest time), so I can keep only that subset. The time column only ever increases.

Is there a better alternative than the following?

select category, id, count(*) as cnt
from table
where id=(select id from table order by time desc limit 1)
group by category, id
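To sanity-check the question's WHERE-subquery approach locally, here is a minimal sketch using Python's built-in sqlite3 module with the sample rows loaded into an in-memory table. The table name data (standing in for the placeholder name table, which is a reserved word) and the helper latest_id_counts are hypothetical; the SQL is the question's query with an added order by so the output is stable.

```python
import sqlite3

# Sample rows from the question: (category, id, time).
ROWS = [
    ("A", "abc", 1), ("A", "abc", 1), ("B", "abc", 3), ("C", "abc", 4),
    ("A", "xyz", 4), ("B", "xyz", 5), ("C", "xyz", 7), ("C", "xyz", 7),
]

# The question's query; "data" is a hypothetical stand-in for "table".
LATEST_ID_QUERY = """
select category, id, count(*) as cnt
from data
where id = (select id from data order by time desc limit 1)
group by category, id
order by category
"""


def latest_id_counts(rows):
    """Load rows into an in-memory SQLite table and run the query on them."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table data (category text, id text, time integer)")
        conn.executemany("insert into data values (?, ?, ?)", rows)
        return conn.execute(LATEST_ID_QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # Prints ('A', 'xyz', 1), ('B', 'xyz', 1), ('C', 'xyz', 2), one per line.
    for row in latest_id_counts(ROWS):
        print(row)
```

Appending a newer row such as ("A", "pqr", 9) switches the result to that id alone, which is the filtering behaviour the question asks for.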

Answer:

with data(category, id, time) as (
select * from values
    ('A', 'abc', 1),
    ('A', 'abc', 1),
    ('B', 'abc', 3),
    ('C', 'abc', 4),
    ('A', 'xyz', 4),
    ('B', 'xyz', 5),
    ('C', 'xyz', 7),
    ('C', 'xyz', 7)
)
select 
    category, 
    id, 
    count(*) as cnt 
from data
where id = (select id from data order by time desc limit 1)
group by 1,2;

works, as you note:

CATEGORY  ID   CNT
A         xyz  1
B         xyz  1
C         xyz  2

Another method is to convert that WHERE into a JOIN:

select 
    d.category, 
    d.id, 
    count(*) as cnt 
from data as d
join (
    -- id of the single most recent row; ranking all rows (no partition)
    -- keeps this equivalent to the WHERE version
    select z.id 
    from data as z 
    qualify row_number() over (order by z.time desc) = 1
) as c
    on d.id = c.id
group by 1,2;
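The way rows are ranked inside the JOIN matters: ranking all rows by time picks the single most recent id (what the WHERE version does), while partitioning by category picks each category's own latest id. On the question's data both give the same answer, but they diverge when some category has no rows for the latest id. The sketch below runs both forms with Python's built-in sqlite3 module (window functions need SQLite 3.25 or later). SQLite has no QUALIFY, so the row_number() filter moves into a subquery. The table name data, the helper run and the extra rows are hypothetical.

```python
import sqlite3

# Rank all rows by time and keep the top one: the single most recent id.
GLOBAL_JOIN = """
select d.category, d.id, count(*) as cnt
from data as d
join (
    select ranked.id
    from (
        select id, row_number() over (order by time desc) as rn
        from data
    ) as ranked
    where ranked.rn = 1
) as c on d.id = c.id
group by d.category, d.id
order by d.category
"""

# Rank rows within each category: every category keeps its own latest id.
PER_CATEGORY_JOIN = """
select d.category, d.id, count(*) as cnt
from data as d
join (
    select ranked.category, ranked.id
    from (
        select category, id,
               row_number() over (partition by category order by time desc) as rn
        from data
    ) as ranked
    where ranked.rn = 1
) as c on d.category = c.category and d.id = c.id
group by d.category, d.id
order by d.category
"""


def run(query, rows):
    """Load (category, id, time) rows into an in-memory table and run query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table data (category text, id text, time integer)")
        conn.executemany("insert into data values (?, ?, ?)", rows)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


# Hypothetical data: category A never appears with the latest id "xyz".
ROWS = [
    ("A", "abc", 1), ("B", "abc", 3), ("C", "abc", 4),
    ("B", "xyz", 5), ("C", "xyz", 7),
]

if __name__ == "__main__":
    print(run(GLOBAL_JOIN, ROWS))        # [('B', 'xyz', 1), ('C', 'xyz', 1)]
    print(run(PER_CATEGORY_JOIN, ROWS))  # [('A', 'abc', 1), ('B', 'xyz', 1), ('C', 'xyz', 1)]
```

The partitioned form keeps the stale row ('A', 'abc', 1), so the unpartitioned ranking is the one that matches "only the latest id".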

I would test both on your data to see which performs better for you.
