Home > Back-end >  Limit SQL query results by distinct column value
Limit SQL query results by distinct column value

Time:01-10

I have an table with columns id, score, parent_id ordered by the score.

id score parent_id
5859 10 5859
2157043 9 5859
21064154 8 21064154
51992 7 51992
34384599 6 51992
1675761 5 1675761
3465729 4 3465729
401202 3 401203
1817458 2 1817458

I want to query all columns from this table with the same order but limit results at least 5 rows to meet the unique parent_id number equal to 5. As result, the parent_id only contains 5 ids: 5859, 21064154, 51992, 1675761, 3465729

Expected Results like:

id score parent_id
5859 10 5859
2157043 9 5859
21064154 8 21064154
51992 7 51992
34384599 6 51992
1675761 5 1675761
3465729 4 3465729

CodePudding user response:

One other way you could accomplish this is to use lag to indicate when the id changes and then use a cumulative sum over a window then filtering accordingly:

select id, score, parent_id
from (
  select *, Sum(diff) over(order by score desc)seq
  from (
     select *, 
       case when Lag(parent_id) over(order by score desc) = parent_id then 0 else 1 end diff
    from t
  )t
)d
where seq <= 5
order by score desc;

CodePudding user response:

I understand that you want to retain rows that belong to the top 5 scoring parent_ids.

Although a sophisticated approach based on window functions might be possible here, here is one way to do it simply using a subquery:

select *
from mytable t
where parent_id in (
    select top 5 parent_id 
    from mytable 
    group by parent_id 
    order by max(score) desc
)
order by score desc

If your data may have some score tied, consider adding option with ties to the subquery in order to guarantee a predictable result.

  • Related