Home > OS >  Expand on limit SQL query results use lag() window function
Expand on limit SQL query results use lag() window function

Time:01-11

I have an table with columns id, score, parent_id ordered by the score. I have ask one scenario in here.

The different from previous question is that the parent_id might show up on multiple rows not necessary sequence rows. The updated table:

id score parent_id
5859 10 5859
2157043 9 5859
21064154 8 21064154
51992 7 51992
34384599 6 51992
1675761 5 5859
3465729 4 3465729
401202 3 401203
1817458 2 1817458

I want to query all columns from this table with the same order but limit results at least 5 rows to meet the unique parent_id number equal to 5. As result, the parent_id only contains 5 ids: 5859, 21064154, 51992, 3465729, 401203

Expected Results like:

id score parent_id
5859 10 5859
2157043 9 5859
21064154 8 21064154
51992 7 51992
34384599 6 51992
1675761 5 5859
3465729 4 3465729
401202 3 401203

The solution using lag only works for the following row has same value of parent_id. If use Java, we could use a SET to store the parent_id and keep count the unique parent_id, but how do we write in SQL?

select id, score, parent_id
from (
  select *, Sum(diff) over(order by score desc)seq
  from (
     select *, 
       case when Lag(parent_id) over(order by score desc) = parent_id then 0 else 1 end diff
    from t
  )t
)d
where seq <= 5
order by score desc;

CodePudding user response:

Perhaps this will work for you, a refactor of your existing query but an alternative to lag using exists which caters for non-sequential rows, give this a try:

select id, score, parent_id
from (
  select *, Sum(keep) over (order by score desc) seq
  from (
    select *, 
      case when exists (
        select * from t t2 
        where t2.parent_id = t.parent_id and t2.score > t.score
      ) then 0 else 1 end keep
      from t
  )t
)s
where seq <= 5
order by score desc;

CodePudding user response:

One method to consider might be:

SELECT t1.*
FROM t t1
WHERE
 (SELECT count(distinct parent_id)
 FROM t t2
 WHERE t2.score >= t1.score) <= 5

You can see a Fiddle here. The WHERE clause is simply counting the number of distinct parent_ids with a score >= the current row.

  • Related