I have an table with columns id
, score
, parent_id
ordered by the score
. I have ask one scenario in here.
The different from previous question is that the parent_id might show up on multiple rows not necessary sequence rows. The updated table:
id | score | parent_id |
---|---|---|
5859 | 10 | 5859 |
2157043 | 9 | 5859 |
21064154 | 8 | 21064154 |
51992 | 7 | 51992 |
34384599 | 6 | 51992 |
1675761 | 5 | 5859 |
3465729 | 4 | 3465729 |
401202 | 3 | 401203 |
1817458 | 2 | 1817458 |
I want to query all columns from this table with the same order but limit results at least 5 rows to meet the unique parent_id number equal to 5. As result, the parent_id only contains 5 ids: 5859, 21064154, 51992, 3465729, 401203
Expected Results like:
id | score | parent_id |
---|---|---|
5859 | 10 | 5859 |
2157043 | 9 | 5859 |
21064154 | 8 | 21064154 |
51992 | 7 | 51992 |
34384599 | 6 | 51992 |
1675761 | 5 | 5859 |
3465729 | 4 | 3465729 |
401202 | 3 | 401203 |
The solution using lag
only works for the following row has same value of parent_id. If use Java, we could use a SET to store the parent_id and keep count the unique parent_id, but how do we write in SQL?
select id, score, parent_id
from (
select *, Sum(diff) over(order by score desc)seq
from (
select *,
case when Lag(parent_id) over(order by score desc) = parent_id then 0 else 1 end diff
from t
)t
)d
where seq <= 5
order by score desc;
CodePudding user response:
Perhaps this will work for you, a refactor of your existing query but an alternative to lag using exists which caters for non-sequential rows, give this a try:
select id, score, parent_id
from (
select *, Sum(keep) over (order by score desc) seq
from (
select *,
case when exists (
select * from t t2
where t2.parent_id = t.parent_id and t2.score > t.score
) then 0 else 1 end keep
from t
)t
)s
where seq <= 5
order by score desc;
CodePudding user response:
One method to consider might be:
SELECT t1.*
FROM t t1
WHERE
(SELECT count(distinct parent_id)
FROM t t2
WHERE t2.score >= t1.score) <= 5
You can see a Fiddle here. The WHERE
clause is simply counting the number of distinct parent_id
s with a score
>= the current row.