I need some help in writing a query which will return the distinct count of memberid who are active YTD and by each month
i.e.
202201 - distinct memberid who are active in 202201
202202 - distinct memberid who are active between 202201 - 202202
202203 - distinct memberid who are active between 202201 - 202203
the data structure are similar to below.
memberid | yearmonth | activestatus |
---|---|---|
1 | 202201 | Y |
1 | 202202 | Y |
1 | 202203 | N |
2 | 202201 | N |
2 | 202202 | N |
2 | 202203 | Y |
3 | 202201 | N |
3 | 202202 | Y |
3 | 202203 | Y |
Thanks you.
Expected:
yearmonth | active_status |
---|---|
202201 | 1 |
202202 | 2 |
202203 | 3 |
CodePudding user response:
We're looking for a distinct running count. We can achieve it by using row_number()
and count()
.
select distinct yearmonth
,count(rn) over(order by yearmonth) as distinct_running_count
from
(
select *
,case when row_number() over(partition by memberid order by yearmonth) = 1 then 1 end as rn
from t
where activestatus = 'Y'
) t
yearmonth | distinct_running_count |
---|---|
202201 | 1 |
202202 | 2 |
202203 | 3 |
CodePudding user response:
We can express the logic neatly with two levels of aggregation. This might be more efficient than filtering with row_number()
:
select yearmonth, sum(count(*)) over(order by yearmonth) running_cnt
from (
select min(yearmonth) yearmonth from mytable where activestatus = 'Y' group by memberid
) t
group by yearmonth
This first gets the earliest active date per member, and then aggregates and computes the running sum.
In a normal situation you would also have a calendar table that holds all possible yearmonth
values, and you would use it like so:
select c.yearmonth, sum(count(t.yearmonth)) over(order c.by yearmonth) running_cnt
from calendar c
left join (
select min(yearmonth) yearmonth from mytable where activestatus = 'Y' group by memberid
) t on t.yearmonth = c.yearmonth
group by c.yearmonth