SQL for selecting YTD active members by month

I need help writing a query that returns, for each month, the distinct count of memberid values that have been active year to date (YTD).

i.e.

202201 - distinct memberid who are active in 202201

202202 - distinct memberid who are active between 202201 and 202202

202203 - distinct memberid who are active between 202201 and 202203

The data structure is similar to the one below.

memberid	yearmonth	activestatus
1	202201	Y
1	202202	Y
1	202203	N
2	202201	N
2	202202	N
2	202203	Y
3	202201	N
3	202202	Y
3	202203	Y

Thank you.

Expected:

yearmonth	active_status
202201	1
202202	2
202203	3

CodePudding user response：

We're looking for a running distinct count. We can get it by flagging each member's first active month with row_number() and then taking a running count() of those flags.

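-- count(rn) skips nulls, so each member is counted only once, in their first active month
select distinct yearmonth
      ,count(rn) over(order by yearmonth) as distinct_running_count
from
(
-- rn is 1 on a member's first active month and null on every later active month
select *
      ,case when row_number() over(partition by memberid order by yearmonth) = 1 then 1 end as rn
from   t
where  activestatus = 'Y'
) t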

yearmonth	distinct_running_count
202201	1
202202	2
202203	3
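
To make the flagging step easier to follow, here is a minimal sketch that runs the query above against the sample rows from the question. It uses Python's built-in sqlite3 module, which is my assumption (the question does not name a database), and it needs SQLite 3.25 or later for window functions. The inner alias s and the Python variable names are only illustrative.

```python
import sqlite3

# Sample rows copied from the question: (memberid, yearmonth, activestatus).
rows = [
    (1, 202201, "Y"), (1, 202202, "Y"), (1, 202203, "N"),
    (2, 202201, "N"), (2, 202202, "N"), (2, 202203, "Y"),
    (3, 202201, "N"), (3, 202202, "Y"), (3, 202203, "Y"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table t (memberid int, yearmonth int, activestatus text)")
conn.executemany("insert into t values (?, ?, ?)", rows)

# Inner step on its own: rn is 1 only on each member's first active month.
first_active_flags = conn.execute("""
    select memberid, yearmonth,
           case when row_number() over (partition by memberid order by yearmonth) = 1
                then 1 end as rn
    from t
    where activestatus = 'Y'
    order by memberid, yearmonth
""").fetchall()

# Full query: count(rn) ignores the nulls, so the running count grows
# only when a member shows up as active for the first time.
running = conn.execute("""
    select distinct yearmonth,
           count(rn) over (order by yearmonth) as distinct_running_count
    from (
        select *,
               case when row_number() over (partition by memberid order by yearmonth) = 1
                    then 1 end as rn
        from t
        where activestatus = 'Y'
    ) s
    order by yearmonth
""").fetchall()

print(first_active_flags)
print(running)  # [(202201, 1), (202202, 2), (202203, 3)]
```

Member 1 is active in both 202201 and 202202, but only the 202201 row carries a flag, which is why that member is not counted twice.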

CodePudding user response：

We can express the logic neatly with two levels of aggregation. This might be more efficient than filtering with row_number():

select yearmonth, sum(count(*)) over(order by yearmonth) running_cnt
from (
    select min(yearmonth) yearmonth from mytable where activestatus = 'Y' group by memberid
) t
group by yearmonth

The subquery first gets each member's earliest active month. The outer query then counts how many members first became active in each month and computes the running sum of those counts.
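
As a rough illustration of the two levels, the sketch below runs the subquery on its own and then the full query, using the sample rows from the question. Running it through Python's built-in sqlite3 module is my assumption (any database with window functions should behave the same way, and SQLite needs 3.25 or later). The Python variable names are only illustrative.

```python
import sqlite3

# Sample rows copied from the question: (memberid, yearmonth, activestatus).
rows = [
    (1, 202201, "Y"), (1, 202202, "Y"), (1, 202203, "N"),
    (2, 202201, "N"), (2, 202202, "N"), (2, 202203, "Y"),
    (3, 202201, "N"), (3, 202202, "Y"), (3, 202203, "Y"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (memberid int, yearmonth int, activestatus text)")
conn.executemany("insert into mytable values (?, ?, ?)", rows)

# Level 1: one row per member, holding that member's earliest active month.
first_month = conn.execute("""
    select memberid, min(yearmonth) as yearmonth
    from mytable
    where activestatus = 'Y'
    group by memberid
    order by memberid
""").fetchall()

# Level 2: count the members per first-active month, then take the
# running sum of those per-month counts.
running_cnt = conn.execute("""
    select yearmonth, sum(count(*)) over (order by yearmonth) as running_cnt
    from (
        select min(yearmonth) as yearmonth
        from mytable
        where activestatus = 'Y'
        group by memberid
    ) t
    group by yearmonth
    order by yearmonth
""").fetchall()

print(first_month)   # [(1, 202201), (2, 202203), (3, 202202)]
print(running_cnt)   # [(202201, 1), (202202, 2), (202203, 3)]
```

Because every member contributes exactly one row to the subquery, no distinct is needed in the outer query.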

Normally you would also have a calendar table that holds all possible yearmonth values, so that months in which no member becomes active for the first time still appear in the result. You would use it like so:

select c.yearmonth, sum(count(t.yearmonth)) over(order by c.yearmonth) running_cnt
from calendar c
left join (
    select min(yearmonth) yearmonth from mytable where activestatus = 'Y' group by memberid
) t on t.yearmonth = c.yearmonth 
group by c.yearmonth
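
To see what the calendar join buys you, here is a small sketch run through Python's built-in sqlite3 module (my assumption, needing SQLite 3.25 or later). The calendar table contents and the extra month 202204 are hypothetical: nobody becomes active for the first time in that month, so it would be missing from the result without the calendar table.

```python
import sqlite3

# Sample rows copied from the question: (memberid, yearmonth, activestatus).
rows = [
    (1, 202201, "Y"), (1, 202202, "Y"), (1, 202203, "N"),
    (2, 202201, "N"), (2, 202202, "N"), (2, 202203, "Y"),
    (3, 202201, "N"), (3, 202202, "Y"), (3, 202203, "Y"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (memberid int, yearmonth int, activestatus text)")
conn.executemany("insert into mytable values (?, ?, ?)", rows)

# Hypothetical calendar table; 202204 has no newly active members.
conn.execute("create table calendar (yearmonth int)")
conn.executemany(
    "insert into calendar values (?)",
    [(202201,), (202202,), (202203,), (202204,)],
)

# count(t.yearmonth) is 0 for calendar months with no match in the left join,
# so the running sum simply carries the previous total forward.
with_calendar = conn.execute("""
    select c.yearmonth,
           sum(count(t.yearmonth)) over (order by c.yearmonth) as running_cnt
    from calendar c
    left join (
        select min(yearmonth) as yearmonth
        from mytable
        where activestatus = 'Y'
        group by memberid
    ) t on t.yearmonth = c.yearmonth
    group by c.yearmonth
    order by c.yearmonth
""").fetchall()

print(with_calendar)
# [(202201, 1), (202202, 2), (202203, 3), (202204, 3)]
```

Note that the count uses t.yearmonth rather than *, so the unmatched calendar row contributes 0 instead of 1.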