Home > OS >  Count Distinct ID group by two kind of dates
Count Distinct ID group by two kind of dates

Time:07-22

With the following table, I need to count the number of disctinct ID every month with a rolling period of 30 days, group by the month of the opening account

CREATE TABLE test (
  opening_account DATE,
  activity_date DATE,
  ID VARCHAR
);

INSERT INTO test (opening_account, activity_date,ID) VALUES 
('2022-01-01', '2022-01-01', '2DKJZINK'),
('2022-01-01', '2022-01-14', '2DKJZINK'),
('2022-01-01', '2022-01-24', '2DKJZINK'),
('2022-01-01', '2022-02-02', '2DKJZINK'),
('2022-01-01', '2022-02-04', '2DKJZINK'),
('2022-01-01', '2022-01-04', '3EFE'),
('2022-01-02', '2022-01-30', 'HZKZ'),
('2022-01-02', '2022-02-04', 'HZKZ'),
('2022-01-02', '2022-03-12', 'HZKZ'),
('2022-02-03', '2022-02-03', 'KDZL'),
('2022-02-03', '2022-03-03', 'KDZL'),
('2022-02-03', '2022-03-03', 'KDZL'),
('2022-02-12', '2022-02-14', 'ZOJZO'),
('2022-03-22', '2022-03-22', 'DZJA'),
('2022-03-22', '2022-03-22', 'DZAAA');

For example:

  • Looking at the month of January, 3 people opened an account:2DKJZINK, 3EFE, HZKZ. They all have been actived in January, so the total is 3.

  • In February, only 2DKJZINK & HZKZ have been actived, but because there is a rolling period of 30days/1month, we need to considered 3EFE which was actived the 4th of January. So the total for the cohort of January for February is 3

  • In March, only HZKZ has been actived. But because in February 2DKJZINK was also actived, the total will be 2.

This is the same process for the people that opened an account in February, then March...

Excepted result is the following. The table doesn't have to be pivoted like this, it is mostly to explain the result easier.

Month_opening_account   January_activity    February_activity   March_activity
January                        3                   3                    2
February                                           2                    2
March                                                                   2

A basic query will give me the count but without taking the rolling period of 30 days into account. I have trying to add a window function on top of it, but by doing like this, there is duplicated value.

select
 FORMAT_DATE('%B', activity_date ) AS month_activity,
 FORMAT_DATE('%B', opening_account ) as month_opening_account,
 count(distinct ID) as count_id_previous
from test
group by 1,2

Any help would be appreciated

CodePudding user response:

Consider below approach

with temp as (
  select id,
    format_date('%b', date_trunc(date(opening_account), month)) as month_opening_account,
    format_date('%b', date_trunc(date(activity_date), month)) as activity_date,
    format_date('%b', date_trunc(date_add(date(activity_date), interval 1 month), month)) as rolling_activity_date
  from your_table
)
select * from (
  select id, month_opening_account, activity_date from temp union all
  select id, month_opening_account, rolling_activity_date as activity_date from temp
)
pivot (count(distinct id) as activity for activity_date in ('Jan','Feb','Mar','Apr','May','Jun','Jul'))             

if applied to sample data in your question - output is

enter image description here

  • Related