I have a table logins with two columns: user_id and login_date. I would like to calculate the number of active users for each calendar year using the definition that for any given day, a user_id is said to be active on that date if it has had at least 1 login in the past 90 days. For example, if a user_id has a login_date of 2017-01-01, then this user_id is said to be active for every day from 2017-01-01 to 2017-04-01. The same user_id can have another login_date on 2017-02-01, and subsequently will be active for every day from 2017-02-01 to 2017-05-01. Using this definition, I would like to calculate the number of active users in 2017, 2018, 2019, and 2020.
Here's the input table, with a few examples:
+---------+------------+
| user_id | login_date |
+---------+------------+
| 0000000 | 2017-01-01 |
| 0000000 | 2017-02-01 |
| 0000001 | 2017-01-02 |
+---------+------------+
The logic I have tried but do not think is correct:
For each login, make a column for each month the user will be active (3 months):
WITH all_missions AS (
    SELECT
        user_id,
        format_datetime(login_date, 'yyyy-MM') AS first_active_date,
        format_datetime(date_add('month', 1, login_date), 'yyyy-MM-dd') AS active_m1,
        format_datetime(date_add('month', 2, login_date), 'yyyy-MM-dd') AS active_m2,
        format_datetime(date_add('month', 3, login_date), 'yyyy-MM-dd') AS active_m3
    FROM logins
),
Unpivot columns into rows:
active_months AS (
    SELECT
        l.user_id,
        t2.active_month,
        t2.month_number
    -- Read from all_missions: active_m1..active_m3 do not exist in logins
    FROM all_missions l
    CROSS JOIN unnest (
        array['active_m1', 'active_m2', 'active_m3'],
        array[active_m1, active_m2, active_m3]
    ) t2 (active_month, month_number)
),
Then count the distinct user count by only taking the year of each active month and aggregate:
SELECT
    substring(month_number, 1, 4) AS year_number,
    count(DISTINCT user_id) AS num_active_users
FROM active_months
GROUP BY 1
What is the correct way to approach this?
CodePudding user response:
Since you are only interested in the number of yearly active users and not monthly active users, instead of determining active_m1, active_m2, active_m3 (all active months) simply determine active_m3. Having the first_active_date and active_m3 is sufficient to determine whether the user was active within the current year or the next year. This works because the activity window is shorter than a year, so it can touch at most two calendar years: the year in which it starts and the year in which it ends.
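As a quick check of that reasoning, here is a minimal sketch in Presto/Trino SQL. The inline VALUES list and the column names first_active_year and last_active_year are made up for illustration, and the window is assumed to be exactly 90 days, as in the question's definition.

```sql
-- Illustration only: two hypothetical logins, one early and one late in the year.
SELECT
    login_date,
    year(login_date) AS first_active_year,                        -- year the window starts in
    year(date_add('day', 90, login_date)) AS last_active_year     -- year the window ends in
FROM (
    VALUES
        (DATE '2017-01-01'),
        (DATE '2017-11-15')
) AS t (login_date)
```

The first login stays within 2017, while the window of the second one ends in February 2018, so that user should count as active in both 2017 and 2018.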
You may try the following:
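WITH all_missions AS (
    -- Start of each activity window: the login date itself.
    -- Assumes login_date is a DATE or TIMESTAMP column.
    SELECT
        user_id,
        login_date AS active_date
    FROM logins
    UNION ALL
    -- End of each activity window: three months after the login.
    -- Use date_add('day', 90, login_date) for a strict 90-day window.
    SELECT
        user_id,
        date_add('month', 3, login_date) AS active_date
    FROM logins
)
-- Keep active_date as a date (not a formatted string) so year() can be applied
SELECT
    year(active_date) AS year_number,
    count(DISTINCT user_id) AS num_active_users
FROM all_missions
GROUP BY 1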
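If you want to try the idea without touching your real table, the following is a self-contained sketch in Presto/Trino SQL. It makes a few assumptions that are not in your question: the inline VALUES rows stand in for logins (using your three sample rows), the CTE name active_bounds is just a placeholder, the window is taken as exactly 90 days, and the output is restricted to 2017 through 2020 so that logins late in 2020 do not produce a 2021 row.

```sql
WITH logins AS (
    -- Stand-in for the real table, built from the sample rows in the question
    SELECT *
    FROM (
        VALUES
            ('0000000', DATE '2017-01-01'),
            ('0000000', DATE '2017-02-01'),
            ('0000001', DATE '2017-01-02')
    ) AS t (user_id, login_date)
),
active_bounds AS (
    -- First day of each activity window
    SELECT user_id, login_date AS active_date
    FROM logins
    UNION ALL
    -- Last day of each activity window (strict 90-day definition)
    SELECT user_id, date_add('day', 90, login_date) AS active_date
    FROM logins
)
SELECT
    year(active_date) AS year_number,
    count(DISTINCT user_id) AS num_active_users
FROM active_bounds
WHERE year(active_date) BETWEEN 2017 AND 2020
GROUP BY 1
ORDER BY 1
```

With the three sample rows, every window starts and ends in 2017, so this should return a single row: 2017 with 2 active users.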
Let me know if this works for you.