Calculating yearly active users in SQL

Time:10-13

I have a table logins with two columns: user_id and login_date. I would like to calculate the number of active users for each calendar year using the definition that for any given day, a user_id is said to be active on that date if it has had at least 1 login in the past 90 days. For example, if a user_id has a login_date of 2017-01-01, then this user_id is said to be active for every day from 2017-01-01 to 2017-04-01. The same user_id can have another login_date on 2017-02-01, and subsequently will be active for every day from 2017-02-01 to 2017-05-01. Using this definition, I would like to calculate the number of active users in 2017, 2018, 2019, and 2020.

Here's the input table, with a few examples:

 ----------- ------------ 
|  user_id  | login_date |
 ----------- ------------ 
|  0000000  | 2017-01-01 |
|  0000000  | 2017-02-01 |
|  0000001  | 2017-01-02 |
 ----------- ------------ 
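To make the definition concrete, here is a minimal Python sketch (not part of the original query) that computes the active window for each sample row and the calendar years it overlaps. It assumes the window runs from login_date to login_date + 90 days inclusive; the names WINDOW_DAYS, active_window and years_touched are illustrative only.

```python
from datetime import date, timedelta

# Sample rows from the table above: (user_id, login_date).
logins = [
    ("0000000", date(2017, 1, 1)),
    ("0000000", date(2017, 2, 1)),
    ("0000001", date(2017, 1, 2)),
]

WINDOW_DAYS = 90  # assumption: the window ends on login_date + 90 days, inclusive


def active_window(login_date):
    """Return the (start, end) dates during which one login keeps the user active."""
    return login_date, login_date + timedelta(days=WINDOW_DAYS)


def years_touched(login_date):
    """Return the set of calendar years overlapped by one login's active window."""
    start, end = active_window(login_date)
    return set(range(start.year, end.year + 1))


for user_id, login_date in logins:
    start, end = active_window(login_date)
    print(user_id, start, end, sorted(years_touched(login_date)))
```

A login late in the year, such as 2019-11-15, would touch both 2019 and 2020 under this assumption, which is the case the yearly count has to handle.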

The logic I have tried but do not think is correct:

For each login, make a column for each month the user will be active (3 months):

WITH all_missions AS (
SELECT
  user_id,
  format_datetime(login_date, 'yyyy-MM') AS first_active_date,
  format_datetime(date_add('month', 1, login_date), 'yyyy-MM-dd') AS active_m1,
  format_datetime(date_add('month', 2, login_date), 'yyyy-MM-dd') AS active_m2,
  format_datetime(date_add('month', 3, login_date), 'yyyy-MM-dd') AS active_m3
FROM logins
),

Unpivot columns into rows:

active_months AS (
SELECT
  l.user_id,
  t2.active_month,
  t2.month_number
FROM all_missions l
CROSS JOIN unnest (
  array['active_m1', 'active_m2', 'active_m3'],
  array[active_m1, active_m2, active_m3]
) t2 (active_month, month_number)
),

Then count the distinct user count by only taking the year of each active month and aggregate:

SELECT
  substring(month_number, 1, 4) AS year_number,
  count(DISTINCT user_id) AS num_active_users
FROM active_months
GROUP BY 1

What is the correct way to approach this?

CodePudding user response:

Since you only need yearly (not monthly) active users, you do not need active_m1, active_m2, and active_m3 for every login; active_m3 alone is enough. A 3-month window can touch at most two calendar years, so the year of the login date (first_active_date) and the year of active_m3 together tell you whether the user was active in the login year, the following year, or both.

You may try the following:

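WITH all_missions AS (
    -- Start of each active window: the login date itself
    SELECT
        user_id,
        login_date AS active_date
    FROM logins
    UNION ALL
    -- End of each active window
    -- (use date_add('day', 90, login_date) for an exact 90-day window)
    SELECT
        user_id,
        date_add('month', 3, login_date) AS active_date
    FROM logins
)
SELECT
    -- year() needs a date/timestamp, so active_date is not formatted to a string
    year(active_date) AS year_number,
    count(DISTINCT user_id) AS num_active_users
FROM all_missions
GROUP BY 1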
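
As a sanity check outside the database, the same endpoint idea can be sketched in Python: for each login, take the year of the window start and the year of the window end, then count distinct users per year. This is only an illustration under stated assumptions: it uses an exact 90-day window rather than 3 months, the function name yearly_active_users is hypothetical, and the row for user 0000002 is made up to show a window that crosses a year boundary.

```python
from collections import defaultdict
from datetime import date, timedelta


def yearly_active_users(logins, window_days=90):
    """Count distinct active users per calendar year from (user_id, login_date) rows."""
    users_by_year = defaultdict(set)
    for user_id, login_date in logins:
        window_end = login_date + timedelta(days=window_days)
        # A 90-day window overlaps at most two calendar years,
        # so the start year and the end year cover every year it touches.
        for year in {login_date.year, window_end.year}:
            users_by_year[year].add(user_id)
    return {year: len(users) for year, users in sorted(users_by_year.items())}


logins = [
    ("0000000", date(2017, 1, 1)),
    ("0000000", date(2017, 2, 1)),
    ("0000001", date(2017, 1, 2)),
    ("0000002", date(2017, 11, 15)),  # hypothetical row: window ends 2018-02-13
]

print(yearly_active_users(logins))  # {2017: 3, 2018: 1}
```

Note that 3 months and 90 days can disagree near a year boundary: a login on 2017-10-02 ends on 2017-12-31 with a 90-day window but on 2018-01-02 with a 3-month window, so the choice affects which year the user is counted in.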

Let me know if this works for you.
