Home > Blockchain >  SELF JOIN a query to obtain the number of reactivated users
SELF JOIN a query to obtain the number of reactivated users

Time:08-21

Assume you have the table given below containing information on Facebook user logins. Write a query to obtain the number of reactivated users (which are dormant users who did not log in the previous month, who then logged in during the current month). Output the current month and number of reactivated users.

I have tried this question by first making an inner join combining a user's previous month to current month with this code.

WITH CTE as 
   (SELECT user_id, 
   EXTRACT(month from login_date) as current_month,
   EXTRACT(month from login_date)-1 as prev_month
   FROM user_logins)
SELECT a.user_id as user_id, a.current_month, a.prev_month,
    b.user_id as prev_month_user
FROM CTE a LEFT JOIN CTE b
ON a.prev_month = b.current_month

My idea is to use a case statement

    CASE WHEN a.user_id IN 
        (SELECT b.user_id
        WHERE b.current_month  = a.prev_month)
    THEN 0 ELSE 1 END

BUT that is giving me wrong output for user_id 245 in current_month 4.

https://drive.google.com/file/d/1dOQQxaJWv7j7o7M1Q98nlj77KCzIHxKl/view?usp=sharing

How to fix this?

CodePudding user response:

This gets you the first day of the current month:

select date_trunc('month', current_date)

You can add or subtract an interval of one month to get the previous or next month's starting date.

The complete query:

select *
from users
where user_id in
(
  select user_id
  from user_logins
  where login_date >= date_trunc('month', current_date)
  and login_date < date_trunc('month', current_date)   interval '1 month'
)
and user_id not in
(
  select user_id
  from user_logins
  where login_date >= date_trunc('month', current_date) - interval '1 month'
  and login_date < date_trunc('month', current_date)
)

Well, admittedly

and login_date < date_trunc('month', current_date)   interval '1 month'

is probably unnecessary here, because the table won't contain future logins :-) So, keep it or remove it, as you like.

If you want a self join, you should get distinct user/month pairs first. Then, as you want to get user/month pairs for which not exists a user/month-1 pair (and for which NOT EXISTS would be appropriate) your join must be an anti join. This means you outer join the user/month-1 pair and only keep the outer joined rows, i.e. the non-matches.

WITH cte AS 
(
  SELECT DISTINCT user_id, DATE_TRUNC('month', login_date) AS month
  FROM user_logins
)
SELECT mon.month, mon.user_id
FROM cte mon
LEFT JOIN cte prev ON prev.user_id = mon.user_id
                   AND prev.month = mon.month - INTERVAL '1 month'
WHERE prev.month IS NULL -- anti join
ORDER BY mon.month, mon.user_id;

I don't find anti joins very readable and would use NOT EXISTS instead. But that's a matter of personal preference, I guess. The query gives you all users who logged in a month, but not the previous month. You can of course limit this to the cutrent month. Or you can aggregate per month and count. Or remove the WHERE clause and count repeating users vs. new ones (COUNT(*) = all that month, COUNT(prev.month) = all repeating users, COUNT(*) - COUNT(prev.month) = all new users).

Well having said this, ... wasn't the task about reactivated users? Then you are looking for users who were active once, then paused a month, then became active again. Here is a simple query to get this for users who paused last month:

select user_id
from user_logins
group by user_id
having min(login_date) < date_trunc('month', current_date) - interval '1 month'
   and max(login_date) >= date_trunc('month', current_date)
   and count(*) filter (where login_date >= date_trunc('month', current_date) - interval '1 month'
                          and login_date < date_trunc('month', current_date)) = 0;
  • Related