Home > database >  What's the best way to calculate retention if all I have is userIds?
What's the best way to calculate retention if all I have is userIds?

Time:05-19

I am just starting to learn SQL and face the following issue:

I have a web-site with registered customers who visit it monthly. There is no subscription of any kind so I just have a table that collects events of my customers' activity (userId, action, timestamp).

I can easily check how many unique (DISTINCT userId) I have in January vs February, but what I want to learn is - how many users that were active in January (as seen by their actions in the table) are still using the site in February? so its like a churn but I can't wrap my head around whats the best approach here.

What I am thinking of as possible solution:

  1. Get list of userIds from January:
SELECT DISTINCT user_id
FROM site_activity
WHERE time_utc BETWEEN '2022/01/01' AND '2022/01/31'
  1. Get list of userIds from February
SELECT DISTINCT user_id
FROM site_activity
WHERE time_utc BETWEEN '2022/02/01' AND '2022/02/28'
  1. Somehow check for overlap and mark each February user that exists in January table as "retained" and mark anyone else as "new" or "resuming" (maybe they were active in Dec last year).
And here I thought of writing something with WHILE?

Am I on the right track here?

CodePudding user response:

If your task is only for two months (January and February), then you can use a simple JOIN, something like this:

WITH
    t AS (SELECT user_id,
                 EXTRACT(MONTH FROM time_utc) AS month
              FROM site_activity
              GROUP BY user_id, EXTRACT(MONTH FROM time_utc))
SELECT t.user_id,
       t.month,
       CASE WHEN t2.user_id IS NULL THEN 'new' ELSE 'retained' END AS status
    FROM t
    LEFT JOIN t AS t2
            ON t.user_id = t2.user_id
            AND t.month = 2 -- take Feb users
            AND t2.month = 1 -- join Jan users
    ORDER BY t.user_id, t.month;

enter image description here

Here you can see that user 1 was a 'new' in Jan but going to be 'retained' in Feb. Users 2 and 3 have no records in Jan, so they are 'new' in Feb.

Here's dbfiddle also.

If you need to do it without reference to months, then ask me, I can fix my code (or you can do it yourself, taking it as a basis). In any case, I would like to see an example of the expected result.

upd. there are PG syntax so maybe you need to change some constructions (like EXTRACT) to correct one. Or specify your DB and I'll fix it also

  • Related