Calculating active users within a 7 days period-CodePudding

So I have the following data:

User ID	CreatedOn
A	01.01.2023
A	02.01.2023
A	03.01.2023
A	05.01.2023
A	06.01.2023
A	07.01.2023
A	08.01.2023
B	01.01.2023
B	02.01.2023
B	03.01.2023
B	04.01.2023
B	05.01.2023
B	06.01.2023
B	07.01.2023

I need to calculate the number of users who signed in on a 7 days consecutive period. For example, User A will not be counted since we had an interruption on the 4th but user B will be counted once. It should be any 7 day period. Can someone help me with the code? I am using SQL Server. Thanks

I tried row numbers and lag but did not know where to take it from there.

with x as (
    select distinct 
        UserId, 
        CreatedOn,
        LAG(CreatedOn) OVER (PARTITION BY UserId ORDER BY CreatedOn) AS PrevDate,
        ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY CreatedOn) AS RowNumber
    from table)
    select 
        *, 
        DATEDIFF(day, PrevDate, CreatedOn) date_diff, 
        case when DATEDIFF(day, PrevDate, CreatedOn) = 1 then RowNumber else 1 end as new_row
    from table

CodePudding user response：

To start, join the table to itself, where the join conditions are for the same user and rows with dates in the range for the date of the current row, back to seven days in the past:

SELECT *
FROM [table] t1
INNER JOIN [table] t2 on t2.UserID = t1.UserID 
    AND t2.CreatedOn <= t1.CreatedOn AND t2.CreatedOn > DATEADD(day, -7, t1.CreatedOn)

Now we want to further reduce this to only show distinct days from the second table for each source user/date (assuming it's either actually a datetime column or you have the possibility of duplicates):

SELECT DISTINCT t1.UserID, t1.CreatedOn, CAST(t2.CreatedOn AS Date) As CreatedOn As CreatedDate
FROM [table] t1
INNER JOIN [table] t2 on t2.UserID = t1.UserID 
    AND t2.CreatedOn <= t1.CreatedOn AND t2.CreatedOn > DATEADD(day, -7, t1.CreatedOn)

Then we can group this by user, and look for user groups having 7 records. Because we've already done a DISTINCT, we'll also need to put the prior steps in a subquery or CTE. Finally, we need to add another DISTINCT, so a user is not repeated for each additional qualifying group (or for each additional day in spans going longer than seven days):

SELECT DISTINCT UserID
FROM (
    SELECT DISTINCT t1.UserID, t1.CreatedOn, CAST(t2.CreatedOn AS Date) As CreatedDate
    FROM [table] t1
    INNER JOIN [table] t2 on t2.UserID = t1.UserID 
        AND t2.CreatedOn <= t1.CreatedOn AND t2.CreatedOn > DATEADD(day, -7, t1.CreatedOn)
) t
GROUP BY UserID, CreatedOn
HAVING COUNT(CreatedDate) = 7

See it work here:

https://dbfiddle.uk/fDnoLpLK

If you check that link, note how I formatted the date literals. Different cultures, languages, and dialects have their own expectations around how dates are formatted, and the SQL language is no different. Regardless of how your own culture expects to use dates, when communicating with a database in it's own SQL dialect, you should use it's own format. For SQL Server that format is the unseparated yyyyMMdd. Using dd.MM.yyyy is wrong at this level, even if its what you and your users will eventually expect to see.

CodePudding user response：

Here is a way to do it using window functions like LAG() and SUM() :

You have a gaps and islands problem, you could use the difference between row_number and the running total of the differences of dates to give a unique id to each successive group of rows.

with cte as (
  select  UserId, 
        CreatedOn,
        ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY CreatedOn) as rn,
        LAG(CreatedOn) OVER (PARTITION BY UserId ORDER BY CreatedOn) AS PrevDate
  from mytable
),
cte2 as (
  select *, COALESCE(DATEDIFF(day, PrevDate, CreatedOn), 1) as date_diff
  from cte
),
cte3 as ( 
  select *, rn - sum(date_diff) OVER (PARTITION BY UserId ORDER BY CreatedOn) as grp
  from cte2
)
select UserId
from cte3
group by UserId, grp
having count(*) >= 7

Demo here