So I have the following data:
User ID | CreatedOn |
---|---|
A | 01.01.2023 |
A | 02.01.2023 |
A | 03.01.2023 |
A | 05.01.2023 |
A | 06.01.2023 |
A | 07.01.2023 |
A | 08.01.2023 |
B | 01.01.2023 |
B | 02.01.2023 |
B | 03.01.2023 |
B | 04.01.2023 |
B | 05.01.2023 |
B | 06.01.2023 |
B | 07.01.2023 |
I need to calculate the number of users who signed in on a 7 days consecutive period. For example, User A will not be counted since we had an interruption on the 4th but user B will be counted once. It should be any 7 day period. Can someone help me with the code? I am using SQL Server. Thanks
I tried row numbers and lag but did not know where to take it from there.
with x as (
select distinct
UserId,
CreatedOn,
LAG(CreatedOn) OVER (PARTITION BY UserId ORDER BY CreatedOn) AS PrevDate,
ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY CreatedOn) AS RowNumber
from table)
select
*,
DATEDIFF(day, PrevDate, CreatedOn) date_diff,
case when DATEDIFF(day, PrevDate, CreatedOn) = 1 then RowNumber else 1 end as new_row
from table
CodePudding user response:
To start, join the table to itself, where the join conditions are for the same user and rows with dates in the range for the date of the current row, back to seven days in the past:
SELECT *
FROM [table] t1
INNER JOIN [table] t2 on t2.UserID = t1.UserID
AND t2.CreatedOn <= t1.CreatedOn AND t2.CreatedOn > DATEADD(day, -7, t1.CreatedOn)
Now we want to further reduce this to only show distinct days from the second table for each source user/date (assuming it's either actually a datetime column or you have the possibility of duplicates):
SELECT DISTINCT t1.UserID, t1.CreatedOn, CAST(t2.CreatedOn AS Date) As CreatedOn As CreatedDate
FROM [table] t1
INNER JOIN [table] t2 on t2.UserID = t1.UserID
AND t2.CreatedOn <= t1.CreatedOn AND t2.CreatedOn > DATEADD(day, -7, t1.CreatedOn)
Then we can group this by user, and look for user groups having 7 records. Because we've already done a DISTINCT
, we'll also need to put the prior steps in a subquery or CTE. Finally, we need to add another DISTINCT
, so a user is not repeated for each additional qualifying group (or for each additional day in spans going longer than seven days):
SELECT DISTINCT UserID
FROM (
SELECT DISTINCT t1.UserID, t1.CreatedOn, CAST(t2.CreatedOn AS Date) As CreatedDate
FROM [table] t1
INNER JOIN [table] t2 on t2.UserID = t1.UserID
AND t2.CreatedOn <= t1.CreatedOn AND t2.CreatedOn > DATEADD(day, -7, t1.CreatedOn)
) t
GROUP BY UserID, CreatedOn
HAVING COUNT(CreatedDate) = 7
See it work here:
If you check that link, note how I formatted the date literals. Different cultures, languages, and dialects have their own expectations around how dates are formatted, and the SQL language is no different. Regardless of how your own culture expects to use dates, when communicating with a database in it's own SQL dialect, you should use it's own format. For SQL Server that format is the unseparated yyyyMMdd
. Using dd.MM.yyyy
is wrong at this level, even if its what you and your users will eventually expect to see.
CodePudding user response:
Here is a way to do it using window functions like LAG()
and SUM()
:
You have a gaps and islands problem, you could use the difference between row_number
and the running total of the differences of dates to give a unique id to each successive group of rows.
with cte as (
select UserId,
CreatedOn,
ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY CreatedOn) as rn,
LAG(CreatedOn) OVER (PARTITION BY UserId ORDER BY CreatedOn) AS PrevDate
from mytable
),
cte2 as (
select *, COALESCE(DATEDIFF(day, PrevDate, CreatedOn), 1) as date_diff
from cte
),
cte3 as (
select *, rn - sum(date_diff) OVER (PARTITION BY UserId ORDER BY CreatedOn) as grp
from cte2
)
select UserId
from cte3
group by UserId, grp
having count(*) >= 7