I am counting 7-day active users using T-SQL. I used the following code:
SELECT
*,
COUNT(DISTINCT [UserID]) OVER (
PARTITION BY [HospitalID], [HospitalName], [Device]
ORDER BY [Date]
ROWS 7 PRECEDING
) AS [7-Day Active Users]
FROM UserActivity
ORDER BY [HospitalID], [HospitalName], [Device], [Date]
I was told Use of DISTINCT is not allowed with the OVER clause
.
UserActivity
is a table with columns HospitalID
, HospitalName
, Device
(either phone or tablet), Date
and UserID
(could be NULL). To make things easier, I have filled the gaps between dates which made Date
consecutive so I can use ROWS 7 PRECEDING
with confidence. I did a lot of searches online and found most solution are either using other types of SQL (which is not possible in my case) or using DENSE_RANK
function which does not support a moving window. What is the correct and hopefully simpler, concise way of solving my problem?
CodePudding user response:
Sorry to see that COUNT DISTINCT
was not supported in that type of SQL... I hadn't known that. Especially after you went to the trouble of fixing the gaps between dates!
I used Rasgo to generate the SQL -- so this won't work directly in your version (tested with Snowflake), but I think it will work as long as you fix the DATEADD
function. Every RDBMS seems to do DATEADD
differently, it seems.
The general concept here is to join the data upon itself using a range join
condition in the WHERE
clause.
Luckily, this should work for you without having to fix the gaps in the dates first.
WITH BASIC_OFFSET_7DAY AS (
SELECT
A.HOSPITALNAME,
A.HOSPITALID,
A.DEVICE,
A.DATE,
COUNT(DISTINCT B.USERID) as COUNT_DISTINCT_USERID_PAST7DAY,
COUNT(1) AS AGG_ROW_COUNT
FROM
UserActivity A
INNER JOIN UserActivity B ON A.HOSPITALNAME = B.HOSPITALNAME
AND A.HOSPITALID = B.HOSPITALID
AND A.DEVICE = B.DEVICE
WHERE
B.DATE >= DATEADD(day, -7, A.DATE)
AND B.DATE <= A.DATE
GROUP BY
A.HOSPITALNAME,
A.HOSPITALID,
A.DEVICE,
A.DATE
)
SELECT
src.*,
BASIC_OFFSET_7DAY.COUNT_DISTINCT_USERID_PAST7DAY
FROM
UserActivity src
LEFT OUTER JOIN BASIC_OFFSET_7DAY ON BASIC_OFFSET_7DAY.DATE = src.DATE
AND BASIC_OFFSET_7DAY.HOSPITALNAME = src.HOSPITALNAME
AND BASIC_OFFSET_7DAY.HOSPITALID = src.HOSPITALID
AND BASIC_OFFSET_7DAY.DEVICE = src.DEVICE
Let me know how that works out and if it doesn't work I'll help you out.
Edit: For those who are trying to do this and getting stuck, a common mistake (one that I myself performed when I did this by hand) is to pay careful attention to COUNT(DISTINCT(B.col)) and not A.col. When I used Rasgo to generate the SQL to check myself, I caught my mistake. Hopefully this note helps someone in the future who makes this same mistake!