Have a table given, holding the status history and creation of a user:
ID | USERID | STATUS | USER_CREATED | TIMESTAMP |
---|---|---|---|---|
1 | A | active | 01.02.2020 | 01.02.2020, 08:00:00 |
2 | A | inactive | 01.02.2020 | 02.02.2020, 11:00:00 |
3 | A | active | 01.02.2020 | 03.02.2020, 12:00:00 |
4 | B | active | 02.02.2020 | 02.02.2020, 13:00:00 |
5 | B | inactive | 02.02.2020 | 05.02.2020, 14:00:00 |
6 | B | active | 02.02.2020 | 07.02.2020, 16:00:00 |
7 | C | active | 02.02.2020 | 02.02.2020, 09:00:00 |
8 | C | inactive | 03.02.2020 | 03.02.2020, 11:00:00 |
9 | D | active | 03.02.2020 | 03.02.2020, 10:00:00 |
10 | E | active | 04.02.2020 | 04.02.2020, 10:00:00 |
I am working on a statistics overview for user creation, which works on a provided min/max date.
Desired result for create date: 01.02.2020 to 03.02.2020:
Create Date | Users created | Still active today | Total active users at create date |
---|---|---|---|
01.02.2020 | 1 | 1 | 1 |
02.02.2020 | 2 | 1 | 2 |
03.02.2020 | 1 | 1 | 3 |
The first two columns, Users created
and Still active today
are straight forward, I simply use a group by and select max(id) to get the latest status of every userid.
The challenge however comes with the last column, Total active users at create date
.
SELECT USER_CREATED as "Create Date",
count(*) as "Users created",
sum(case when STATUS = 'active' then 1 else 0 end) as "Still active today",
??? as "Total active users at create date"
FROM myTable t1
WHERE t1.USER_CREATED >= to_date('01.02.2020', 'DD.MM.YYYY') and t1.USER_CREATED <= to_date('03.02.2020', 'DD.MM.YYYY')
AND and t1.ID = (select max(ID) from myTable where USERID = t1.USERID)
GROUP BY t1.USER_CREATED;
For the column Total active users at create date
, I am a bit clueless. The point is I have to make sure that the timestamp is below the user_created field in my counting. I.e. on create date 02.02.01, only timestamps below 02.02.01 must be considered. More precisely, the timestamp CLOSEST to the create date tells the actual status of a user.
Current approach (not working):
SELECT USER_CREATED as "Create Date",
count(*) as "Users created",
sum(case when STATUS = 'active' then 1 else 0 end) as "Still active today",
(select count(*) from (select USERID, max(STATUS) keep (dense_rank first order by USER_CREATED desc) as STATUS
from myTable t2
where t2.timestamp <= t1.timestamp
group by USERID) where STATUS = 'active')
as "Total active users at create date"
FROM myTable t1
WHERE t1.USER_CREATED >= to_date('01.02.2020', 'DD.MM.YYYY') and t1.USER_CREATED <= to_date('03.02.2020', 'DD.MM.YYYY')
AND and t1.ID = (select max(ID) from myTable where USERID = t1.USERID)
GROUP BY t1.USER_CREATED;
I need to Any help is appreciated
CodePudding user response:
I'd use CROSS APPLY
rather than a correlated sub-query, it's just a bit easier to read.
Then, the sub-query would need to get every users' last event preceding the end of the day in question (for which I chose to use ROW_NUMBER()
). Then count the number of users who's last event is an active
event.
WITH
dates AS
(
SELECT
User_Created,
COUNT(*) AS count_created,
SUM(CASE WHEN STATUS = 'active' THEN 1 ELSE 0 END) AS count_still_active
FROM
(
SELECT
myTable.*,
ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY Timestamp DESC) AS event_id
FROM
myTable
)
sorted
WHERE
User_Created BETWEEN DATE'2020-02-01' AND DATE'2020-02-03'
AND event_id = 1
GROUP BY
User_Created
)
SELECT
*
FROM
dates
OUTER APPLY
(
SELECT
COUNT(*) AS active_users_at_end_of_day
FROM
(
SELECT
myTable.*,
ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY Timestamp DESC) AS event_id
FROM
myTable
WHERE
myTable.Timestamp < dates.User_Created 1
)
filtered_sorted
WHERE
event_id = 1
AND status = 'active'
)
daily_stats
ORDER BY
User_Created
Demo...
CodePudding user response:
You can use:
WITH expected_range (min_dt, max_dt) AS (
SELECT DATE '2020-02-01',
DATE '2020-02-04'
FROM DUAL
),
calendar (dt, max_dt) AS (
SELECT MIN(LEAST(TRUNC(timestamp), min_dt)),
MAX(GREATEST(TRUNC(timestamp), max_dt))
FROM myTable
CROSS JOIN expected_range
UNION ALL
SELECT dt INTERVAL '1' DAY, max_dt
FROM calendar
WHERE dt INTERVAL '1' DAY <= max_dt
),
activity (dt, userid, status, rn, day_rn, status_now, day_status) AS (
SELECT c.dt,
a.userid,
status,
ROW_NUMBER() OVER (PARTITION BY userid ORDER BY timestamp NULLS LAST),
ROW_NUMBER() OVER (PARTITION BY userid, c.dt ORDER BY timestamp DESC),
MAX(status)
KEEP (DENSE_RANK LAST ORDER BY timestamp NULLS FIRST)
OVER (PARTITION BY userid),
COALESCE(
LAST_VALUE(status)
IGNORE NULLS
OVER (PARTITION BY a.userid ORDER BY c.dt, a.timestamp),
'inactive'
)
FROM calendar c
LEFT OUTER JOIN mytable a
PARTITION BY (a.userid)
ON (c.dt <= a.timestamp AND a.timestamp < c.dt INTERVAL '1' DAY)
)
SELECT a.dt AS create_date,
COUNT(CASE WHEN rn = 1 THEN 1 END) AS users_created,
COUNT(CASE WHEN rn = 1 AND status_now = 'active' THEN 1 END)
AS still_active,
COUNT(CASE WHEN day_rn = 1 AND day_status = 'active' THEN 1 END)
AS total_active_users_today
FROM activity a
INNER JOIN expected_range e
ON (a.dt BETWEEN e.min_dt AND e.max_dt)
GROUP BY a.dt
ORDER BY a.dt
Which, for the sample data (with additional test cases):
CREATE TABLE myTable (ID, USERID, STATUS, TIMESTAMP) AS
SELECT 1, 'A', 'active', TO_DATE('01.02.2020 08:00:00', 'DD.MM.YYYY HH24:MI:SS') FROM DUAL UNION ALL
SELECT 2, 'A', 'inactive', TO_DATE('02.02.2020 11:00:00', 'DD.MM.YYYY HH24:MI:SS') FROM DUAL UNION ALL
SELECT 3, 'A', 'active', TO_DATE('03.02.2020 12:00:00', 'DD.MM.YYYY HH24:MI:SS') FROM DUAL UNION ALL
SELECT 4, 'B', 'active', TO_DATE('02.02.2020 13:00:00', 'DD.MM.YYYY HH24:MI:SS') FROM DUAL UNION ALL
SELECT 5, 'B', 'inactive', TO_DATE('05.02.2020 14:00:00', 'DD.MM.YYYY HH24:MI:SS') FROM DUAL UNION ALL
SELECT 6, 'B', 'active', TO_DATE('07.02.2020 16:00:00', 'DD.MM.YYYY HH24:MI:SS') FROM DUAL UNION ALL
SELECT 7, 'C', 'active', TO_DATE('02.02.2020 09:00:00', 'DD.MM.YYYY HH24:MI:SS') FROM DUAL UNION ALL
SELECT 8, 'C', 'inactive', TO_DATE('03.02.2020 11:00:00', 'DD.MM.YYYY HH24:MI:SS') FROM DUAL UNION ALL
SELECT 9, 'D', 'active', TO_DATE('03.02.2020 10:00:00', 'DD.MM.YYYY HH24:MI:SS') FROM DUAL UNION ALL
SELECT 10, 'E', 'active', TO_DATE('04.02.2020 10:00:00', 'DD.MM.YYYY HH24:MI:SS') FROM DUAL UNION ALL
SELECT 11, 'F', 'active', TO_DATE('31.01.2020 00:00:00', 'DD.MM.YYYY HH24:MI:SS') FROM DUAL UNION ALL
SELECT 12, 'G', 'active', TO_DATE('04.02.2020 00:00:00', 'DD.MM.YYYY HH24:MI:SS') FROM DUAL UNION ALL
SELECT 13, 'G', 'inactive', TO_DATE('04.02.2020 23:59:59', 'DD.MM.YYYY HH24:MI:SS') FROM DUAL;
Outputs:
CREATE_DATE USERS_CREATED STILL_ACTIVE TOTAL_ACTIVE_USERS_TODAY 2020-02-01 00:00:00 1 1 2 2020-02-02 00:00:00 2 1 3 2020-02-03 00:00:00 1 1 4 2020-02-04 00:00:00 2 1 5
db<>fiddle here