Consider the mysql table,
user_event| created_at
1 2021-12-03
2 2021-12-03
3 2021-12-03
3 2021-12-01
1 2021-11-28
2 2021-11-10
2 2021-11-01
1 2021-11-01
I am trying to find count of users grouped by created_at & count of repeat users if they had an event in the last 10 days
The expected result would be:
created_at | total_users | repeat_users
2021-12-03 3 2
2021-12-02 0 0
2021-12-01 1 0
.
.
2021-11-28 1 0
.
.
2021-11-10 1 1
2021-11-01 2 0
Basically, user_event 1,2,3 had activities on 2021-12-03, so total_users are 3. 2 of them had activity in the last 10 days so, repeat_users should be 2. Similarly, user_event 3 was present on 2021-12-01 but he has no activity in last 10 days, hence repeat_users would be 0 and so on.
CodePudding user response:
Given your table of data
CREATE TABLE MyData
(
user_event INT,
created_at DATE
);
INSERT INTO MyData
VALUES
(1,'2021-12-03'),
(2,'2021-12-03'),
(3,'2021-12-03'),
(3,'2021-12-01'),
(1,'2021-11-28'),
(2,'2021-11-10'),
(2,'2021-11-01'),
(1,'2021-11-01')
You can solve this using a calendar table which can be created as follows (you probably should populate with dates well into the future)
delimiter //
CREATE PROCEDURE CreateDateTable ()
BEGIN
SET @StartDate = CAST('2021-01-01' AS DATE);
SET @EndDate = CAST('2021-12-31' AS DATE);
CREATE TABLE Calendar
(
DateValue DATE
);
WHILE @StartDate <= @EndDate DO
INSERT INTO Calendar
(
DateValue
)
VALUES
(
@StartDate
);
SET @StartDate = DATE_ADD(@StartDate, INTERVAL 1 DAY);
END WHILE;
END//
CALL CreateDateTable
DROP PROCEDURE CreateDateTable
Once you have the calendar table, you can select the dates you are interested in and use a combination of COUNT / GROUP BY to count the number of user events for the day and then join to a subquery which uses a self join to find the number of repeat events:
SELECT c.DateValue,
COUNT(d.user_event) AS total_users,
IFNULL(r.NumRepeats,0) AS repeat_users
FROM Calendar c
LEFT JOIN MyData d
ON c.DateValue = d.created_at
LEFT JOIN (
/* repeats */
SELECT a.created_at,
COUNT(a.user_event) AS NumRepeats
FROM MyData a
LEFT JOIN MyData b
ON b.created_at > DATE_ADD(a.created_at, INTERVAL -10 DAY) AND
b.created_at < a.created_at AND
a.user_event = b.user_event
WHERE b.user_event IS NOT NULL
GROUP BY a.created_at
) r
ON r.created_at = c.DateValue
WHERE c.DateValue >= '2021-11-01' AND
c.DateValue <= '2021-12-03'
GROUP BY c.DateValue,
r.NumRepeats
ORDER BY c.DateValue DESC