I have a table where I track the duration of watched films by a user for each day. Now I would like to calculate a unique view count based on date.
So the conditions are:
- For each user max view count is 1
- View = 1 if one user's SUM(duration) >= 120
- Date should be fixed once SUM(duration) reaches 120
The difficulty is picking the correct date row. For example, row1.duration + row2.duration >= 120,
so view count = 1 should be applied to 2021-10-16.
| id | user_id | duration | created_at | film_id |
|----|---------|----------|------------|---------|
| 1 | 1 | 80 | 2021-10-15 | 1 |
| 2 | 1 | 70 | 2021-10-16 | 1 |
| 3 | 1 | 200 | 2021-10-17 | 1 |
| 4 | 2 | 50 | 2021-10-18 | 1 |
| 5 | 2 | 90 | 2021-10-18 | 1 |
| 6 | 3 | 140 | 2021-10-18 | 2 |
| 7 | 4 | 10 | 2021-10-19 | 3 |
Expected result:
| cnt | created_at |
|-----|------------|
| 0 | 2021-10-15 |
| 1 | 2021-10-16 |
| 0 | 2021-10-17 |
| 2 | 2021-10-18 |
| 0 | 2021-10-19 |
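For reference, the rule above can be written procedurally. This is only a sketch in Python of what an application-level version might look like; the function name `count_views_by_date` and the `(user_id, duration, created_at)` tuple layout are assumptions made for illustration, and dates are assumed to be ISO strings so that they sort chronologically.

```python
from collections import defaultdict

# Sample rows from the table above as (user_id, duration, created_at).
SAMPLE_ROWS = [
    (1, 80, "2021-10-15"),
    (1, 70, "2021-10-16"),
    (1, 200, "2021-10-17"),
    (2, 50, "2021-10-18"),
    (2, 90, "2021-10-18"),
    (3, 140, "2021-10-18"),
    (4, 10, "2021-10-19"),
]


def count_views_by_date(rows, threshold=120):
    """Return {created_at: cnt}, crediting each user at most once, on the
    date where that user's running SUM(duration) first reaches threshold."""
    rows = list(rows)
    # Every date that appears in the data starts at 0, so empty days are kept.
    counts = {created_at: 0 for _, _, created_at in rows}
    running = defaultdict(int)  # user_id -> cumulative duration so far
    counted = set()             # users whose single view is already fixed

    # ISO date strings sort chronologically; the sort is stable for ties.
    for user_id, duration, created_at in sorted(rows, key=lambda r: r[2]):
        if user_id in counted:
            continue  # max view count per user is 1
        running[user_id] += duration
        if running[user_id] >= threshold:
            counts[created_at] += 1
            counted.add(user_id)

    return dict(sorted(counts.items()))
```

Calling `count_views_by_date(SAMPLE_ROWS)` gives the same counts per date as the expected result above, including the dates with 0 views.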
This is what I tried, but it chooses the first date and omits the dates with a 0 count. Here is the fiddle with populated data
SELECT count(*) AS cnt,
created_at
FROM
(SELECT user_id,
sum(duration) AS total,
created_at
FROM watch_time
GROUP BY user_id) AS t
WHERE t.total >= 120
GROUP BY created_at;
Is there any way to make this work in SQL, or should it be done at the application level?
Thanks in advance!
Update:
Version: AWS RDS MySQL 5.7.33
But I'm ok to switch to Postgres if that can help.
I would appreciate even a way to get MIN(date) while still listing all dates (including those with 0 views),
something better than this one:
SELECT IFNULL(cnt, 0) as cnt,
t3.created_at
FROM
(SELECT count(*) AS cnt,
created_at
FROM
(SELECT user_id,
sum(duration) AS total,
created_at
FROM watch_time
GROUP BY user_id) AS t
WHERE t.total >= 120
GROUP BY created_at) AS t2
RIGHT JOIN
(SELECT distinct(created_at)
FROM watch_time) AS t3
ON t2.created_at = t3.created_at;
which returns:
| cnt | created_at |
|-----|------------|
| 1 | 2021-10-15 |
| 0 | 2021-10-16 |
| 0 | 2021-10-17 |
| 2 | 2021-10-18 |
| 0 | 2021-10-19 |
But I'm not sure whether the date (2021-10-15) was picked arbitrarily or whether it is always the lowest date.
CodePudding user response:
MySQL user variables can implement your counting logic: the query orders the table rows by user_id and created_at and computes a running sum row by row.
http://sqlfiddle.com/#!9/569088/14
SELECT created_at, SUM(CASE WHEN duration >= 120 THEN 1 ELSE 0 END) counts
FROM (
SELECT user_id, created_at,
CASE WHEN @UID != user_id THEN @SUM_TIME := 0 WHEN @SUM_TIME >= 120 AND @DT != created_at THEN @SUM_TIME := 0 - duration ELSE 0 END SX,
@SUM_TIME := @SUM_TIME + duration AS duration,
@UID := user_id,
@DT := created_at
FROM watch_time
JOIN ( SELECT @SUM_TIME :=0, @DT := NOW(), @UID := '' ) t
ORDER BY user_id, created_at
) f
GROUP BY created_at
CodePudding user response:
I think I misunderstood the requirement in my first attempt.
Second attempt
MySQL >= 8.0 (or PostgreSQL) using window functions
I know you are working with MySQL 5.7, so I add an answer for it next.
I am not sure I understand your requirement correctly. Do you want the cumulative sum of watch time per user, counting one on the day a user first reaches 120 minutes?
First, I get the cumulative sum per user, ordered by date (cte subquery). In subquery cte1, a CASE expression sets 1 the first time a user reaches 120 minutes (view column). Finally, I group by created_at (date) and count the ones in the view column:
WITH cte AS (SELECT *, SUM(duration) OVER (PARTITION BY user_id ORDER BY created_at ASC, film_id) as cum_duration
FROM watch_time),
cte1 AS (SELECT *, CASE WHEN cum_duration >= 120 AND COALESCE(LAG(cum_duration) OVER (PARTITION BY user_id ORDER BY created_at ASC), 0) < 120 THEN 1 END AS view
FROM cte)
SELECT created_at, COUNT(view) AS cnt
FROM cte1
GROUP BY created_at;
created_at | cnt |
---|---|
2021-10-15 | 0 |
2021-10-16 | 1 |
2021-10-17 | 0 |
2021-10-18 | 2 |
2021-10-19 | 0 |
MySql 5.7
I get the cumulative sum for each user and keep the rows where the cumulative duration is >= 120, then I group by user_id and get MIN(created_at). Finally, I group by min_created_at and count the records.
SELECT min_created_at AS date, count(*) AS cnt
FROM (SELECT user_id, MIN(created_at) AS min_created_at
FROM (SELECT wt1.user_id, wt1.created_at, SUM(wt2.duration) AS cum_duration
FROM (SELECT user_id, created_at, SUM(duration) AS duration FROM watch_time GROUP BY user_id, created_at) wt1
INNER JOIN (SELECT user_id, created_at, SUM(duration) AS duration FROM watch_time GROUP BY user_id, created_at) wt2 ON wt1.user_id = wt2.user_id AND wt1.created_at >= wt2.created_at
GROUP BY wt1.user_id, wt1.created_at
HAVING SUM(wt2.duration) >= 120) AS sq
GROUP BY user_id) AS sq2
GROUP BY min_created_at;
date | cnt |
---|---|
2021-10-16 | 1 |
2021-10-18 | 2 |
You can JOIN my query (RIGHT JOIN) with the original table (GROUP BY created_at) to get the rest of the dates with count equal to 0.
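A minimal sketch of that join follows. To keep it self-contained it runs the SQL through Python's `sqlite3` module against an in-memory copy of the sample data, so it is written against SQLite, not MySQL 5.7. The SQL uses only subqueries, joins and GROUP BY, so I would expect it to port, but treat that as an assumption. It puts the list of dates on the left side of a LEFT JOIN, which is equivalent to the RIGHT JOIN described above; the alias names `d` and `v` and the helper `daily_view_counts` are made up for this example.

```python
import sqlite3

# Sample rows from the question: (id, user_id, duration, created_at, film_id).
ROWS = [
    (1, 1, 80, "2021-10-15", 1),
    (2, 1, 70, "2021-10-16", 1),
    (3, 1, 200, "2021-10-17", 1),
    (4, 2, 50, "2021-10-18", 1),
    (5, 2, 90, "2021-10-18", 1),
    (6, 3, 140, "2021-10-18", 2),
    (7, 4, 10, "2021-10-19", 3),
]

QUERY = """
SELECT d.created_at, COUNT(v.user_id) AS cnt
FROM (SELECT DISTINCT created_at FROM watch_time) AS d
LEFT JOIN (
    -- first date on which each user's cumulative duration reaches 120
    SELECT user_id, MIN(created_at) AS min_created_at
    FROM (
        SELECT wt1.user_id, wt1.created_at
        FROM (SELECT user_id, created_at, SUM(duration) AS duration
              FROM watch_time GROUP BY user_id, created_at) AS wt1
        INNER JOIN (SELECT user_id, created_at, SUM(duration) AS duration
                    FROM watch_time GROUP BY user_id, created_at) AS wt2
            ON wt1.user_id = wt2.user_id AND wt1.created_at >= wt2.created_at
        GROUP BY wt1.user_id, wt1.created_at
        HAVING SUM(wt2.duration) >= 120
    ) AS sq
    GROUP BY user_id
) AS v ON v.min_created_at = d.created_at
GROUP BY d.created_at
ORDER BY d.created_at
"""


def daily_view_counts(rows):
    """Load rows into an in-memory watch_time table and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE watch_time (id INTEGER PRIMARY KEY, user_id INTEGER, "
        "duration INTEGER, created_at TEXT, film_id INTEGER)"
    )
    conn.executemany("INSERT INTO watch_time VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result
```

`COUNT(v.user_id)` counts only the matched rows, so a date with no user crossing the threshold comes back with 0 instead of disappearing, and no IFNULL is needed.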
First attempt
I understood that you wanted to count one each time a user reaches 120 minutes within a day.
First, I get the total watch time per user and date (subquery sq); then a CASE expression yields 1 whenever a user's total for a date is at least 120 minutes. I group by created_at (date) and count() the ones from the CASE expression:
SELECT created_at, COUNT(CASE WHEN total_duration >= 120 THEN 1 END) cnt
FROM (SELECT created_at, user_id, SUM(duration) AS total_duration
FROM watch_time
GROUP BY created_at, user_id) AS sq
GROUP BY created_at;
Output (with sample data from the question):
created_at | cnt |
---|---|
2021-10-15 | 0 |
2021-10-16 | 0 |
2021-10-17 | 1 |
2021-10-18 | 2 |
2021-10-19 | 0 |