Get appropriate date with GROUP BY

I have a table where I track the duration of watched films by a user for each day. Now I would like to calculate a unique view count based on date.

So the conditions are:

For each user max view count is 1
View = 1 if one user's SUM(duration) >= 120
Date should be fixed once SUM(duration) reaches 120

The problem is picking the correct date row. For example, row1.duration + row2.duration >= 120, so view count = 1 should be assigned to 2021-10-16.

| id | user_id | duration | created_at | film_id |
+----+---------+----------+------------+---------+
| 1  | 1       | 80       | 2021-10-15 | 1       |
| 2  | 1       | 70       | 2021-10-16 | 1       |
| 3  | 1       | 200      | 2021-10-17 | 1       |
| 4  | 2       | 50       | 2021-10-18 | 1       |
| 5  | 2       | 90       | 2021-10-18 | 1       |
| 6  | 3       | 140      | 2021-10-18 | 2       |
| 7  | 4       | 10       | 2021-10-19 | 3       |

Expected result:

| cnt   | created_at |
+-------+------------+
| 0     | 2021-10-15 |
| 1     | 2021-10-16 |
| 0     | 2021-10-17 |
| 2     | 2021-10-18 |
| 0     | 2021-10-19 |

This is what I tried, but it chooses the first date and omits dates with a count of 0. Here is the fiddle with populated data

SELECT count(*) AS cnt,
         created_at
FROM 
    (SELECT user_id,
         sum(duration) AS total,
         created_at
    FROM watch_time
    GROUP BY  user_id) AS t
WHERE t.total >= 120
GROUP BY  created_at;

Is there a way to do this in SQL, or should it be done at the application level?

Thanks in advance!

Update:

Version: AWS RDS MySQL 5.7.33

But I'm ok to switch to Postgres if that can help.

I would also appreciate a way to get MIN(date) while still returning all dates (including those with 0 views), ideally something better than this:

SELECT IFNULL(cnt, 0) as cnt,
         t3.created_at
FROM 
    (SELECT count(*) AS cnt,
         created_at
    FROM 
        (SELECT user_id,
         sum(duration) AS total,
         created_at
        FROM watch_time
        GROUP BY  user_id) AS t
        WHERE t.total >= 120
        GROUP BY  created_at) AS t2
    RIGHT JOIN 
    (SELECT distinct(created_at)
    FROM watch_time) AS t3
    ON t2.created_at = t3.created_at;

which returns:

| cnt   | created_at |
+-------+------------+
| 1     | 2021-10-15 |
| 0     | 2021-10-16 |
| 0     | 2021-10-17 |
| 2     | 2021-10-18 |
| 0     | 2021-10-19 |

But I'm not sure whether the date (2021-10-15) is picked arbitrarily or is always the lowest date.
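On whether that date is arbitrary: when ONLY_FULL_GROUP_BY is disabled, MySQL is free to return any value from the group for a non-aggregated column such as created_at, so the lowest date is not guaranteed. A minimal sketch of making the choice explicit with MIN(created_at) follows. It runs the query through Python's built-in sqlite3 module only so that the example is self-contained; that setup and the trimmed three-column table are assumptions for illustration, not the asker's environment.

```python
import sqlite3

# Sample data from the question: (user_id, duration, created_at)
ROWS = [
    (1, 80, "2021-10-15"),
    (1, 70, "2021-10-16"),
    (1, 200, "2021-10-17"),
    (2, 50, "2021-10-18"),
    (2, 90, "2021-10-18"),
    (3, 140, "2021-10-18"),
    (4, 10, "2021-10-19"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE watch_time (user_id INTEGER, duration INTEGER, created_at TEXT)"
)
conn.executemany("INSERT INTO watch_time VALUES (?, ?, ?)", ROWS)

# MIN(created_at) makes the per-user date deterministic; the LEFT JOIN
# from the list of distinct dates keeps dates that have no views.
QUERY = """
SELECT d.created_at, COUNT(t.user_id) AS cnt
FROM (SELECT DISTINCT created_at FROM watch_time) AS d
LEFT JOIN (
    SELECT user_id, MIN(created_at) AS first_date
    FROM watch_time
    GROUP BY user_id
    HAVING SUM(duration) >= 120
) AS t ON t.first_date = d.created_at
GROUP BY d.created_at
ORDER BY d.created_at
"""

result = conn.execute(QUERY).fetchall()
for created_at, cnt in result:
    print(created_at, cnt)
```

This assigns each qualifying user to their earliest watch date, so it reproduces the table above deterministically; it still does not give the date on which the running total reaches 120.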

CodePudding user response：

You can implement your counting logic with MySQL user variables: order the table rows by user_id and created_at, then compute a running total row by row.

http://sqlfiddle.com/#!9/569088/14

SELECT created_at, SUM(CASE WHEN duration >= 120 THEN 1 ELSE 0 END) counts
FROM (
  SELECT user_id, created_at, 
  CASE WHEN @UID != user_id THEN @SUM_TIME := 0 WHEN @SUM_TIME >= 120 AND @DT != created_at THEN @SUM_TIME := 0 - duration ELSE 0 END SX,
  @SUM_TIME := @SUM_TIME + duration AS duration,
  @UID := user_id,
  @DT := created_at
  FROM watch_time
  JOIN ( SELECT @SUM_TIME :=0, @DT := NOW(), @UID := '' ) t
  ORDER BY user_id, created_at
) f
GROUP BY created_at
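Since the question also asks about doing this at the application level, here is a minimal sketch of the same row-by-row idea in Python. It is not a translation of the query above: the function name, the tuple layout and the threshold constant are assumptions made for illustration. Rows are visited per user in date order, a running total is kept, and the view is assigned to the date on which the total first reaches 120.

```python
from collections import Counter

THRESHOLD = 120  # minimum total duration that counts as one view

# Sample data from the question: (user_id, duration, created_at)
ROWS = [
    (1, 80, "2021-10-15"),
    (1, 70, "2021-10-16"),
    (1, 200, "2021-10-17"),
    (2, 50, "2021-10-18"),
    (2, 90, "2021-10-18"),
    (3, 140, "2021-10-18"),
    (4, 10, "2021-10-19"),
]


def count_views_by_date(rows, threshold=THRESHOLD):
    """Return [(created_at, view_count), ...] sorted by date."""
    totals = {}        # running duration per user
    counted = set()    # users whose single view is already assigned
    views = Counter()
    for user_id, duration, created_at in sorted(rows, key=lambda r: (r[0], r[2])):
        views[created_at] += 0  # register the date even if it gets no view
        if user_id in counted:
            continue  # at most one view per user
        totals[user_id] = totals.get(user_id, 0) + duration
        if totals[user_id] >= threshold:
            views[created_at] += 1
            counted.add(user_id)
    return sorted(views.items())


result = count_views_by_date(ROWS)
for created_at, cnt in result:
    print(created_at, cnt)
```

For the sample data this yields 0, 1, 0, 2, 0 for 2021-10-15 through 2021-10-19, which is the expected result from the question.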

CodePudding user response：

I think I misunderstood the requirement in my first attempt.

Second attempt

MySQL >= 8.0 (or PostgreSQL) using window functions

I know you are working with MySQL 5.7; an answer for that version follows below.

I am not sure I understand your requirement correctly. Do you want the cumulative sum of watch time per user, counting one view on the day a user first reaches 120 minutes?

First, I compute the cumulative sum per user, ordered by date (subquery cte). In subquery cte1, a CASE expression sets the view column to 1 on the row where a user first reaches 120 minutes. Finally, I group by created_at (date) and COUNT() the ones in the view column:

WITH cte AS (SELECT *, SUM(duration) OVER (PARTITION BY user_id ORDER BY created_at ASC, film_id) as cum_duration
              FROM watch_time),
              
     cte1 AS (SELECT *, CASE WHEN cum_duration >= 120 AND COALESCE(LAG(cum_duration) OVER (PARTITION BY user_id ORDER BY created_at ASC), 0) < 120 THEN 1 END AS view 
              FROM cte)

SELECT created_at, COUNT(view) AS cnt
FROM cte1
GROUP BY created_at;

created_at	cnt
2021-10-15	0
2021-10-16	1
2021-10-17	0
2021-10-18	2
2021-10-19	0
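To make the two steps easier to follow, the sketch below mirrors them in plain Python: a cumulative sum per user (the cte step) and a comparison with the previous cumulative value (the LAG step in cte1). The function name and tuple layout are assumptions for illustration. It also uses a strict row-by-row running total, which can differ from the SQL default window frame when rows tie on the ordering columns, although the resulting flags are the same for this sample.

```python
from itertools import accumulate, groupby

# Sample data from the question: (user_id, duration, created_at)
ROWS = [
    (1, 80, "2021-10-15"),
    (1, 70, "2021-10-16"),
    (1, 200, "2021-10-17"),
    (2, 50, "2021-10-18"),
    (2, 90, "2021-10-18"),
    (3, 140, "2021-10-18"),
    (4, 10, "2021-10-19"),
]


def view_flags(rows, threshold=120):
    """Return (user_id, created_at, cum_duration, view) for every row."""
    flagged = []
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))  # by user, then date
    for user_id, group in groupby(ordered, key=lambda r: r[0]):
        group = list(group)
        # Running total of duration within this user's rows.
        cumulative = list(accumulate(duration for _, duration, _ in group))
        # Previous running total, 0 for the first row (the COALESCE(LAG(...), 0) part).
        previous = [0] + cumulative[:-1]
        for (_, _, created_at), cum, prev in zip(group, cumulative, previous):
            # view = 1 only on the row where the total crosses the threshold.
            flagged.append((user_id, created_at, cum, int(prev < threshold <= cum)))
    return flagged


flags = view_flags(ROWS)

# Final step: group by date and add up the view flags.
counts = {}
for _, created_at, _, view in flags:
    counts[created_at] = counts.get(created_at, 0) + view

for created_at, cnt in counts.items():
    print(created_at, cnt)
```

Only the rows for user 1 on 2021-10-16 and for users 2 and 3 on 2021-10-18 get view = 1, which gives the per-date counts listed above.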

MySQL 5.7

I compute the cumulative sum for each user and date with a self-join, keep the rows where the cumulative duration is >= 120, then group by user_id and take MIN(created_at). Finally, I group by min_created_at and count the records.

SELECT min_created_at AS date, count(*) AS cnt

FROM (SELECT user_id, MIN(created_at) AS min_created_at

    FROM (SELECT wt1.user_id, wt1.created_at, SUM(wt2.duration) AS cum_duration
        FROM (SELECT user_id, created_at, SUM(duration) AS duration FROM watch_time GROUP BY user_id, created_at) wt1
        INNER JOIN (SELECT user_id, created_at, SUM(duration) AS duration FROM watch_time GROUP BY user_id, created_at) wt2 ON wt1.user_id = wt2.user_id AND wt1.created_at >= wt2.created_at 
        GROUP BY wt1.user_id, wt1.created_at
        HAVING SUM(wt2.duration) >= 120) AS sq

    GROUP BY user_id) AS sq2
GROUP BY min_created_at;

date	cnt
2021-10-16	1
2021-10-18	2

You can JOIN my query (RIGHT JOIN) with the original table (GROUP BY created_at) to get the rest of the dates with count equal to 0.
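One possible shape of that join is sketched below, written as a LEFT JOIN starting from the list of distinct dates, which is equivalent to the RIGHT JOIN described. The query is run through Python's built-in sqlite3 module only to keep the example self-contained; that setup, the alias d and the trimmed three-column table are assumptions for illustration. The SQL uses no window functions or variables.

```python
import sqlite3

# Sample data from the question: (user_id, duration, created_at)
ROWS = [
    (1, 80, "2021-10-15"),
    (1, 70, "2021-10-16"),
    (1, 200, "2021-10-17"),
    (2, 50, "2021-10-18"),
    (2, 90, "2021-10-18"),
    (3, 140, "2021-10-18"),
    (4, 10, "2021-10-19"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE watch_time (user_id INTEGER, duration INTEGER, created_at TEXT)"
)
conn.executemany("INSERT INTO watch_time VALUES (?, ?, ?)", ROWS)

# sq:  per user and date, the cumulative duration up to that date (>= 120 only)
# sq2: per user, the first date on which the cumulative duration reached 120
# d:   every date in the table, so dates without views show up with cnt = 0
QUERY = """
SELECT d.created_at, COUNT(sq2.user_id) AS cnt
FROM (SELECT DISTINCT created_at FROM watch_time) AS d
LEFT JOIN (
    SELECT user_id, MIN(created_at) AS min_created_at
    FROM (
        SELECT wt1.user_id, wt1.created_at
        FROM (SELECT user_id, created_at, SUM(duration) AS duration
              FROM watch_time GROUP BY user_id, created_at) AS wt1
        INNER JOIN (SELECT user_id, created_at, SUM(duration) AS duration
                    FROM watch_time GROUP BY user_id, created_at) AS wt2
            ON wt1.user_id = wt2.user_id AND wt1.created_at >= wt2.created_at
        GROUP BY wt1.user_id, wt1.created_at
        HAVING SUM(wt2.duration) >= 120
    ) AS sq
    GROUP BY user_id
) AS sq2 ON sq2.min_created_at = d.created_at
GROUP BY d.created_at
ORDER BY d.created_at
"""

result = conn.execute(QUERY).fetchall()
for created_at, cnt in result:
    print(created_at, cnt)
```

COUNT(sq2.user_id) counts only matched rows, so dates with no qualifying user come back as 0 without needing IFNULL.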

First attempt

I understood that you wanted to count one each time a user reaches 120 minutes in a single day.

First, I get the total watch time per user and date (subquery sq). Then a CASE expression yields 1 whenever a user's total for a date reaches 120 minutes. Finally, I group by created_at (date) and COUNT() the ones from the CASE expression:

SELECT created_at, COUNT(CASE WHEN total_duration >= 120 THEN 1 END) cnt
FROM (SELECT created_at, user_id, SUM(duration) AS total_duration
             FROM watch_time
             GROUP BY created_at, user_id) AS sq
GROUP BY created_at;

Output (with sample data from the question):

created_at	cnt
2021-10-15	0
2021-10-16	0
2021-10-17	1
2021-10-18	2
2021-10-19	0