An example table:
video | encoding | video time spent | encoding bytes | encoding bytes running sum | video time spent running sum (expected) | video time spent running sum (actual) |
---|---|---|---|---|---|---|
A | 1 | 1 | 500 | 500 | 1 | 1 |
A | 2 | 1 | 400 | 900 | 1 | 2 |
B | 3 | 2 | 300 | 1200 | 3 | 5 |
B | 4 | 2 | 200 | 1400 | 3 | 8 |
B | 5 | 2 | 100 | 1500 | 3 | 11 |
B | 6 | 2 | 100 | 1600 | 3 | 14 |
- video time spent column has how much time a video is watched; which encoding was watched is not important.
- video time spent running sum is what I am trying to get. It should only sum time spent at video level, ignoring encodings.
I want to select as much encoding bytes as possible while staying under sum of video time spent < X.
My query so far:
SELECT *
FORM (
SELECT
...,
SUM(encoding_bytes) OVER(ORDER BY encoding_bytes desc) AS encoding_bytes_running_sum,
SUM(video_time_spent) OVER (ORDER BY encoding_bytes desc) AS video_time_spent_running_sum
...
)
WHERE video_time_spent_running_sum < X
but video_time_spent_running_sum isn't smart enough to skip over other encodings within the same video. What would be the best way to do this?
the number of encodings per video is not constant.
script to create table:
SELECT
*,
SUM(encoding_bytes) OVER(
ORDER BY
encoding_bytes DESC
) AS encoding_bytes_running_sum,
SUM(video_time_spent) OVER (
ORDER BY
encoding_bytes DESC ROWS UNBOUNDED PRECEDING
) AS video_time_spent_running_sum
FROM (
VALUES
('a', 1, 1, 500),
('a', 2, 1, 400),
('b', 3, 2, 300),
('b', 4, 2, 200),
('b', 5, 2, 100),
('b', 6, 2, 100)
) AS t (video, encoding, video_time_spent, encoding_bytes)
CodePudding user response:
One way to do this is as follows (I am sure its possible to simplify); where you use the ROW_NUMBER
function to only count the first row of each video.
WITH cte AS (
SELECT
*
, SUM(encoding_bytes) OVER (ORDER BY encoding_bytes DESC) AS encoding_bytes_running_sum
--, SUM(video_time_spent) OVER (ORDER BY encoding_bytes DESC ROWS UNBOUNDED PRECEDING) AS video_time_spent_running_sum
, ROW_NUMBER() OVER (PARTITION BY video ORDER BY video, [encoding]) rn
FROM (
VALUES
('a', 1, 1, 500),
('a', 2, 1, 400),
('b', 3, 2, 300),
('b', 4, 2, 200),
('b', 5, 2, 100),
('b', 6, 2, 100)
) AS t (video, [encoding], video_time_spent, encoding_bytes)
)
SELECT video, [encoding], video_time_spent, encoding_bytes, encoding_bytes_running_sum
, SUM(CASE WHEN rn = 1 THEN video_time_spent ELSE 0 END) OVER (ORDER BY video ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) video_time_spent_running_sum
FROM cte;
This returns:
video encoding | video_time_spent | encoding_bytes | encoding_bytes_running_sum | video_time_spent_running_sum |
---|---|---|---|---|
a | 1 | 1 | 500 | 500 |
a | 2 | 1 | 400 | 900 |
b | 3 | 2 | 300 | 1200 |
b | 4 | 2 | 200 | 1400 |
b | 5 | 2 | 100 | 1600 |
b | 6 | 2 | 100 | 1600 |