I need help summing consecutive values of one column, grouped by the category in another column, until that category changes to a new value. Here's what my data looks like:
id | site_id | date_id | hour_id | location_id | status | status_minutes
1 | 1 | 20210101 | 1 | 1 | Offline | 60
2 | 1 | 20210101 | 2 | 1 | Offline | 57
3 | 1 | 20210101 | 2 | 1 | Available | 3
4 | 1 | 20210101 | 3 | 1 | Available | 20
5 | 1 | 20210101 | 3 | 1 | Offline | 40
... | ... | ... | ... | ... | ... | ...
25 | 1 | 20210101 | 23 | 1 | Offline | 60
26 | 1 | 20210102 | 0 | 1 | Offline | 23
As you can see, the data is at the hourly level, so if status_minutes equals 60 there is just one row for that hour. Otherwise, the hour is split across several rows whose status_minutes add up to 60, as in rows 2 and 3 and in rows 4 and 5.
My goal is to find how long each status lasted before the next status kicked in. The output for the example above would be:
site_id | date_id | location_id | status | status_minutes
1 | 20210101 | 1 | Offline | 117
1 | 20210101 | 1 | Available | 23
1 | 20210101 | 1 | Offline | 40
... | ... | ... | ... | ...
1 | 20210101 | 1 | Offline | 60
1 | 20210102 | 1 | Offline | 23
The important part is that this operation must be confined to each day, as the last two rows of the example and of the output show. The summing happens only within a given day and starts again at hour 0 of the next day.
CodePudding user response:
This is a gaps-and-islands problem. The derived column section_num identifies each run of consecutive rows with the same status, so that status_minutes can then be summed per run.
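To make the grouping concrete, here is a minimal sketch in plain Python (an illustration only, not part of the query) that computes the same row-number difference for the first five sample rows from the question. It assumes the rows are ordered by hour_id with id as a tiebreaker, since two rows can share an hour; the variable names are made up for the example.

```python
from collections import defaultdict

# (id, hour_id, status, status_minutes) for one site/date/location,
# copied from the first five sample rows in the question.
rows = [
    (1, 1, "Offline", 60),
    (2, 2, "Offline", 57),
    (3, 2, "Available", 3),
    (4, 3, "Available", 20),
    (5, 3, "Offline", 40),
]

# Assumption: order by hour_id, then id, because hour_id alone has ties.
rows.sort(key=lambda r: (r[1], r[0]))

per_status_count = defaultdict(int)  # running row number within each status
totals = {}                          # (status, section_num) -> summed minutes

for overall_rn, (row_id, hour_id, status, minutes) in enumerate(rows, start=1):
    per_status_count[status] += 1
    # Overall row number minus per-status row number: constant within a run.
    section_num = overall_rn - per_status_count[status]
    key = (status, section_num)
    totals[key] = totals.get(key, 0) + minutes
    print(row_id, status, section_num)

print(totals)
```

Note that section_num alone is not unique here: the Available run and the second Offline run both get the value 2, which is why status has to be part of the grouping key as well.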
You may try the following:
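SELECT
    site_id,
    date_id,
    location_id,
    status,
    SUM(status_minutes) AS status_minutes
FROM (
    SELECT
        *,
        -- Row number within the day minus row number within the day and status.
        -- The difference stays constant across a run of consecutive rows
        -- that share a status. id breaks ties between rows with the same
        -- hour_id (this assumes id increases chronologically).
        ROW_NUMBER() OVER (
            PARTITION BY site_id, date_id, location_id
            ORDER BY hour_id, id
        ) - ROW_NUMBER() OVER (
            PARTITION BY site_id, date_id, location_id, status
            ORDER BY hour_id, id
        ) AS section_num
    FROM
        my_table
) t
GROUP BY
    site_id,
    date_id,
    location_id,
    status,
    section_num
-- section_num is not chronological, so sort each run by its first row.
ORDER BY
    site_id,
    date_id,
    location_id,
    MIN(id)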
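If you want to check the approach before running it on your real table, the following is a rough sketch that loads a few of the sample rows into an in-memory SQLite database from Python and runs the gaps-and-islands query against them. Several things here are assumptions rather than facts about your setup: the column types, the use of SQLite (window functions need version 3.25 or later), ordering the window functions by hour_id and then id, and sorting the result by the first id of each run. Rows 6 to 25 are left out because the question elides them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE my_table (
           id INTEGER, site_id INTEGER, date_id INTEGER, hour_id INTEGER,
           location_id INTEGER, status TEXT, status_minutes INTEGER)"""
)
# Sample rows 1-5 and 26 from the question (the elided rows are omitted).
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, 1, 20210101, 1, 1, "Offline", 60),
        (2, 1, 20210101, 2, 1, "Offline", 57),
        (3, 1, 20210101, 2, 1, "Available", 3),
        (4, 1, 20210101, 3, 1, "Available", 20),
        (5, 1, 20210101, 3, 1, "Offline", 40),
        (26, 1, 20210102, 0, 1, "Offline", 23),
    ],
)

query = """
SELECT site_id, date_id, location_id, status,
       SUM(status_minutes) AS status_minutes
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY site_id, date_id, location_id
               ORDER BY hour_id, id
           ) - ROW_NUMBER() OVER (
               PARTITION BY site_id, date_id, location_id, status
               ORDER BY hour_id, id
           ) AS section_num
    FROM my_table
) t
GROUP BY site_id, date_id, location_id, status, section_num
ORDER BY site_id, date_id, location_id, MIN(id)
"""

result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Because date_id is part of every PARTITION BY, the row numbering restarts each day, so a status that continues past midnight is reported as two separate runs, which matches the requirement in the question.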