How to combine and sum consecutive values until a new value appears in a column

Time:10-27

I need some help summing consecutive values of one column based on the category in another column, until that category changes. Here's what my data looks like:

id  | site_id | date_id  | hour_id | location_id | status    | status_minutes
1   | 1       | 20210101 | 1       | 1           | Offline   | 60
2   | 1       | 20210101 | 2       | 1           | Offline   | 57
3   | 1       | 20210101 | 2       | 1           | Available | 3
4   | 1       | 20210101 | 3       | 1           | Available | 20
5   | 1       | 20210101 | 3       | 1           | Offline   | 40
... | ...     | ...      | ...     | ...         | ...       | ...
25  | 1       | 20210101 | 23      | 1           | Offline   | 60
26  | 1       | 20210102 | 0       | 1           | Offline   | 23

As you can see, the data is at the hourly level: if status_minutes equals 60, there is just one row for that hour. Otherwise, the hour is split across several rows whose status_minutes add up to 60, as in rows 2 and 3 and in rows 4 and 5.

My goal is to find how long each status lasted before the next status kicked in. For the example above, the output would be:

site_id | date_id  | location_id | status    | status_minutes
1       | 20210101 | 1           | Offline   | 117
1       | 20210101 | 1           | Available | 23
1       | 20210101 | 1           | Offline   | 40
...     | ...      | ...         | ...       | ...
1       | 20210101 | 1           | Offline   | 60
1       | 20210102 | 1           | Offline   | 23

The important part is that this operation must be confined to each day, as the last two rows of the example and the output show. The summing happens only within a given day and starts again at hour 0 of the next day.

CodePudding user response:

This is a gaps-and-islands problem. In the query below, section_num identifies each run of consecutive rows that share a status within a day, and status_minutes is then summed per run.
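To make the idea of "islands" concrete, here is a minimal sketch in plain Python, independent of any database. It assumes that id increases chronologically (as it does in the sample data), uses only the rows actually shown in the question (ids 1 to 5 and 26, leaving out the elided ones), and the function name sum_runs and the tuple layout are made up for illustration.

```python
from itertools import groupby

# Rows shown in the question, as tuples of
# (id, site_id, date_id, hour_id, location_id, status, status_minutes).
# The rows elided in the question are intentionally left out.
rows = [
    (1, 1, 20210101, 1, 1, "Offline", 60),
    (2, 1, 20210101, 2, 1, "Offline", 57),
    (3, 1, 20210101, 2, 1, "Available", 3),
    (4, 1, 20210101, 3, 1, "Available", 20),
    (5, 1, 20210101, 3, 1, "Offline", 40),
    (26, 1, 20210102, 0, 1, "Offline", 23),
]


def sum_runs(rows):
    """Sum status_minutes over each run of consecutive rows that share
    site, day, location and status (hypothetical helper)."""
    # Assumption: within a site/location/day, id reflects chronological order.
    ordered = sorted(rows, key=lambda r: (r[1], r[4], r[2], r[0]))
    result = []
    # groupby only merges adjacent items, so a change of status or of day
    # starts a new group, which is exactly an "island".
    for key, run in groupby(ordered, key=lambda r: (r[1], r[2], r[4], r[5])):
        result.append(key + (sum(r[6] for r in run),))
    return result


for line in sum_runs(rows):
    print(line)
# (1, 20210101, 1, 'Offline', 117)
# (1, 20210101, 1, 'Available', 23)
# (1, 20210101, 1, 'Offline', 40)
# (1, 20210102, 1, 'Offline', 23)
```

The last two output rows stay separate even though both are Offline, because date_id is part of the grouping key.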

You may try the following:

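SELECT
    site_id,
    date_id,
    location_id,
    status,
    SUM(status_minutes) AS status_minutes
FROM (
    SELECT
        *,
        -- Row position within the site/day/location, minus the row position
        -- among rows of the same status. The difference stays constant across
        -- a run of consecutive rows that share a status.
        -- id breaks ties when one hour has several rows; this assumes id
        -- increases chronologically, as it does in the sample data.
        ROW_NUMBER() OVER (
            PARTITION BY site_id, date_id, location_id
            ORDER BY hour_id, id
        ) - ROW_NUMBER() OVER (
            PARTITION BY site_id, date_id, location_id, status
            ORDER BY hour_id, id
        ) AS section_num
    FROM
        my_table
) t
GROUP BY
    site_id,
    date_id,
    location_id,
    status,
    section_num
-- section_num alone can tie between different statuses, so order each run
-- by its first row to get chronological output.
ORDER BY
    site_id,
    date_id,
    location_id,
    MIN(hour_id),
    MIN(id)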
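
To see why the difference of the two row numbers marks each run, the sketch below computes it for the rows shown in the question using Python's built-in sqlite3 module. It assumes a SQLite build with window-function support (3.25 or later) and adds id as a tie-breaker in the ORDER BY, on the assumption that id increases chronologically. The column aliases rn_all and rn_status are made up for illustration.

```python
import sqlite3

# Rows shown in the question; the elided rows are left out.
rows = [
    (1, 1, 20210101, 1, 1, "Offline", 60),
    (2, 1, 20210101, 2, 1, "Offline", 57),
    (3, 1, 20210101, 2, 1, "Available", 3),
    (4, 1, 20210101, 3, 1, "Available", 20),
    (5, 1, 20210101, 3, 1, "Offline", 40),
    (26, 1, 20210102, 0, 1, "Offline", 23),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (id INTEGER, site_id INTEGER, date_id INTEGER, "
    "hour_id INTEGER, location_id INTEGER, status TEXT, status_minutes INTEGER)"
)
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# rn_all counts rows within a site/day/location; rn_status counts rows
# within the same partition and status. Their difference is section_num.
query = """
SELECT
    id,
    status,
    ROW_NUMBER() OVER (
        PARTITION BY site_id, date_id, location_id
        ORDER BY hour_id, id
    ) AS rn_all,
    ROW_NUMBER() OVER (
        PARTITION BY site_id, date_id, location_id, status
        ORDER BY hour_id, id
    ) AS rn_status
FROM my_table
ORDER BY id
"""

section_rows = [
    (row_id, status, rn_all, rn_status, rn_all - rn_status)
    for row_id, status, rn_all, rn_status in conn.execute(query)
]
for line in section_rows:
    print(line)
# (1, 'Offline', 1, 1, 0)
# (2, 'Offline', 2, 2, 0)
# (3, 'Available', 3, 1, 2)
# (4, 'Available', 4, 2, 2)
# (5, 'Offline', 5, 3, 2)
# (26, 'Offline', 1, 1, 0)
```

Rows 3, 4 and 5 all get section_num 2, which is why status has to be part of the GROUP BY: the pair (status, section_num) identifies a run, not section_num alone. For the same reason, sorting the final result by section_num alone does not guarantee chronological order.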

