Home > Enterprise >  How to group by continuous time periods and sum a column in sql
How to group by continuous time periods and sum a column in sql

Time:10-11

id| startdate | enddate | team | score
1| 2011-09-25 | 2012-03-31 | blue | 380
1| 2012-04-01 | 2012-07-31 | blue | 147
1| 2012-08-01 | 2013-01-31 | green| 217
1| 2013-02-01 | 2015-02-28 | green| 1540
1| 2015-03-01 | 2015-09-30 | green| 300
1| 2015-10-01 | 2018-04-30 | blue | 2193
1| 2018-05-01 | 2020-12-31 | blue | 1743
1| 2023-05-01 | 2023-09-30 | blue | 159
1| 2023-10-01 | 9999-12-31 | green| *

I need to merge the continuous time periods in this view where the team is the same and add the scores to get that team's total score in that period.
Note that, the value for the last time frame is * because it is the current time frame, and the value is only inserted after the time period is finished. Also, all the values for the score are in string format not integer format.

Expected output:

id| startdate| enddate| team| score
1| 2011-09-25| 2012-07-31| blue|  527
1| 2012-08-01| 2015-09-30| green| 2057
1| 2015-10-01| 2020-12-31| blue| 3963
1| 2023-05-01| 2023-09-30| blue| 159
1| 2023-10-01| 9999-12-31| green| *

CodePudding user response:

This is a gaps and islands problem, you may try the following using ROW_NUMBER function to define groups for the consecutive teams, then using DATEDIFF define another groups within a team to check if there is break in the dates.

WITH grps AS
  (
  SELECT *, 
      ROW_NUMBER() OVER (PARTITION BY id ORDER BY startdate) - 
      ROW_NUMBER() OVER (PARTITION BY id, team ORDER BY startdate) grp
  FROM table_name
  )
SELECT id, MIN(startdate) startdate, MAX(enddate) enddate, team, 
       CASE WHEN MAX(score) = '*' 
            THEN '*'
            ELSE SUM(score) 
       END AS score
FROM
  (
    SELECT *, COALESCE(DATEDIFF(startdate, LAG(enddate) OVER (PARTITION BY grp ORDER BY enddate)), 1) grp2
    FROM grps
  ) T
GROUP BY id, team, grp, grp2 
ORDER BY id, startdate

See a demo.

  • Related