I am trying to identify sum of quantity for each group if date ranges are overlapping. I am using postgresql to solve for it. For example
id | group | start_date | end_date | quantity |
---|---|---|---|---|
1 | a | 2020-09-11 | 2020-10-09 | 50 |
1 | a | 2020-09-11 | 2020-10-31 | 20 |
1 | a | 2020-11-01 | 2020-12-01 | 7 |
1 | a | 2020-11-15 | 2020-11-20 | 6 |
2 | b | 2020-10-06 | 2020-10-30 | 10 |
2 | b | 2020-10-09 | 2022-10-17 | 5 |
2 | b | 2020-10-15 | 2022-10-26 | 3 |
What I am trying to achieve is the following:
Expected Output: "edited"
id | group | start_date | end_date | quantity |
---|---|---|---|---|
1 | a | 2020-09-11 | 2020-10-31 | 70 |
1 | a | 2020-11-01 | 2020-12-01 | 13 |
2 | b | 2020-10-06 | 2020-10-30 | 18 |
would appreciate your help!
CodePudding user response:
SOLUTION 1 : when all the date intervals within the same group are overlapping
SELECT id
, group
, min(start_date) AS start_date
, max(end_date) AS end_date
, Sum(quantity) AS quantity
FROM mytable
GROUP BY id, group
SOLUTION 2 : when all the date intervals within the same group are not overlapping
Using a window function would be great but I don't see how to specify the window which corresponds to a subset of date ranges which overlap. So I come back to a basic solution based on a self-join which select for every row of table a the subset of rows of table b which overlap with that row :
SELECT DISTINCT ON (a.id, date_start, date_end)
a.id
, min(b.start_date) AS date_start
, max(b.end_date) AS date_end
, sum(b.quantity) AS quantity
FROM mytable AS a
INNER JOIN mytable AS b
ON a.id = b.id
AND a.start_date < b.end_date
AND a.end_date > b.start_date
GROUP BY a.id, a.start_date, a.end_date
ORDER BY a.id, date_start
see the test result in dbfiddle
CodePudding user response:
You may use the LAG
function to check the consecutive overlapped date periods and create unique groups for them as the following:
WITH create_groups AS
(
SELECT *, SUM(chk) OVER (PARTITION BY id, group_ ORDER BY start_date) grp
FROM
(
SELECT *,
CASE
WHEN start_date <= LAG(end_date, 1, end_date) OVER (PARTITION BY id, group_ ORDER BY start_date) AND
start_date >=LAG(start_date, 1, start_date) OVER (PARTITION BY id, group_ ORDER BY start_date)
THEN 0 ELSE 1
END AS chk
FROM table_name
) T
)
SELECT id, group_,
MIN(start_date) start_date,
MAX(end_date) start_date,
SUM(quantity) quantity
FROM create_groups
GROUP BY id, group_, grp
ORDER BY id, group_, grp
See a demo.