I have a gaps and islands SQL problem combined with a recursion.
Vacations, bank holidays (blue) and what not are stored in the entries
table. Depending on their hierarchy they are linked to different locations (e.g. country, federal state or school). Weekends (gray) belong to the country. Normal school vacations (green) to the federal state. Special events belong to the school. They often overlap.
Assuming I want to query all entries
from the location with the id
2 and want to know how many days (duration) an entry has, I can do this with
SELECT
e.starts_on,
e.ends_on,
e.name,
(e.ends_on - e.starts_on 1) AS days,
l.name AS location_name
FROM
entries e
INNER JOIN locations l ON e.location_id = l.id
WHERE
l.id = 2;
This results in
starts_on | ends_on | name | days | location_name |
---|---|---|---|---|
2022-12-21 | 2023-01-07 | Christmas school vacation | 18 | Baden-Württemberg |
2023-01-06 | 2023-01-06 | Bank Holiday | 1 | Baden-Württemberg |
But any student wants to know the total number of days there is no school. Including adjoining weekends or bank holidays.
That could be solved with Find rows with adjourning date ranges and accumulate their durations
But I have trouble to combine that with the recursion (searching for a school but actually search for a school, a federal state and a country).
Question
How can I select the days
of the entry and the total_days
of all the adjoining entries of either the location and of all parents of the location? And can I also aggregate the entry_ids
that adjoin or overlap in one attribute too?
I want to search for the "Christmas school vacation" of the school with the id
3 and get this result:
starts_on | ends_on | name | days | location_name | total_days | aggr_location_ids | real_start | real_end |
---|---|---|---|---|---|---|---|---|
2022-12-21 | 2023-01-07 | Christmas school vacation | 18 | Baden-Württemberg | 19 | [1, 2, 3, 4, 6, 7, 8] | 2022-12-21 | 23-01-08 |
Setup with PostgreSQL
https://www.db-fiddle.com/f/jVPQoFPQw7axf7XivwpiPZ/0
CREATE TABLE locations (
id serial PRIMARY KEY,
name varchar(255) NOT NULL,
parent_id integer REFERENCES locations (id)
);
CREATE TABLE entries (
id serial PRIMARY KEY,
starts_on date NOT NULL,
ends_on date NOT NULL,
name varchar(255) NOT NULL,
location_id integer REFERENCES locations (id) NOT NULL
);
ALTER TABLE entries ADD CONSTRAINT ends_after_starts CHECK (ends_on >= starts_on);
INSERT INTO locations (name) VALUES ('Germany');
INSERT INTO locations (name, parent_id) VALUES ('Baden-Württemberg', 1);
INSERT INTO locations (name, parent_id) VALUES ('Example school', 2);
INSERT INTO entries (starts_on, ends_on, name, location_id) VALUES ('2022-12-21', '2023-01-07', 'Christmas school vacation', 2);
INSERT INTO entries (starts_on, ends_on, name, location_id) VALUES ('2022-12-25', '2022-12-26', 'Christmas', 1);
INSERT INTO entries (starts_on, ends_on, name, location_id) VALUES ('2023-01-01', '2023-01-01', 'New Year', 1);
INSERT INTO entries (starts_on, ends_on, name, location_id) VALUES ('2023-01-06', '2023-01-06', 'Bank Holiday', 2);
INSERT INTO entries (starts_on, ends_on, name, location_id) VALUES ('2022-12-17', '2022-12-18', 'Weekend', 1);
INSERT INTO entries (starts_on, ends_on, name, location_id) VALUES ('2022-12-24', '2022-12-25', 'Weekend', 1);
INSERT INTO entries (starts_on, ends_on, name, location_id) VALUES ('2022-12-31', '2023-01-01', 'Weekend', 1);
INSERT INTO entries (starts_on, ends_on, name, location_id) VALUES ('2023-01-07', '2023-01-08', 'Weekend', 1);
INSERT INTO entries (starts_on, ends_on, name, location_id) VALUES ('2023-01-14', '2023-01-15', 'Weekend', 1);
CodePudding user response:
The query below first produces the recursive lookup for all the location entries and then, in get_vacations
, searches for vacations that are adjacent to the holiday originally searched for:
with recursive cte as (
select l.*, jsonb_build_array(l.id) tree from locations l where l.parent_id is null
union all
select l.*, c.tree || jsonb_build_array(l.id) from cte c join locations l on l.parent_id = c.id
),
get_vacations(id, t, h_id, r_s, r_e) as (
select c.id, c.tree, e.id, e.starts_on, e.ends_on
from cte c join entries e on exists (select 1 from jsonb_array_elements(c.tree) v
where (v.value#>>'{}')::int = e.location_id)
where c.id = 3 and e.name = 'Christmas school vacation' -- SEARCH CRITERIA
union all
select g.id, g.t, g.h_id, least(e.starts_on, g.r_s), greatest(e.ends_on, g.r_e)
from get_vacations g join entries e on exists (select 1 from jsonb_array_elements(g.t) v
where (v.value#>>'{}')::int = e.location_id)
and e.starts_on = g.r_e or e.ends_on = g.r_s
)
select e.starts_on, e.ends_on, e.name, e.ends_on - e.starts_on days,
l.name location_name, t.r_e - t.r_s total_days, t.t aggr_location_ids,
t.r_s real_start, t.r_e real_end
from (select g.id, g.t, g.h_id, min(g.r_s) r_s, max(g.r_e) r_e
from get_vacations g group by g.id, g.t, g.h_id) t
join entries e on e.id = t.h_id
join locations l on l.id = (t.t ->> 1)::int
CodePudding user response:
Hello there is a solution using WITH RECURSIVE
WITH RECURSIVE rec( eid , lid ,parent_id )
AS (
SELECT e.id , l.id ,l.parent_id
FROM entries e
INNER JOIN locations l ON e.location_id = l.id
WHERE l.id = 2
UNION ALL
SELECT rec.eid , b.id , b.parent_id
FROM rec
JOIN locations AS b ON rec.parent_id = b.id
)
SELECT
e.id
,e.starts_on,
e.ends_on,
SUM((e.ends_on - e.starts_on 1)) AS days
FROM rec
JOIN locations AS l ON l.id = rec.lid
JOIN entries AS e ON e.id = rec.eid
GROUP BY e.id
The first query from WITH is used as a pivot and get stored inside of WITH rec(eid , lid , parent_id ) , the next query from union ALL does a SELECT using the vales from REC , uses itself to relies under recursivity.