Im trying to create a query that retuns missing dates between two columns and multiple rows.
Example: leases
move_in move_out hotel_id
2021-04-01 2021-04-14 1
2021-04-17 2021-04-30 1
2021-04-01 2021-04-14 2
2021-04-17 2021-04-30 2
Result should be
date hotel_id
2021-04-15 1
2021-04-16 1
2021-04-15 2
2021-04-16 2
CodePudding user response:
If you're using postgresql 14 you can use multirange
s to do this:
CREATE TEMP TABLE t (
"move_in" DATE,
"move_out" DATE,
"hotel_id" INTEGER
);
INSERT INTO t
("move_in", "move_out", "hotel_id")
VALUES ('2021-04-01', '2021-04-14', '1')
, ('2021-04-17', '2021-04-30', '1')
, ('2021-05-03', '2021-05-30', '1') -- added this as a test case
, ('2021-04-01', '2021-04-14', '2')
, ('2021-04-17', '2021-04-30', '2');
SELECT hotel_id, datemultirange(DATERANGE(MIN(move_in), MAX(move_out))) - range_agg(DATERANGE(move_in, move_out, '[]')) AS r
FROM t
GROUP BY hotel_id
returns
-------- -------------------------------------------------
|hotel_id|r |
-------- -------------------------------------------------
|2 |{[2021-04-14,2021-04-17)} |
|1 |{[2021-04-14,2021-04-17),[2021-04-30,2021-05-03)}|
-------- -------------------------------------------------
If you want to have 1 row per day you can use unnest
and generate_series
to expand the multirange
s:
WITH available_ranges AS(
SELECT hotel_id, unnest(datemultirange(DATERANGE(MIN(move_in), MAX(move_out), '[]')) - range_agg(DATERANGE(move_in, move_out, '[]'))) AS r
FROM t
GROUP BY hotel_id
)
SELECT hotel_id, generate_series(lower(r), upper(r) - 1, '1 day'::interval)
FROM available_ranges
ORDER BY 1, 2
;
returns
-------- ---------------------------------
|hotel_id|generate_series |
-------- ---------------------------------
|1 |2021-04-15 00:00:00.000000 00:00|
|1 |2021-04-16 00:00:00.000000 00:00|
|1 |2021-05-01 00:00:00.000000 00:00|
|1 |2021-05-02 00:00:00.000000 00:00|
|2 |2021-04-15 00:00:00.000000 00:00|
|2 |2021-04-16 00:00:00.000000 00:00|
-------- ---------------------------------
CodePudding user response:
You're finding the difference between two sets. One is the leased hotel days. The other is all days in the month of April. And you're doing this for all hotels.
We can make a set of all days of April for all hotels. First we need to build the set of all days in the month of April: generate_series('2022-04-01'::date, '2022-04-30'::date, '1 day')
.
Then we need to cross join this with all hotel IDs.
select *
from generate_series('2021-04-01'::date, '2021-04-30'::date, '1 day') as dates(day)
cross join (
select distinct hotel_id as id
from leases
) as hotels(id)
Now for each day we can left join this with the leases for that day.
select *
from generate_series('2021-04-01'::date, '2021-04-30'::date, '1 day') as dates(day)
cross join (
select distinct hotel_id as id
from leases
) as hotels(id)
Any days without a lease won't have a lease.id
, so filter on that.
select day, hotels.id
from generate_series('2021-04-01'::date, '2021-04-30'::date, '1 day') as dates(day)
cross join (
select distinct hotel_id as id
from leases
) as hotels(id)
left join leases on day between move_in and leases.move_out and hotel_id = hotels.id
where leases.id is null
order by hotels.id, day