Home > Enterprise >  Postgres Find missing dates between a dataset of two columns
Postgres Find missing dates between a dataset of two columns

Time:05-03

Im trying to create a query that retuns missing dates between two columns and multiple rows.

Example: leases

move_in          move_out       hotel_id
2021-04-01    2021-04-14    1
2021-04-17   2021-04-30     1
2021-04-01    2021-04-14    2
2021-04-17   2021-04-30     2

Result should be

date         hotel_id
2021-04-15   1
2021-04-16   1
2021-04-15   2
2021-04-16   2

CodePudding user response:

If you're using postgresql 14 you can use multiranges to do this:

CREATE TEMP TABLE t (
    "move_in"  DATE,
    "move_out" DATE,
    "hotel_id" INTEGER
);

INSERT INTO t
    ("move_in", "move_out", "hotel_id")
VALUES ('2021-04-01', '2021-04-14', '1')
     , ('2021-04-17', '2021-04-30', '1')
     , ('2021-05-03', '2021-05-30', '1')  -- added this as a test case
     , ('2021-04-01', '2021-04-14', '2')
     , ('2021-04-17', '2021-04-30', '2');

SELECT hotel_id, datemultirange(DATERANGE(MIN(move_in), MAX(move_out))) - range_agg(DATERANGE(move_in, move_out, '[]')) AS r
FROM t
GROUP BY hotel_id

returns

 -------- ------------------------------------------------- 
|hotel_id|r                                                |
 -------- ------------------------------------------------- 
|2       |{[2021-04-14,2021-04-17)}                        |
|1       |{[2021-04-14,2021-04-17),[2021-04-30,2021-05-03)}|
 -------- ------------------------------------------------- 

If you want to have 1 row per day you can use unnest and generate_series to expand the multiranges:

WITH available_ranges AS(
    SELECT hotel_id, unnest(datemultirange(DATERANGE(MIN(move_in), MAX(move_out), '[]')) - range_agg(DATERANGE(move_in, move_out, '[]'))) AS r
    FROM t
    GROUP BY hotel_id
)
SELECT hotel_id, generate_series(lower(r), upper(r) - 1, '1 day'::interval)
FROM available_ranges
ORDER BY 1, 2
;

returns

 -------- --------------------------------- 
|hotel_id|generate_series                  |
 -------- --------------------------------- 
|1       |2021-04-15 00:00:00.000000  00:00|
|1       |2021-04-16 00:00:00.000000  00:00|
|1       |2021-05-01 00:00:00.000000  00:00|
|1       |2021-05-02 00:00:00.000000  00:00|
|2       |2021-04-15 00:00:00.000000  00:00|
|2       |2021-04-16 00:00:00.000000  00:00|
 -------- --------------------------------- 

CodePudding user response:

You're finding the difference between two sets. One is the leased hotel days. The other is all days in the month of April. And you're doing this for all hotels.

We can make a set of all days of April for all hotels. First we need to build the set of all days in the month of April: generate_series('2022-04-01'::date, '2022-04-30'::date, '1 day').

Then we need to cross join this with all hotel IDs.

select *
from generate_series('2021-04-01'::date, '2021-04-30'::date, '1 day') as dates(day)
cross join (
  select distinct hotel_id as id
  from leases
) as hotels(id)

Now for each day we can left join this with the leases for that day.

select *
from generate_series('2021-04-01'::date, '2021-04-30'::date, '1 day') as dates(day)
cross join (
  select distinct hotel_id as id
  from leases
) as hotels(id)

Any days without a lease won't have a lease.id, so filter on that.

select day, hotels.id
from generate_series('2021-04-01'::date, '2021-04-30'::date, '1 day') as dates(day)
cross join (
  select distinct hotel_id as id
  from leases
) as hotels(id)
left join leases on day between move_in and leases.move_out and hotel_id = hotels.id
where leases.id is null
order by hotels.id, day

Demonstration.

  • Related