Home > database >  How do I split start and end dates with other start and end dates?
How do I split start and end dates with other start and end dates?

Time:12-23

I've written two queries below with results that look similar to this:

table : company_changes

user_id start_at end_at company_id
189 2020-12-12 2021-03-02 88
189 2021-03-02 2050-01-01 169

table: enablement_changes

user_id start_at end_at enablement
189 2020-12-12 2021-10-15 disabled
189 2021-10-15 2050-01-01 enabled

It is important that I know when users are at a certain company_id and are either enabled or disabled.

My desired results are a table like this:

user_id start_at end_at company_id status
189 2020-12-12 2021-03-02 88 disabled
189 2021-03-02 2021-10-15 169 disabled
189 2021-10-15 2050-01-01 169 enabled

I essentially want to combine the results of those queries together. The 2050-01-01 is an arbitrary date in the future. Since the user_id has not changed status or company_id then it shows as 2050-01-01 because it is the user's present state.

Any idea how to approach this?

Here is the fiddle: enter image description here

CodePudding user response:

Lukasz solution is good.

The but will match rows where the c table end time matches the start time of e table. Often datetime ranges want to be inclusive of start but not matching on end otherwise you will get two rows.

And it will miss any joins where the c start and end after the e table rows, but you want a match of the sub set. This latter point depend on if you are doing a dense match (there are always rows for all time) or a sparse match (only rows for some of the time)

The first problem can be fixed with an extra check:

SELECT c.user_id 
       ,GREATEST(c.start_at, e.start_at) AS start_at
       ,LEAST(c.end_at, e.end_at) AS end_at
       ,c.company_id 
       ,e.status
FROM company_changes c
JOIN enablement_changes e
  ON (c.start_at BETWEEN e.start_at AND e.end_at AND c.start_at < e.end_at
    OR c.end_at BETWEEN e.start_at AND e.end_at AND c.end_at >  e.start_at )
    AND c.user_id = e.user_id
ORDER BY 1,2;

Where-as to match sparse matches you need.

SELECT c.user_id 
       ,GREATEST(c.start_at, e.start_at) AS start_at
       ,LEAST(c.end_at, e.end_at) AS end_at
       ,c.company_id 
       ,e.status
FROM company_changes c
JOIN enablement_changes e
    ON c.user_id = e.user_id
        AND (c.end_at > e.start AND c.start_at < e.end_at)
ORDER BY 1,2;

On really large tables with large ranges the later code can be costly

CodePudding user response:

If in practice you have more complicated data and there may be overlapping time intervals, for example:

table: enablement_changes

user_id start_at end_at enablement
189 2020-12-12 2021-10-15 disabled
189 2020-12-20 2021-02-10 enabled
189 2021-10-15 2050-01-01 enabled

I recommend a more complex solution:

WITH _k AS (
    SELECT 1 AS n
    UNION ALL
    SELECT 2 AS n
), _points AS (
  SELECT user_id, CASE WHEN n = 1 THEN start_at ELSE end_at END AS date_point, n
    FROM company_changes
   CROSS JOIN _k
   UNION
  SELECT user_id, CASE WHEN n = 1 THEN start_at ELSE end_at END AS date_point, n
    FROM enablement_changes
   CROSS JOIN _k
), _drank AS (
  SELECT p.user_id, p.date_point, DENSE_RANK() OVER(PARTITION BY p.user_id ORDER BY p.date_point) AS dr
    FROM _points AS p
   GROUP BY p.user_id, p.date_point
)
SELECT d1.user_id, d1.date_point AS start_at, d2.date_point AS end_at, c.company_id, MAX(s.status) AS status -- or MIN if status disabled is stronger than enabled in the same time
  FROM _drank AS d1
  JOIN _drank AS d2 ON d1.dr = d2.dr-1 AND d1.user_id = d2.user_id
  LEFT JOIN company_changes AS c    ON d1.user_id = c.user_id AND d1.date_point < c.end_at AND c.start_at < d2.date_point 
  LEFT JOIN enablement_changes AS s ON d1.user_id = s.user_id AND d1.date_point < s.end_at AND s.start_at < d2.date_point 
 GROUP BY d1.user_id, d1.date_point, d2.date_point, c.company_id
 ORDER BY 1,2,3;

enter image description here

  • Related