I got this SCD table:
start_date | end_date | partition |
---|---|---|
2022-03-08 15:35:09.856 | 2022-03-09 14:57:36.610 | 1 |
2022-03-09 14:57:36.610 | 2022-05-18 13:26:31.195 | 2 |
2022-05-18 13:26:31.195 | 2022-08-02 10:12:02.441 | 2 |
2022-08-02 10:12:02.441 | 2022-09-01 11:10:01.019 | 2 |
2022-09-01 11:10:01.019 | 2022-09-01 11:10:20.777 | 1 |
2022-09-01 11:10:20.777 | 2022-09-01 11:21:26.526 | 1 |
I would like to know for each partition the last value of start_date and end_date of the other partition (there are only two). for the given table:
start_date | end_date | partition | max_start_date | max_end_date |
---|---|---|---|---|
2022-03-08 15:35:09.856 | 2022-03-09 14:57:36.610 | 1 | null | null |
2022-03-09 14:57:36.610 | 2022-05-18 13:26:31.195 | 2 | 2022-03-08 15:35:09.856 | 2022-03-09 14:57:36.610 |
2022-05-18 13:26:31.195 | 2022-08-02 10:12:02.441 | 2 | 2022-03-08 15:35:09.856 | 2022-03-09 14:57:36.610 |
2022-08-02 10:12:02.441 | 2022-09-01 11:10:01.019 | 2 | 2022-03-08 15:35:09.856 | 2022-03-09 14:57:36.610 |
2022-09-01 11:10:01.019 | 2022-09-01 11:10:20.777 | 1 | 2022-08-02 10:12:02.441 | 2022-09-01 11:10:01.019 |
2022-09-01 11:10:20.777 | 2022-09-01 11:21:26.526 | 1 | 2022-08-02 10:12:02.441 | 2022-09-01 11:10:01.019 |
I tried some last_value window function and didn't made it. like this: , last_value (start_date) OVER (partition by partition = '1' order by start_date asc) as last_start_date_partition , last_value (end_date) OVER (partition by partition = '1' order by end_date asc) as last_end_date_partition is there any option to inject a condition to window function and make it function that way?
CodePudding user response:
Using dense_rank
:
with cte as (
select (select sum((s1.start_date < s.start_date and s1.partition != s.partition)::int)
from scd s1) r, s.*
from scd s
),
n_part as (
select dense_rank() over (order by c.r) dr, c.* from cte c
)
select np.start_date, np.end_date, np.partition, max(np1.start_date), max(np1.end_date)
from n_part np left join n_part np1 on np1.dr = np.dr - 1
group by np.start_date, np.end_date, np.partition
order by np.start_date, np.end_date
CodePudding user response:
Using windows functions and gaps-and-islandish approach:
SELECT start_date,
end_date,
PARTITION,
max(start_date) OVER (ORDER BY grp RANGE UNBOUNDED PRECEDING EXCLUDE GROUP) max_start_date, -- use max value without current group
max(end_date) OVER (ORDER BY grp RANGE UNBOUNDED PRECEDING EXCLUDE GROUP) max_end_date -- use max value without current group
FROM
(SELECT start_date,
end_date,
PARTITION,
sum(lag) OVER (ORDER BY end_date) AS grp -- use cumulative sum to create a group
FROM
(SELECT *,
CASE
WHEN lag(PARTITION) OVER (ORDER BY end_date) != PARTITION THEN 1
ELSE 0
END lag -- use lag to determine if the partition has changed
FROM mytable) t) tt