I've got raw data from table with information about clients. Information comes from different sources, so it causes duplicates but with different dates:
id pp type start_dt end_dt
100| 1 | Y | 01.05.19 | 01.10.20
100| 1 | Y | 10.08.20 | 01.10.20
100| 1 | N | 01.10.20 | 02.12.21
100| 1 | N | 13.12.20 | 02.12.21
100| 1 | Y | 02.12.21 | 02.12.26
100| 1 | Y | 20.12.21 | 20.12.26
For example, in this table row 2, 4 and 6 have start date within "start_dt" and "end_dt" of previous row. It's a duplicate, but I need to combine min start date and max end date from both rows for type.
FYI. First two rows and last two rows have same id, pp and type, but I need to stack them separately because of the timeline.
What I want to get (continuous timeline for a client is a key):
id pp type start_dt end_dt | cnt
100| 1 | Y | 01.05.19 | 01.10.20 | 2
100| 1 | N | 01.10.20 | 02.12.21 | 2
100| 1 | Y | 02.12.21 | 20.12.26 | 2
I'm using PL/SQL. I think it could be solved by window functions, but I can't figure out which functions to use.
Tried to solve it by group by while having > 1, but in this case it stacks four rows with same type (rows 1,2 and 5,6) into one. I need two separate rows for each type while saving continuous timeline of dates for one client.
CodePudding user response:
From Oracle 12, you can use MATCH_RECOGNIZE
for row-by-row pattern matching:
SELECT *
FROM table_name
MATCH_RECOGNIZE(
PARTITION BY id, pp
ORDER BY start_dt
MEASURES
FIRST(type) AS type,
FIRST(start_dt) AS start_dt,
MAX(end_dt) AS end_dt,
COUNT(*) AS cnt
PATTERN (overlapping* last_row)
DEFINE
overlapping AS type = NEXT(type)
AND MAX(end_dt) >= NEXT(start_dt)
)
Which, for the sample data:
CREATE TABLE table_name (id, pp, type, start_dt, end_dt) AS
SELECT 100, 1, 'Y', DATE '2019-05-01', DATE '2020-10-01' FROM DUAL UNION ALL
SELECT 100, 1, 'Y', DATE '2020-08-10', DATE '2020-10-01' FROM DUAL UNION ALL
SELECT 100, 1, 'N', DATE '2020-10-01', DATE '2021-12-02' FROM DUAL UNION ALL
SELECT 100, 1, 'N', DATE '2020-12-13', DATE '2021-12-02' FROM DUAL UNION ALL
SELECT 100, 1, 'Y', DATE '2021-12-02', DATE '2026-12-02' FROM DUAL UNION ALL
SELECT 100, 1, 'Y', DATE '2021-12-20', DATE '2026-12-20' FROM DUAL;
Outputs:
ID | PP | TYPE | START_DT | END_DT | CNT |
---|---|---|---|---|---|
100 | 1 | Y | 2019-05-01 00:00:00 | 2020-10-01 00:00:00 | 2 |
100 | 1 | N | 2020-10-01 00:00:00 | 2021-12-02 00:00:00 | 2 |
100 | 1 | Y | 2021-12-02 00:00:00 | 2026-12-20 00:00:00 | 2 |
If you want to use analytic and aggregation functions then it is a bit more complicated:
SELECT id, pp, type,
MIN(start_dt) AS start_dt,
MAX(end_dt) AS end_dt,
COUNT(*) AS cnt
FROM (
SELECT id, pp, type, start_dt, end_dt,
SUM(grp_change) OVER (
PARTITION BY id, pp, type
ORDER BY start_dt
) AS grp
FROM (
SELECT t.*,
CASE
WHEN start_dt <= MAX(end_dt) OVER (
PARTITION BY id, pp, type
ORDER BY start_dt
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
)
THEN 0
ELSE 1
END AS grp_change
FROM table_name t
)
)
GROUP BY id, pp, type
ORDER BY id, pp, start_dt
(Untested as db<>fiddle seems to be having issues at the moment.)
CodePudding user response:
I prefer this version because comparing "type = next(type)" without "type" being in the "order by" may lead to errors.
match_recognize(
partition by id, pp, type
order by start_dt,end_dt
measures first(start_dt) as start_dt, max(end_dt) as end_dt, count(*) as n
pattern (merged* strt)
define
merged as max(end_dt) >= next(start_dt)
)