I'm trying to solve a problem where I have a table with different versions of an item every time the item is updated.
Person | Elig for Disc A | Elig for Disc B | Version Start | Version End |
---|---|---|---|---|
Bob | Y | Y | 2022-01-01 | 2022-01-04 |
Bob | Y | N | 2022-01-05 | 2022-01-13 |
Bob | N | N | 2022-01-14 | 2022-01-22 |
Bob | Y | N | 2022-01-23 | 3000-12-31 |
For this problem, I don't care about whether this person is eligible for discount B, I'm only interested in discount A. What I would like is to come up with a query that only returns a new version every time the person's eligibility for discount A comes changes.
What I would like is for the following to be returned:
person | Elig for Disc A | Version Start | Version End |
---|---|---|---|
Bob | Y | 2022-01-01 | 2022-01-13 |
Bob | N | 2022-01-14 | 2022-01-22 |
Bob | Y | 2022-01-23 | 3000-12-31 |
In this example, the first two lines have been combined because the value for Eligibility of discount A did not change. In the example I am working with, we also can have many different people in the table.
I have tried seeing if grouping by Elig for Disc A would work, but then I get rows 1, 2, and 4 all combining.
Is this something that is possible to do with SQL?
CodePudding user response:
Or using MATCH_RECOGNIZE:
with data(Person, Elig_for_Disc_A, Elig_for_Disc_B, Version_Start, Version_End) as
(
select 'Bob', 'Y', 'Y', to_date('2022-01-01', 'yyyy-mm-dd'), to_date('2022-01-04', 'yyyy-mm-dd') from dual union all
select 'Bob', 'Y', 'N', to_date('2022-01-05', 'yyyy-mm-dd'), to_date('2022-01-13', 'yyyy-mm-dd') from dual union all
select 'Bob', 'N', 'N', to_date('2022-01-14', 'yyyy-mm-dd'), to_date('2022-01-22', 'yyyy-mm-dd') from dual union all
select 'Bob', 'Y', 'N', to_date('2022-01-23', 'yyyy-mm-dd'), to_date('3000-12-31', 'yyyy-mm-dd') from dual -- union all
)
select Person, Elig_for_Disc_A, Version_Start, Version_End
from data
match_recognize (
partition by person, Elig_for_Disc_A
order by Version_Start, Version_End
measures first(Version_Start) as Version_Start, max(Version_End) as Version_End
pattern( merged* strt )
define
merged as max(Version_End) 1 >= next(Version_Start)
)
order by person, Version_Start;
Bob Y 01/01/22 13/01/22
Bob N 14/01/22 22/01/22
Bob Y 23/01/22 31/12/00
CodePudding user response:
Here is an example of the query you could use in Oracle:
SELECT person, Elig_for_Disc_A, Version_Start, Version_End
FROM (
SELECT person, Elig_for_Disc_A, Version_Start, Version_End,
LAG(Elig_for_Disc_A) OVER (PARTITION BY person ORDER BY Version_Start) as prev_Elig_Disc_A
FROM your_table
)
WHERE Elig_for_Disc_A != prev_Elig_Disc_A OR prev_Elig_Disc_A IS NULL
CodePudding user response:
Looking for changes is easy with LAG or LEAD. The real trick is refactoring your start/end dates to cover the intermediate time periods. Try something like this (not tried, may need to debug):
SELECT person,
elig_for_disc_a,
new_version_start,
NVL(LEAD(new_version_start) OVER (PARTITION BY person ORDER BY version_start),old_version_end) new_version_end
FROM (SELECT person,
new_version_start,
MAX(version_end) old_version_end,
MAX(elig_for_disc_a) elig_for_disc_a
FROM (SELECT x.*,
MAX(start_of_elig_change) OVER (PARTITION BY person ORDER BY version_start ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) new_version_start
FROM (SELECT x.*,
DECODE(last_elig_a,elig_for_disc_a,NULL,version_start) start_of_elig_change
FROM (SELECT x.*,
LAG(elig_for_disc_a) OVER (PARTITION BY person ORDER BY version_start) last_elig_a
FROM your_table x) x) x) x
GROUP BY person,
new_version_start)
CodePudding user response:
This is a typical gaps-and-islands problem, in which you need to identify islands (records that should go together) within the same partitions. We can find gaps between islands following these three steps:
- for each Person, not flagging a change of partition when previous Version_End = Version_Start - 1 day and previous Elig_for_Disc_A has same value as current
- computing a running sum on the flag, to craft the new partitions
- aggregating on MIN(Version_Start), MAX(Version_End) on each new partition.
WITH cte AS (
SELECT tab.*,
CASE WHEN LAG(Version_End) OVER w = Version_Start -1
AND LAG(Elig_for_Disc_A) OVER w = Elig_for_Disc_A
THEN 0 ELSE 1
END AS change_part
FROM tab
WINDOW w AS (PARTITION BY Person ORDER BY Version_Start)
), cte2 AS (
SELECT cte.*,
SUM(change_part) OVER(PARTITION BY Person ORDER BY Version_Start) AS parts
FROM cte
)
SELECT Person,
Elig_for_Disc_A,
MIN(Version_Start) AS Version_Start,
MAX(Version_End) AS Version_End
FROM cte2
GROUP BY Person,
Elig_for_Disc_A,
Parts
Output:
PERSON | ELIG_FOR_DISC_A | VERSION_START | VERSION_END |
---|---|---|---|
Bob | Y | 01-JAN-22 | 13-JAN-22 |
Bob | N | 14-JAN-22 | 22-JAN-22 |
Bob | Y | 23-JAN-22 | 31-DEC-00 |
Check the Oracle demo here.
Assumption: You're using the DATE
datatype for storing your dates.
CodePudding user response:
You can use 1 case statement for correct version_end date and for rest 1 sub-query with LAG window function will work -
SELECT PERSON, ELIG_FOR_DISC_A, ELIG_FOR_DISC_B, VERSION_START,
upd_version_end version_end
FROM (SELECT d.*, LEAD(Elig_for_Disc_A) OVER(PARTITION BY Person ORDER BY Version_Start) prev_Elig_Disc_A,
CASE WHEN Elig_for_Disc_A = LEAD(Elig_for_Disc_A) OVER(PARTITION BY Person ORDER BY Version_Start)
THEN LEAD(version_end) OVER(PARTITION BY Person ORDER BY Version_Start)
ELSE version_end
END UPD_VERSION_END
FROM DATA d
)
WHERE Elig_for_Disc_A != prev_Elig_Disc_A OR prev_Elig_Disc_A IS NULL;