Home > Back-end >  How can I write a SQL query that combines versions where nothing has changed?
How can I write a SQL query that combines versions where nothing has changed?

Time:01-27

I'm trying to solve a problem where I have a table with different versions of an item every time the item is updated.

Person Elig for Disc A Elig for Disc B Version Start Version End
Bob Y Y 2022-01-01 2022-01-04
Bob Y N 2022-01-05 2022-01-13
Bob N N 2022-01-14 2022-01-22
Bob Y N 2022-01-23 3000-12-31

For this problem, I don't care about whether this person is eligible for discount B, I'm only interested in discount A. What I would like is to come up with a query that only returns a new version every time the person's eligibility for discount A comes changes.

What I would like is for the following to be returned:

person Elig for Disc A Version Start Version End
Bob Y 2022-01-01 2022-01-13
Bob N 2022-01-14 2022-01-22
Bob Y 2022-01-23 3000-12-31

In this example, the first two lines have been combined because the value for Eligibility of discount A did not change. In the example I am working with, we also can have many different people in the table.

I have tried seeing if grouping by Elig for Disc A would work, but then I get rows 1, 2, and 4 all combining.

Is this something that is possible to do with SQL?

CodePudding user response:

Or using MATCH_RECOGNIZE:

with data(Person, Elig_for_Disc_A, Elig_for_Disc_B, Version_Start, Version_End) as
(
    select 'Bob', 'Y', 'Y', to_date('2022-01-01', 'yyyy-mm-dd'), to_date('2022-01-04', 'yyyy-mm-dd') from dual union all
    select 'Bob', 'Y', 'N', to_date('2022-01-05', 'yyyy-mm-dd'), to_date('2022-01-13', 'yyyy-mm-dd') from dual union all
    select 'Bob', 'N', 'N', to_date('2022-01-14', 'yyyy-mm-dd'), to_date('2022-01-22', 'yyyy-mm-dd') from dual union all
    select 'Bob', 'Y', 'N', to_date('2022-01-23', 'yyyy-mm-dd'), to_date('3000-12-31', 'yyyy-mm-dd') from dual -- union all
)
select Person, Elig_for_Disc_A, Version_Start, Version_End 
from data
match_recognize (
    partition by person, Elig_for_Disc_A 
    order by Version_Start, Version_End
    measures first(Version_Start) as Version_Start, max(Version_End) as Version_End
    pattern( merged* strt )
    define
        merged as max(Version_End) 1 >= next(Version_Start)
)
order by person, Version_Start;

    Bob Y   01/01/22    13/01/22
    Bob N   14/01/22    22/01/22
    Bob Y   23/01/22    31/12/00

CodePudding user response:

Here is an example of the query you could use in Oracle:

SELECT person, Elig_for_Disc_A, Version_Start, Version_End
FROM (
    SELECT person, Elig_for_Disc_A, Version_Start, Version_End,
           LAG(Elig_for_Disc_A) OVER (PARTITION BY person ORDER BY Version_Start) as prev_Elig_Disc_A
    FROM your_table
)
WHERE Elig_for_Disc_A != prev_Elig_Disc_A OR prev_Elig_Disc_A IS NULL

CodePudding user response:

Looking for changes is easy with LAG or LEAD. The real trick is refactoring your start/end dates to cover the intermediate time periods. Try something like this (not tried, may need to debug):

  SELECT person,
         elig_for_disc_a,
         new_version_start,
         NVL(LEAD(new_version_start) OVER (PARTITION BY person ORDER BY version_start),old_version_end) new_version_end
    FROM (SELECT person,
                 new_version_start,
                 MAX(version_end) old_version_end,
                 MAX(elig_for_disc_a) elig_for_disc_a
            FROM (SELECT x.*,
                         MAX(start_of_elig_change) OVER (PARTITION BY person ORDER BY version_start ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) new_version_start
                    FROM (SELECT x.*,
                                 DECODE(last_elig_a,elig_for_disc_a,NULL,version_start) start_of_elig_change
                            FROM (SELECT x.*,
                                         LAG(elig_for_disc_a) OVER (PARTITION BY person ORDER BY version_start) last_elig_a
                                    FROM your_table x) x) x) x
         GROUP BY person,
                  new_version_start)

CodePudding user response:

This is a typical gaps-and-islands problem, in which you need to identify islands (records that should go together) within the same partitions. We can find gaps between islands following these three steps:

  • for each Person, not flagging a change of partition when previous Version_End = Version_Start - 1 day and previous Elig_for_Disc_A has same value as current
  • computing a running sum on the flag, to craft the new partitions
  • aggregating on MIN(Version_Start), MAX(Version_End) on each new partition.
WITH cte AS (
    SELECT tab.*, 
           CASE WHEN LAG(Version_End)     OVER w = Version_Start -1
                 AND LAG(Elig_for_Disc_A) OVER w = Elig_for_Disc_A 
                THEN 0 ELSE 1 
           END AS change_part
    FROM tab
    WINDOW w AS (PARTITION BY Person ORDER BY Version_Start)
), cte2 AS (
    SELECT cte.*, 
           SUM(change_part) OVER(PARTITION BY Person ORDER BY Version_Start) AS parts
    FROM cte
)
SELECT Person, 
       Elig_for_Disc_A, 
       MIN(Version_Start) AS Version_Start,
       MAX(Version_End) AS Version_End
FROM cte2
GROUP BY Person, 
         Elig_for_Disc_A,
         Parts

Output:

PERSON ELIG_FOR_DISC_A VERSION_START VERSION_END
Bob Y 01-JAN-22 13-JAN-22
Bob N 14-JAN-22 22-JAN-22
Bob Y 23-JAN-22 31-DEC-00

Check the Oracle demo here.

Assumption: You're using the DATE datatype for storing your dates.

CodePudding user response:

You can use 1 case statement for correct version_end date and for rest 1 sub-query with LAG window function will work -

SELECT PERSON, ELIG_FOR_DISC_A, ELIG_FOR_DISC_B, VERSION_START,
       upd_version_end version_end
  FROM (SELECT d.*, LEAD(Elig_for_Disc_A) OVER(PARTITION BY Person ORDER BY Version_Start) prev_Elig_Disc_A,
                    CASE WHEN Elig_for_Disc_A = LEAD(Elig_for_Disc_A) OVER(PARTITION BY Person ORDER BY Version_Start)
                              THEN LEAD(version_end) OVER(PARTITION BY Person ORDER BY Version_Start)
                         ELSE version_end
                    END UPD_VERSION_END
         FROM DATA d
       )
WHERE Elig_for_Disc_A != prev_Elig_Disc_A OR prev_Elig_Disc_A IS NULL;

Demo.

  • Related