Home > Enterprise >  Group Rows Over Multiple Date Ranges
Group Rows Over Multiple Date Ranges

Time:09-17

I'm trying to solve what I believe to be (at least) two problems at once with this one, so there may already exist answers that partially answer this question, but I do not fundamentally understand the concept of what I'm trying to do with the data to get it into the final form that I desire.

The problem: I have three grains of data (let's just call them Gas, Liquid, Solid) that occur over a larger period of time, let's call that the Observation Period. The ask is for GasPeriod data to be displayed for the observation period if it is present, then Liquid, then Solid. For a given state, there would only ever be 0 or 1 active records.

Gas  X----X       X----X    X---------X 1
Liq  X-------X    X--------XX---------X 2
Sol  X--------------XX----------------X 3

Need 1----12-23--31----12--21---------1

What I need is to reduce those 8 ranges (3 GasPeriod, 3 LiquidPeriod, 2 SolidPeriod) into 6 rows with 6 date ranges and the data from the "winning" row to persist the PeriodTemp and Description for the given overlapping period of time.

Any solutions would be helpful, but I'd greatly appreciate any breakdowns of the actual problem here as well so I can educate myself on what I'm doing. I suspect the problem steps involved are:

  • Return 8 rows for the Observation Period (must involve a union on GasPeriod, LiquidPeriod and SolidPeriod?)
  • Group the rows by date range and rank them (I've seen solutions that involved LAG and DENSE_RANK but I'm not familiar with the latter and I've only used LAG to solve a different problem so it's confusing me a bit how it works.
  • Pick the winning record for the given date range (I assume this is solved using a sub-query)

Edited for clarity

create table ObservationPeriod (
    ObservationPeriodId    BIGINT    IDENTITY (1,1) NOT NULL,
    BusinessKey            BIGINT    NOT NULL,
    Effective              DATETIME2 NOT NULL,
    Expiry                 DATETIME2 NOT NULL
)

create table GasPeriod (
    GasPeriodId            BIGINT    IDENTITY (1,1) NOT NULL,
    BusinessKey            BIGINT    NOT NULL,
    PeriodTemp             DECIMAL (11, 5) NOT NULL,
    Description            NVARCHAR(100) NOT NULL,
    Effective              DATETIME2 NOT NULL,
    Expiry                 DATETIME2 NOT NULL
)

create table LiquidPeriod (
    LiquidPeriodId         BIGINT    IDENTITY (1,1) NOT NULL,
    BusinessKey            BIGINT    NOT NULL,
    PeriodTemp             DECIMAL (11, 5) NOT NULL,
    Description            NVARCHAR(100) NOT NULL,
    Effective              DATETIME2 NOT NULL,
    Expiry                 DATETIME2 NOT NULL
)

create table SolidPeriod (
    SolidPeriodId          BIGINT    IDENTITY (1,1) NOT NULL,
    BusinessKey            BIGINT    NOT NULL,
    PeriodTemp             DECIMAL (11, 5) NOT NULL,
    Description            NVARCHAR(100) NOT NULL,
    Effective              DATETIME2 NOT NULL,
    Expiry                 DATETIME2 NOT NULL
)

create table ObservationPeriodObserved (
    ObservationPeriodObservedId BIGINT IDENTITY(1,1) NOT NULL,
    BusinessKey            BIGINT    NOT NULL,
    PeriodTemp             DECIMAL (11, 5) NOT NULL,
    Description            NVARCHAR(100) NOT NULL,
    Effective              DATETIME2 NOT NULL,
    Expiry                 DATETIME2 NOT NULL
)

ObservationPeriod Data

ObservationPeriodId BusinessKey Effective Expiry
1 24 2021-01-01 2021-12-31

GasPeriod Data

GasPeriodId BusinessKey PeriodTemp Description Effective Expiry
1 24 101.328 first g 2020-09-30 2021-03-31
2 24 102.456 second g 2021-06-01 2021-07-31
3 24 100.011 third g 2021-09-01 9999-12-31

LiquidPeriod Data

LiquidPeriodId BusinessKey PeriodTemp Description Effective Expiry
1 24 98.99 first l 2021-01-01 2021-04-30
2 24 98.76 second l 2021-06-01 2021-08-31
3 24 99.978 third l 2021-09-01 9999-12-31

SolidPeriod Data

SolidPeriodId BusinessKey PeriodTemp Description Effective Expiry
1 24 -0.145 first s 2021-01-01 2021-06-30
2 24 -0.987 second s 2021-07-01 9999-12-31

ObvservationPeriodObserved Data

ObvservationPeriodObservedIdId BusinessKey PeriodTemp Description Effective Expiry
1 24 101.328 first g 2021-01-01 2021-03-31
2 24 98.99 first l 2021-04-01 2021-04-30
3 24 -0.145 first s 2021-05-01 2021-05-31
4 24 102.456 second g 2021-06-01 2021-07-31
5 24 98.76 second l 2021-08-01 2021-08-31
6 24 100.011 third g 2021-09-01 2021-12-31

The idea is that for a given row in ObservationPeriod, there are many associated periods of time between the three aforementioned grains, but only one should ever be recorded as a subset of an ObservationPeriod for a given period of time.

Please also assume that there must be separate granularity here and that this problem cannot be solved away by getting this data into the same table -- it cannot. I can't use the actual business model here so I'm trying to get as close as I can conceptually.

CodePudding user response:

The following approach starts by creating a union of the three datasets Gas, Liquid and Solid. In this union an addition column is created PeriodPriority which will assist in choosing the winning row. I've interpreted the winning row as a period entry occurring within the observational period, is the most recent and has not expired and will be chosen based on the rank of Gas-1, Liquid-2 and Solid-3. This forms the basis of the DENSE_RANK window function as it is ordered by recent expired date and the PeriodPriority. Since this winning record may have a date exceeding the observational period, I used a case expression to ensure that the value inserted was within the observational period.

Although there is only one Observational period I have still included the where clause WHERE op.ObservationPeriodId=1 which you may update/remove as desired. I also joined on BusinessKey as I am not sure whether this will change throughout your entire set. If BusinessKey never changes, then this may be omitted from the join expression.

The resulting query currently looks like this

SELECT 
    ROW_NUMBER() OVER (ORDER BY c.Effective, c.Expiry ) as ObservationPeriodObservedId,
    c.BusinessKey,
    c.PeriodTemp, 
    c.Description,
    c.Effective,
    CASE 
        WHEN c.Expiry >= op.Expiry THEN op.Expiry
        ELSE c.Expiry
    END as Expiry
FROM (
    SELECT 
        *,
        DENSE_RANK() OVER (ORDER BY t.Expiry DESC, t.PeriodPriority ) rk
    FROM ( 
        SELECT 
            BusinessKey,PeriodTemp, Description,Effective,Expiry, 1 as PeriodPriority 
        FROM GasPeriod
        UNION ALL
        SELECT 
            BusinessKey,PeriodTemp, Description,Effective,Expiry, 2 
        FROM LiquidPeriod
        UNION ALL
        SELECT 
            BusinessKey,PeriodTemp, Description,Effective,Expiry, 3 
        FROM SolidPeriod
    ) t
) c
INNER JOIN ObservationPeriod op ON c.BusinessKey=op.BusinessKey AND
                                   op.Effective <= c.Effective AND
                                   (op.Expiry >= c.Expiry OR rk=1)
WHERE op.ObservationPeriodId=1
ORDER BY c.Effective, c.Expiry

and the insert statement

INSERT INTO ObservationPeriodObserved (BusinessKey, PeriodTemp, Description, Effective, Expiry)
SELECT 
    c.BusinessKey,
    c.PeriodTemp, 
    c.Description,
    c.Effective,
    CASE 
        WHEN c.Expiry >= op.Expiry THEN op.Expiry
        ELSE c.Expiry
    END as Expiry
FROM (
    SELECT 
        *,
        DENSE_RANK() OVER (ORDER BY t.Expiry DESC, t.PeriodPriority ) rk
    FROM ( 
        SELECT 
            BusinessKey,PeriodTemp, Description,Effective,Expiry, 1 as PeriodPriority 
        FROM GasPeriod
        UNION ALL
        SELECT 
            BusinessKey,PeriodTemp, Description,Effective,Expiry, 2 
        FROM LiquidPeriod
        UNION ALL
        SELECT 
            BusinessKey,PeriodTemp, Description,Effective,Expiry, 3 
        FROM SolidPeriod
    ) t
) c
INNER JOIN ObservationPeriod op ON c.BusinessKey=op.BusinessKey AND
                                   op.Effective <= c.Effective AND
                                   (op.Expiry >= c.Expiry OR rk=1)
WHERE op.ObservationPeriodId=1
ORDER BY c.Effective, c.Expiry

generates the desired results.

View working db fiddle here

Edit 1 - Consecutive Effective, Expiry Dates

As per the updated question and comment, I have modified the above to utilize LAG along with DATE_ADD to provide consecutive dates. The Insert query (latter part is a SELECT) is included below along with an updated db fiddle which provides the desired results. The exception here is record 1 of SolidPeriod that had an end date of 2021-06-30. When this date was changed to 2021-05-31 as in your desired results, the query corrected the 1 date that was not as expected. If there are additional considerations here or there was an error in the sample data, please let me know. I made the adjustment to the sample data instead of an adhoc calculation as I could not assume some logic to make a change as such (i.e. on an arbitrary record subtract 1 month). Let me know whether this works for you and advise further.

INSERT INTO ObservationPeriodObserved (BusinessKey, PeriodTemp, Description, Effective, Expiry)

    SELECT 
        c.BusinessKey,
        c.PeriodTemp, 
        c.Description,
        CASE
            WHEN LAG(c.Expiry) OVER (ORDER BY c.Effective, c.Expiry) IS NULL THEN op.Effective
            ELSE DATEADD(DAY,1,LAG(c.Expiry) OVER (ORDER BY c.Effective, c.Expiry))
        END as Effective,
        CASE 
            WHEN c.Expiry >= op.Expiry THEN op.Expiry
            ELSE c.Expiry
        END as Expiry
        
    FROM (
        SELECT 
            *,
            DENSE_RANK() OVER (ORDER BY t.Expiry DESC, t.PeriodPriority ) rk
        FROM ( 
            SELECT 
                BusinessKey,PeriodTemp, Description,Effective,Expiry, 1 as PeriodPriority 
            FROM GasPeriod
            UNION ALL
            SELECT 
                BusinessKey,PeriodTemp, Description,Effective,Expiry, 2 
            FROM LiquidPeriod
            UNION ALL
            SELECT 
                BusinessKey,PeriodTemp, Description,Effective,Expiry, 3 
            FROM SolidPeriod
        ) t
    ) c
    INNER JOIN ObservationPeriod op ON c.BusinessKey=op.BusinessKey AND
                                       op.Effective <= c.Effective AND
                                       (op.Expiry >= c.Expiry OR rk=1)
    WHERE op.ObservationPeriodId=1
    ORDER BY c.Effective, c.Expiry 

View Demo DB Fiddle

Let me know if this works for you.

  • Related