I'm trying to solve what I believe to be (at least) two problems at once with this one, so there may already exist answers that partially answer this question, but I do not fundamentally understand the concept of what I'm trying to do with the data to get it into the final form that I desire.
The problem: I have three grains of data (let's just call them Gas, Liquid, Solid) that occur over a larger period of time, let's call that the Observation Period. The ask is for GasPeriod data to be displayed for the observation period if it is present, then Liquid, then Solid. For a given state, there would only ever be 0 or 1 active records.
Gas X----X X----X X---------X 1
Liq X-------X X--------XX---------X 2
Sol X--------------XX----------------X 3
Need 1----12-23--31----12--21---------1
What I need is to reduce those 8 ranges (3 GasPeriod, 3 LiquidPeriod, 2 SolidPeriod) into 6 rows with 6 date ranges and the data from the "winning" row to persist the PeriodTemp and Description for the given overlapping period of time.
Any solutions would be helpful, but I'd greatly appreciate any breakdowns of the actual problem here as well so I can educate myself on what I'm doing. I suspect the problem steps involved are:
- Return 8 rows for the Observation Period (must involve a union on GasPeriod, LiquidPeriod and SolidPeriod?)
- Group the rows by date range and rank them (I've seen solutions that involved
LAG
andDENSE_RANK
but I'm not familiar with the latter and I've only usedLAG
to solve a different problem so it's confusing me a bit how it works. - Pick the winning record for the given date range (I assume this is solved using a sub-query)
Edited for clarity
create table ObservationPeriod (
ObservationPeriodId BIGINT IDENTITY (1,1) NOT NULL,
BusinessKey BIGINT NOT NULL,
Effective DATETIME2 NOT NULL,
Expiry DATETIME2 NOT NULL
)
create table GasPeriod (
GasPeriodId BIGINT IDENTITY (1,1) NOT NULL,
BusinessKey BIGINT NOT NULL,
PeriodTemp DECIMAL (11, 5) NOT NULL,
Description NVARCHAR(100) NOT NULL,
Effective DATETIME2 NOT NULL,
Expiry DATETIME2 NOT NULL
)
create table LiquidPeriod (
LiquidPeriodId BIGINT IDENTITY (1,1) NOT NULL,
BusinessKey BIGINT NOT NULL,
PeriodTemp DECIMAL (11, 5) NOT NULL,
Description NVARCHAR(100) NOT NULL,
Effective DATETIME2 NOT NULL,
Expiry DATETIME2 NOT NULL
)
create table SolidPeriod (
SolidPeriodId BIGINT IDENTITY (1,1) NOT NULL,
BusinessKey BIGINT NOT NULL,
PeriodTemp DECIMAL (11, 5) NOT NULL,
Description NVARCHAR(100) NOT NULL,
Effective DATETIME2 NOT NULL,
Expiry DATETIME2 NOT NULL
)
create table ObservationPeriodObserved (
ObservationPeriodObservedId BIGINT IDENTITY(1,1) NOT NULL,
BusinessKey BIGINT NOT NULL,
PeriodTemp DECIMAL (11, 5) NOT NULL,
Description NVARCHAR(100) NOT NULL,
Effective DATETIME2 NOT NULL,
Expiry DATETIME2 NOT NULL
)
ObservationPeriod Data
ObservationPeriodId | BusinessKey | Effective | Expiry |
---|---|---|---|
1 | 24 | 2021-01-01 | 2021-12-31 |
GasPeriod Data
GasPeriodId | BusinessKey | PeriodTemp | Description | Effective | Expiry |
---|---|---|---|---|---|
1 | 24 | 101.328 | first g | 2020-09-30 | 2021-03-31 |
2 | 24 | 102.456 | second g | 2021-06-01 | 2021-07-31 |
3 | 24 | 100.011 | third g | 2021-09-01 | 9999-12-31 |
LiquidPeriod Data
LiquidPeriodId | BusinessKey | PeriodTemp | Description | Effective | Expiry |
---|---|---|---|---|---|
1 | 24 | 98.99 | first l | 2021-01-01 | 2021-04-30 |
2 | 24 | 98.76 | second l | 2021-06-01 | 2021-08-31 |
3 | 24 | 99.978 | third l | 2021-09-01 | 9999-12-31 |
SolidPeriod Data
SolidPeriodId | BusinessKey | PeriodTemp | Description | Effective | Expiry |
---|---|---|---|---|---|
1 | 24 | -0.145 | first s | 2021-01-01 | 2021-06-30 |
2 | 24 | -0.987 | second s | 2021-07-01 | 9999-12-31 |
ObvservationPeriodObserved Data
ObvservationPeriodObservedIdId | BusinessKey | PeriodTemp | Description | Effective | Expiry |
---|---|---|---|---|---|
1 | 24 | 101.328 | first g | 2021-01-01 | 2021-03-31 |
2 | 24 | 98.99 | first l | 2021-04-01 | 2021-04-30 |
3 | 24 | -0.145 | first s | 2021-05-01 | 2021-05-31 |
4 | 24 | 102.456 | second g | 2021-06-01 | 2021-07-31 |
5 | 24 | 98.76 | second l | 2021-08-01 | 2021-08-31 |
6 | 24 | 100.011 | third g | 2021-09-01 | 2021-12-31 |
The idea is that for a given row in ObservationPeriod, there are many associated periods of time between the three aforementioned grains, but only one should ever be recorded as a subset of an ObservationPeriod for a given period of time.
Please also assume that there must be separate granularity here and that this problem cannot be solved away by getting this data into the same table -- it cannot. I can't use the actual business model here so I'm trying to get as close as I can conceptually.
CodePudding user response:
The following approach starts by creating a union of the three datasets Gas
, Liquid
and Solid
. In this union an addition column is created PeriodPriority
which will assist in choosing the winning row. I've interpreted the winning row as a period entry occurring within the observational period, is the most recent and has not expired and will be chosen based on the rank of Gas
-1
, Liquid
-2
and Solid
-3
. This forms the basis of the DENSE_RANK
window function as it is ordered by recent expired date and the PeriodPriority
. Since this winning record may have a date exceeding the observational period, I used a case expression to ensure that the value inserted was within the observational period.
Although there is only one Observational period I have still included the where clause WHERE op.ObservationPeriodId=1
which you may update/remove as desired. I also joined on BusinessKey
as I am not sure whether this will change throughout your entire set. If BusinessKey
never changes, then this may be omitted from the join expression.
The resulting query currently looks like this
SELECT
ROW_NUMBER() OVER (ORDER BY c.Effective, c.Expiry ) as ObservationPeriodObservedId,
c.BusinessKey,
c.PeriodTemp,
c.Description,
c.Effective,
CASE
WHEN c.Expiry >= op.Expiry THEN op.Expiry
ELSE c.Expiry
END as Expiry
FROM (
SELECT
*,
DENSE_RANK() OVER (ORDER BY t.Expiry DESC, t.PeriodPriority ) rk
FROM (
SELECT
BusinessKey,PeriodTemp, Description,Effective,Expiry, 1 as PeriodPriority
FROM GasPeriod
UNION ALL
SELECT
BusinessKey,PeriodTemp, Description,Effective,Expiry, 2
FROM LiquidPeriod
UNION ALL
SELECT
BusinessKey,PeriodTemp, Description,Effective,Expiry, 3
FROM SolidPeriod
) t
) c
INNER JOIN ObservationPeriod op ON c.BusinessKey=op.BusinessKey AND
op.Effective <= c.Effective AND
(op.Expiry >= c.Expiry OR rk=1)
WHERE op.ObservationPeriodId=1
ORDER BY c.Effective, c.Expiry
and the insert statement
INSERT INTO ObservationPeriodObserved (BusinessKey, PeriodTemp, Description, Effective, Expiry)
SELECT
c.BusinessKey,
c.PeriodTemp,
c.Description,
c.Effective,
CASE
WHEN c.Expiry >= op.Expiry THEN op.Expiry
ELSE c.Expiry
END as Expiry
FROM (
SELECT
*,
DENSE_RANK() OVER (ORDER BY t.Expiry DESC, t.PeriodPriority ) rk
FROM (
SELECT
BusinessKey,PeriodTemp, Description,Effective,Expiry, 1 as PeriodPriority
FROM GasPeriod
UNION ALL
SELECT
BusinessKey,PeriodTemp, Description,Effective,Expiry, 2
FROM LiquidPeriod
UNION ALL
SELECT
BusinessKey,PeriodTemp, Description,Effective,Expiry, 3
FROM SolidPeriod
) t
) c
INNER JOIN ObservationPeriod op ON c.BusinessKey=op.BusinessKey AND
op.Effective <= c.Effective AND
(op.Expiry >= c.Expiry OR rk=1)
WHERE op.ObservationPeriodId=1
ORDER BY c.Effective, c.Expiry
generates the desired results.
Edit 1 - Consecutive Effective, Expiry Dates
As per the updated question and comment, I have modified the above to utilize LAG
along with DATE_ADD
to provide consecutive dates. The Insert query (latter part is a SELECT) is included below along with an updated db fiddle which provides the desired results. The exception here is record 1
of SolidPeriod
that had an end date of 2021-06-30
. When this date was changed to 2021-05-31
as in your desired results, the query corrected the 1 date that was not as expected. If there are additional considerations here or there was an error in the sample data, please let me know. I made the adjustment to the sample data instead of an adhoc calculation as I could not assume some logic to make a change as such (i.e. on an arbitrary record subtract 1 month). Let me know whether this works for you and advise further.
INSERT INTO ObservationPeriodObserved (BusinessKey, PeriodTemp, Description, Effective, Expiry)
SELECT
c.BusinessKey,
c.PeriodTemp,
c.Description,
CASE
WHEN LAG(c.Expiry) OVER (ORDER BY c.Effective, c.Expiry) IS NULL THEN op.Effective
ELSE DATEADD(DAY,1,LAG(c.Expiry) OVER (ORDER BY c.Effective, c.Expiry))
END as Effective,
CASE
WHEN c.Expiry >= op.Expiry THEN op.Expiry
ELSE c.Expiry
END as Expiry
FROM (
SELECT
*,
DENSE_RANK() OVER (ORDER BY t.Expiry DESC, t.PeriodPriority ) rk
FROM (
SELECT
BusinessKey,PeriodTemp, Description,Effective,Expiry, 1 as PeriodPriority
FROM GasPeriod
UNION ALL
SELECT
BusinessKey,PeriodTemp, Description,Effective,Expiry, 2
FROM LiquidPeriod
UNION ALL
SELECT
BusinessKey,PeriodTemp, Description,Effective,Expiry, 3
FROM SolidPeriod
) t
) c
INNER JOIN ObservationPeriod op ON c.BusinessKey=op.BusinessKey AND
op.Effective <= c.Effective AND
(op.Expiry >= c.Expiry OR rk=1)
WHERE op.ObservationPeriodId=1
ORDER BY c.Effective, c.Expiry
Let me know if this works for you.