I have a table like this:
CREATE TABLE Rates
(
RateGroup int NOT NULL,
Rate decimal(5, 2) NOT NULL,
DueDate date NOT NULL
);
This table contains rates which are valid from a certain due date to the day before the next due date. If no next due date is present, the rate's validity has no end. There can be multiple consecutive due days with the same rate and a certain can appear on different non consecutive due days as well.
The rates are divided into groups. A single due date can appear in multiple groups but only once per group.
Here's some example data:
INSERT INTO Rates(RateGroup, Rate, DueDate)
VALUES
(1, 1.2, '20210101'), (1, 1.2, '20210215'), (1, 1.5, '20210216'),
(1, 1.2, '20210501'), (2, 3.7, '20210101'), (2, 3.7, '20210215'),
(2, 3.7, '20210216'), (2, 3.7, '20210501'), (3, 2.9, '20210101'),
(3, 2.5, '20210215'), (3, 2.5, '20210216'), (3, 2.1, '20210501');
RateGroup | Rate | DueDate |
---|---|---|
1 | 1.20 | 2021-01-01 |
1 | 1.20 | 2021-02-15 |
1 | 1.50 | 2021-02-16 |
1 | 1.20 | 2021-05-01 |
2 | 3.70 | 2021-01-01 |
2 | 3.70 | 2021-02-15 |
2 | 3.70 | 2021-02-16 |
2 | 3.70 | 2021-05-01 |
3 | 2.90 | 2021-01-01 |
3 | 2.50 | 2021-02-15 |
3 | 2.50 | 2021-02-16 |
3 | 2.10 | 2021-05-01 |
Now I want a query which folds multiple consecutive rows of a rate group with the same rate to a single row containing the date range (start and end date) where the rate is valid.
This is the desired result:
RateGroup | Rate | StartDate | EndDate |
---|---|---|---|
1 | 1.20 | 2021-01-01 | 2021-02-15 |
1 | 1.50 | 2021-02-16 | 2021-04-30 |
1 | 1.20 | 2021-05-01 | NULL |
2 | 3.70 | 2021-01-01 | NULL |
3 | 2.90 | 2021-01-01 | 2021-02-14 |
3 | 2.50 | 2021-02-15 | 2021-04-30 |
3 | 2.10 | 2021-05-01 | NULL |
How can I achieve this?
CodePudding user response:
This can be done with Common Table Expressions
uilizing the OVER
Clause
as in the following query:
WITH
RatesWithBegin AS
(
SELECT RateGroup, Rate, DueDate,
CASE
WHEN Rate = LAG(Rate) OVER (PARTITION BY RateGroup ORDER BY DueDate)
THEN 0
ELSE 1
END AS IsBegin
FROM Rates
),
RatesFromTo AS
(
SELECT RateGroup, Rate, DueDate AS StartDate,
LEAD (DATEADD(day, -1, DueDate)) OVER
(
PARTITION BY RateGroup
ORDER BY DueDate
) AS EndDate,
SUM (IsBegin) OVER
(
PARTITION BY RateGroup
ORDER BY DueDate
ROWS UNBOUNDED PRECEDING
) AS RangeID
FROM RatesWithBegin
)
SELECT RateGroup, MAX(Rate) AS Rate, MIN(StartDate) AS StartDate,
NULLIF(MAX(COALESCE(EndDate, '99990101')), '99990101') AS EndDate
FROM RatesFromTo
GROUP BY RateGroup, RangeID
ORDER BY RateGroup, StartDate;
How does it work?
RatesWithBegin AS
(
SELECT RateGroup, Rate, DueDate,
CASE
WHEN Rate = LAG(Rate) OVER (PARTITION BY RateGroup ORDER BY DueDate)
THEN 0
ELSE 1
END AS IsBegin
FROM Rates
),
Here we are using LAG()
to compare the current rate to it's predecessor. PARTITION BY RateGroup
makes sure that we don't
mix rate groups and ORDER BY DueDate
determines the order at which we look at the rows.
If the current rate is equal to it's predecessor we mark the row with a 0 otherwise witha 1. The result of this CTE would look like so for the first rate group:
RateGroup | Rate | DueDate | IsBegin |
---|---|---|---|
1 | 1.20 | 2021-01-01 | 1 |
1 | 1.20 | 2021-02-15 | 0 |
1 | 1.50 | 2021-02-16 | 1 |
1 | 1.20 | 2021-05-01 | 1 |
0 and 1 are no arbitrary values; they are needed for the next step.
RatesFromTo AS
(
SELECT RateGroup, Rate, DueDate AS StartDate,
LEAD (DATEADD(day, -1, DueDate)) OVER
(
PARTITION BY RateGroup
ORDER BY DueDate
) AS EndDate,
SUM (IsBegin) OVER
(
PARTITION BY RateGroup
ORDER BY DueDate
ROWS UNBOUNDED PRECEDING
) AS RangeID
FROM RatesWithBegin
)
In this CTE we build a running total over the IsBegin
column with
SUM()
. Due to the value being 1 at the start
of a new range and 0 within a range our running total increments always at the begin of a new
range. This leads to an uniqe number for each range.
With LEAD()
we add
the day before the next due date in our range to the output. The result of this step for the first rate group is then:
RateGroup | Rate | StartDate | EndDate | RangeID |
---|---|---|---|---|
1 | 1.20 | 2021-01-01 | 2021-02-14 | 1 |
1 | 1.20 | 2021-02-15 | 2021-02-15 | 1 |
1 | 1.50 | 2021-02-16 | 2021-04-30 | 2 |
1 | 1.20 | 2021-05-01 | NULL | 3 |
SELECT RateGroup, MAX(Rate) AS Rate, MIN(StartDate) AS StartDate,
NULLIF(MAX(COALESCE(EndDate, '99990101')), '99990101') AS EndDate
FROM RatesFromTo
GROUP BY RateGroup, RangeID
ORDER BY RateGroup, StartDate;
Now that we have an unique identifier (RangeID
) for the date ranges we can do a simple
aggregation with GROUP BY
to get our desired result. Since the ranges can be open ended
(no next due date) we use
NULLIF(MAX(COALESCE(EndDate, '99990101')), '99990101')
to make sure that NULL
is always treated as the latest possible date.