I am having trouble with the following problem. Using t-sql, starting from the following sample data in the table:
valid_from | valid_to | article | price |
---|---|---|---|
2021-01-01 | 2021-01-02 | AAA | 10,99 |
2021-01-02 | 2021-01-03 | AAA | 10,99 |
2021-01-03 | 2021-01-05 | AAA | 10,99 |
2021-01-05 | 2021-01-06 | AAA | 12,99 |
2021-01-06 | 2021-01-09 | AAA | 12,99 |
2021-01-09 | 2021-01-14 | AAA | 10,99 |
2021-01-14 | 2021-01-15 | AAA | 10,99 |
2021-01-15 | 2021-01-16 | AAA | 10,99 |
I have to produce the "summarized" data table collapsing the data of the consecutive days when the price remained the same and producing just one line composed by the smallest valid_from and greatest valid_to. In this case the desired output would be:
valid_from | valid_to | article | price |
---|---|---|---|
2021-01-01 | 2021-01-05 | AAA | 10,99 |
2021-01-05 | 2021-01-09 | AAA | 12,99 |
2021-01-09 | 2021-01-16 | AAA | 10,99 |
I tried with some queries, but I can't find a way to solve completely the problem.
DECLARE @TBL_DATA AS TABLE (
valid_from DATE NOT NULL,
valid_to DATE NOT NULL,
article VARCHAR(20) NOT NULL,
price FLOAT NOT NULL
)
INSERT INTO @TBL_DATA (
valid_from,
valid_to,
article,
price
)
VALUES
('2021-01-01','2021-01-02','AAA',10.99),
('2021-01-02','2021-01-03','AAA',10.99),
('2021-01-03','2021-01-05','AAA',10.99),
('2021-01-05','2021-01-06','AAA',12.99),
('2021-01-06','2021-01-09','AAA',12.99),
('2021-01-09','2021-01-14','AAA',10.99),
('2021-01-14','2021-01-15','AAA',10.99),
('2021-01-15','2021-01-16','AAA',10.99)
SELECT
td.*,
DENSE_RANK() OVER (ORDER BY td.article ASC, td.price ASC) AS dr
FROM @TBL_DATA AS td
WHERE 1 = 1
ORDER BY
td.valid_from ASC
valid_from | valid_to | article | price | dr |
---|---|---|---|---|
2021-01-01 | 2021-01-02 | AAA | 10.99 | 1 |
2021-01-02 | 2021-01-03 | AAA | 10.99 | 1 |
2021-01-03 | 2021-01-05 | AAA | 10.99 | 1 |
2021-01-05 | 2021-01-06 | AAA | 12.99 | 2 |
2021-01-06 | 2021-01-09 | AAA | 12.99 | 2 |
2021-01-09 | 2021-01-14 | AAA | 10.99 | 1 |
2021-01-14 | 2021-01-15 | AAA | 10.99 | 1 |
2021-01-15 | 2021-01-16 | AAA | 10.99 | 1 |
As intermediate step I calculate the DENSE_RANK ordering by article and price, but the 3rd group (by price) is getting the same rank of the 1st. Can you please help me?
CodePudding user response:
you can use rank or row_number window functions to find islands & gaps
:
select min(valid_from) valid_from, max(valid_to) valid_to, article, price
from (
select *
, rank() over (partition by article order by valid_from)
- rank() over (partition by article,price order by valid_from) groups
from TBL_DATA) t
group by groups , price, article
order by min(valid_from)
db<>fiddle here