I am trying to group the below sample data based on date and price.
PRODUCT_CODE | PRICING_DATE | PRICE |
---|---|---|
1234 | 01-01-2022 | 23.9 |
1234 | 02-01-2022 | 23.9 |
1234 | 03-01-2022 | 23.9 |
1234 | 04-01-2022 | 22.9 |
1234 | 05-01-2022 | 22.9 |
1234 | 06-01-2022 | 24.9 |
1234 | 07-01-2022 | 24.9 |
1234 | 08-01-2022 | 23.9 |
1234 | 09-01-2022 | 23.9 |
Desired Output
PRODUCT_CODE | MIN_DATE | MAX_DATE | PRICE |
---|---|---|---|
1234 | 01-01-2022 | 03-01-2022 | 23.9 |
1234 | 04-01-2022 | 05-01-2022 | 22.9 |
1234 | 06-01-2022 | 07-01-2022 | 24.9 |
1234 | 08-01-2022 | 09-01-2022 | 23.9 |
I have tried with this query but it doesn't give the right output.
SELECT Product_Code,
min(Pricing_Date) AS Min_Date ,
max(Pricing_Date) AS Max_Date,
price
FROm PRICE_DATA
GROUP BY
Product_Code ,
Price
Which outputs
PRODUCT_CODE | MIN_DATE | MAX_DATE | PRICE |
---|---|---|---|
1234 | 01-01-2022 | 09-01-2022 | 23.9 |
1234 | 04-01-2022 | 05-01-2022 | 22.9 |
1234 | 06-01-2022 | 07-01-2022 | 24.9 |
So the date range for price is 23.9 is not right because price not same for all the days in that range. I am not sure how to partition the data as desired output.
Basically I don't want any overlapping date ranges.
Database : SQL Server 13
CodePudding user response:
the date range for price is 23.9 is not right because price not same for all the days in that range.
Because there are two same price
in different overlapping date ranges, so you might get only one row when you used aggregate function.
This is a gap-and-island problem, we can try to use ROW_NUMBER
window function to get the gap of overlapping date and then group by
that.
SELECT Product_Code,
min(Pricing_Date) AS Min_Date ,
max(Pricing_Date) AS Max_Date,
price
FROM (
SELECT *,
ROW_NUMBER() OVER(ORDER BY PRICING_DATE) - ROW_NUMBER() OVER(PARTITION BY PRODUCT_CODE,PRICE ORDER BY PRICING_DATE) grp
FROM PRICE_DATA
) t1
GROUP BY grp,Product_Code,price
ORDER BY min(Pricing_Date)
Explain
The gap-and-island problem is a feature
continuous(overlapping) data is that a set
(continuous range of sequence) - (values based on a certain order of conditions sequence)
yields the same grouping.
so that We can use
ROW_NUMBER() OVER(ORDER BY PRICING_DATE)
making a continuous range of values.ROW_NUMBER() OVER(PARTITION BY PRODUCT_CODE,PRICE ORDER BY PRICING_DATE)
making values based on a certain order of conditions.
Then we will get a grouping column with overlapping data as sqlfiddle