Home > front end >  How to group data based on continuous date range?
How to group data based on continuous date range?

Time:05-02

I am trying to group the below sample data based on date and price.

PRODUCT_CODE PRICING_DATE PRICE
1234 01-01-2022 23.9
1234 02-01-2022 23.9
1234 03-01-2022 23.9
1234 04-01-2022 22.9
1234 05-01-2022 22.9
1234 06-01-2022 24.9
1234 07-01-2022 24.9
1234 08-01-2022 23.9
1234 09-01-2022 23.9

Desired Output:

PRODUCT_CODE MIN_DATE MAX_DATE PRICE
1234 01-01-2022 03-01-2022 23.9
1234 04-01-2022 05-01-2022 22.9
1234 06-01-2022 07-01-2022 24.9
1234 08-01-2022 09-01-2022 23.9

I have tried with this query but it doesn't give the right output.

SELECT Product_Code,
    min(Pricing_Date) AS Min_Date,
    max(Pricing_Date) AS Max_Date, 
    price 
FROm PRICE_DATA
GROUP BY
    Product_Code,
    Price

Which outputs

PRODUCT_CODE MIN_DATE MAX_DATE PRICE
1234 01-01-2022 09-01-2022 23.9
1234 04-01-2022 05-01-2022 22.9
1234 06-01-2022 07-01-2022 24.9

So the date range for price is 23.9 is not right because price not same for all the days in that range. I am not sure how to partition the data as desired output.

Basically I don't want any overlapping date ranges.

Database: SQL Server 13

CodePudding user response:

the date range for price is 23.9 is not right because price not same for all the days in that range.

Because there are two same price in different overlapping date ranges, so you might get only one row when you used aggregate function.

This is a gap-and-island problem, we can try to use ROW_NUMBER window function to get the gap of overlapping date and then group by that.

SELECT  Product_Code,
        min(Pricing_Date) AS Min_Date ,
        max(Pricing_Date) AS Max_Date, 
        price
FROM (
    SELECT *,
           ROW_NUMBER() OVER(ORDER BY PRICING_DATE) - ROW_NUMBER() OVER(PARTITION BY  PRODUCT_CODE,PRICE ORDER  BY PRICING_DATE) grp
    FROM PRICE_DATA 
) t1
GROUP BY grp,Product_Code,price
ORDER BY  min(Pricing_Date)

sqlfiddle

Explain

The gap-and-island problem is a feature

continuous(overlapping) data is that a set (continuous range of sequence) - (values ​​based on a certain order of conditions sequence) yields the same grouping.

so that We can use

  • ROW_NUMBER() OVER(ORDER BY PRICING_DATE) making a continuous range of values.
  • ROW_NUMBER() OVER(PARTITION BY PRODUCT_CODE,PRICE ORDER BY PRICING_DATE) making values ​​based on a certain order of conditions.

Then we will get a grouping column with overlapping data as sqlfiddle

  • Related