Select rows where column value has changed With agreegation-CodePudding

I have data in this format.

[

(Seller, price, date) <- headers

(A, 10, 2021-01-01)

(A, 20, 2021-01-02)

(A, 20, 2021-01-03)

(A, 30, 2021-01-04)

(B, 20, 2021-01-05)

(B, 40, 2021-01-06)

(C, 20, 2021-01-07)

(A, 20, 2021-01-08)

(A, 20, 2021-01-09)

(A, 20, 2021-01-10)

(A, 10, 2021-01-11)

(D, 10, 2021-01-12)

(D, 20, 2021-01-13)

(D, 10, 2021-01-14)

(C, 40, 2021-01-15)

(C, 40, 2021-01-16)

(A, 10, 2021-01-17)

]

I want to create a query which can give me following result.

[

(Seller, Avg_Price, From_Date, To_Date) <- headers

(A, (10 20 20 30) / 4, 2021-01-01, 2021-01-04)

(B, (20 40) / 4, 2021-01-05, 2021-01-06)

(C, (20) / 1, 2021-01-07, 2021-01-07)

(A, (20 20 20 10) / 4, 2021-01-08, 2021-01-11)

(D, (10 20 10) / 3, 2021-01-12, 2021-01-14)

(C, (40 40) / 2, 2021-01-15, 2021-01-16)

(A, (10) / 1, 2021-01-17, 2021-01-17)

]

I am struggling to manage the group of data as same retailer can come again again but I want to consider it different if chain is broken.

Let me know if you need more details. Thanks in Advance.

CodePudding user response：

Use window functions - first compute flag whose value is 1 on every seller change, then compute group numbers as sum of those flags from the beginning so the first group has value 1, second group 2 etc..., then group by group number and compute what you need.

select min(seller) as Seller
     , avg(price) as Avg_Price
     , min(date) as From_Date
     , max(date) as To_Date
from (
  select seller, price, date, sum(seller_chaned) over (order by date) as grp
  from (
    select seller, price, date
         , case when lag(seller) over (order by date) != seller then 1 else 0 end as seller_changed
    from t
  )
)
group by grp
order by grp

(Note I composed query just in my head so there might be typos or off-by-one errors but I hope the idea is obvious. It is better to provide dbfiddle with sample data.)