I have data in this format.
[
(Seller, price, date) <- headers
(A, 10, 2021-01-01)
(A, 20, 2021-01-02)
(A, 20, 2021-01-03)
(A, 30, 2021-01-04)
(B, 20, 2021-01-05)
(B, 40, 2021-01-06)
(C, 20, 2021-01-07)
(A, 20, 2021-01-08)
(A, 20, 2021-01-09)
(A, 20, 2021-01-10)
(A, 10, 2021-01-11)
(D, 10, 2021-01-12)
(D, 20, 2021-01-13)
(D, 10, 2021-01-14)
(C, 40, 2021-01-15)
(C, 40, 2021-01-16)
(A, 10, 2021-01-17)
]
I want to create a query which can give me following result.
[
(Seller, Avg_Price, From_Date, To_Date) <- headers
(A, (10 20 20 30) / 4, 2021-01-01, 2021-01-04)
(B, (20 40) / 4, 2021-01-05, 2021-01-06)
(C, (20) / 1, 2021-01-07, 2021-01-07)
(A, (20 20 20 10) / 4, 2021-01-08, 2021-01-11)
(D, (10 20 10) / 3, 2021-01-12, 2021-01-14)
(C, (40 40) / 2, 2021-01-15, 2021-01-16)
(A, (10) / 1, 2021-01-17, 2021-01-17)
]
I am struggling to manage the group of data as same retailer can come again again but I want to consider it different if chain is broken.
Let me know if you need more details. Thanks in Advance.
CodePudding user response:
Use window functions - first compute flag whose value is 1 on every seller change, then compute group numbers as sum of those flags from the beginning so the first group has value 1, second group 2 etc..., then group by group number and compute what you need.
select min(seller) as Seller
, avg(price) as Avg_Price
, min(date) as From_Date
, max(date) as To_Date
from (
select seller, price, date, sum(seller_chaned) over (order by date) as grp
from (
select seller, price, date
, case when lag(seller) over (order by date) != seller then 1 else 0 end as seller_changed
from t
)
)
group by grp
order by grp
(Note I composed query just in my head so there might be typos or off-by-one errors but I hope the idea is obvious. It is better to provide dbfiddle with sample data.)