Input
month seller product amount
2021-10-01 A corn 20€
2021-10-02 A corn 40€
2021-10-02 B grain 30€
2021-10-03 B grain 10€
2021-10-03 A corn null
.....
I would like to count the delta (difference) between each month and the previous month's agricultural product purchase, and label them with the following events:
(if cost purchase this month > cost purchase last month --> increase
if cost purchase this month < cost purchase last month --> decrease
if cost purchase this month = x and cost purchase last month is null or 0 --> new
if cost purchase this month is null or 0 and cost purchase last month is not null --> stop
)
Expected output:
month seller product amount last_month_amount delta event
2021-10-01 A corn 20€ null 20€ new
2021-10-02 A corn 40€ 20€ 20€ increase
2021-10-02 B grain 30€ null 30€ new
2021-10-03 B grain 10€ 30€ -20€ decrease
2021-10-03 A corn null 40€ -40€ stop
If there is just one product, I can do:
select month
, seller
, product
, amount
, lag(amount) over (partition by seller,product order by month) as last_month_amount
, amount - last_month_amount as delta
, case when delta >0 and min(month) over (partition by seller) = month then 'new'
when delta >0 then 'increase'
when delta <0 then 'decrease'
when (delta is null or delta = 0) then 'stop'
end as event
However, having multiple agricultural products on the same month screw the logic. How can I adjust the logic for just one product for multiple products as these?
I think if I try to get the last_month_amount for corn, it returns the last month amount for grain instead. I can use a "case when", but it doesn't work if there is a lot of products.
CodePudding user response:
You are pretty close. A few small issues may be all that's holding you up. For one, you can't reference the alias last_month_amount
within the same query where you assign it. My preference is to pull the first part into a CTE and then do the calculations in the main query.
Secondly, you may also be seeing unusual behavior due to not handling nulls consistently. Since lag
may return null if you are looking at the first row in the partition, delta
should address this possibility. It appears you're aware delta
may be null (from the logic in your event
case statement), but delta
is currently coming up null for the first row, not the last, so I think you have the 'stop'
case backwards.
You also are missing a FROM
; perhaps you were abbreviating or simplifying your code.
Finally, the way "month" is used in this post is confusing, in part because it's an odd name for a DATE
column. When you say "the same month", it's unclear whether you mean the same date or the same month of the year. If you potentially have distinct rows with the same month (date), seller, and product, then you will need a secondary column in your order by
or you will get unpredictable ordering. Without knowing any of this I'm going to assume you've thought this through and that order by month
is good enough.
It seems like you may also want event to be 'new' if last month's amount was zero, since the previous row would be 'stop'. I've made that change, but adjust accordingly if my assumption is wrong.
Here's the same code with those changes:
with rows_in_context AS (
select month
, seller
, product
, amount
, lag(amount) over (partition by seller,product order by month) as last_month_amount
from some_table
)
select *
, coalesce(amount,0) - coalesce(last_month_amount,0) as delta
, case when COALESCE(last_month_amount,0) = 0 then 'new'
when COALESCE(amount,0) = 0 then 'stop'
when delta > 0 then 'increase'
when delta < 0 then 'decrease'
end as event
from rows_in_context
order by month, seller, product;