Home > Net >  Using lag and lead function for multiple partitions
Using lag and lead function for multiple partitions

Time:06-23

Input

month       seller product amount
2021-10-01    A     corn    20€
2021-10-02    A     corn    40€
2021-10-02    B     grain   30€  
2021-10-03    B     grain   10€     
2021-10-03    A     corn    null    
      .....

I would like to count the delta (difference) between each month and the previous month's agricultural product purchase, and label them with the following events:

(if cost purchase this month > cost purchase last month --> increase 
 if cost purchase this month < cost purchase last month --> decrease
 if cost purchase this month = x and cost purchase last month is null or 0 --> new
 if cost purchase this month is null or 0 and cost purchase last month is not null --> stop  
)

Expected output:

month       seller product amount  last_month_amount  delta   event
2021-10-01    A     corn    20€           null         20€     new
2021-10-02    A     corn    40€           20€          20€   increase
2021-10-02    B     grain   30€           null         30€     new
2021-10-03    B     grain   10€           30€         -20€   decrease
2021-10-03    A     corn    null          40€         -40€    stop     

If there is just one product, I can do:

select month 
  , seller
  , product
  , amount
  , lag(amount) over (partition by seller,product order by month) as last_month_amount
  , amount - last_month_amount as delta 
  , case when delta >0 and min(month) over (partition by seller) = month then 'new' 
         when delta >0 then 'increase' 
         when delta <0 then 'decrease'
         when (delta is null or delta = 0) then 'stop'
   end as event 

However, having multiple agricultural products on the same month screw the logic. How can I adjust the logic for just one product for multiple products as these?

I think if I try to get the last_month_amount for corn, it returns the last month amount for grain instead. I can use a "case when", but it doesn't work if there is a lot of products.

CodePudding user response:

You are pretty close. A few small issues may be all that's holding you up. For one, you can't reference the alias last_month_amount within the same query where you assign it. My preference is to pull the first part into a CTE and then do the calculations in the main query.

Secondly, you may also be seeing unusual behavior due to not handling nulls consistently. Since lag may return null if you are looking at the first row in the partition, delta should address this possibility. It appears you're aware delta may be null (from the logic in your event case statement), but delta is currently coming up null for the first row, not the last, so I think you have the 'stop' case backwards.

You also are missing a FROM; perhaps you were abbreviating or simplifying your code.

Finally, the way "month" is used in this post is confusing, in part because it's an odd name for a DATE column. When you say "the same month", it's unclear whether you mean the same date or the same month of the year. If you potentially have distinct rows with the same month (date), seller, and product, then you will need a secondary column in your order by or you will get unpredictable ordering. Without knowing any of this I'm going to assume you've thought this through and that order by month is good enough.

It seems like you may also want event to be 'new' if last month's amount was zero, since the previous row would be 'stop'. I've made that change, but adjust accordingly if my assumption is wrong.

Here's the same code with those changes:

with rows_in_context AS (
  select month 
  , seller
  , product
  , amount
  , lag(amount) over (partition by seller,product order by month) as last_month_amount
  from some_table
)
select *
  , coalesce(amount,0) - coalesce(last_month_amount,0) as delta 
  , case when COALESCE(last_month_amount,0) = 0 then 'new' 
         when COALESCE(amount,0) = 0 then 'stop'
         when delta > 0 then 'increase' 
         when delta < 0 then 'decrease'
    end as event 
from rows_in_context
order by month, seller, product;
  • Related