Home > OS >  Calculating the cumulative sum with some conditions (gaps-and-islands problem)
Calculating the cumulative sum with some conditions (gaps-and-islands problem)

Time:02-05

Sorry if the title is a bit vague please suggest a title if you think it can articulate the problem. I'll start with what data I have and the end result I'm trying to get and then the TLDR:

This is the table I have:

row number id created amount event
1 1 2022-01-01 -200 spend
2 1 2022-01-02 1000 loan
3 1 2022-01-03 -200 spend
4 1 2022-01-04 -500 spend
5 1 2022-01-05 -500 loan
6 1 2022-01-06 100 spend
7 1 2022-01-07 -500 spend
8 1 2022-01-08 1000 loan
9 1 2022-01-09 -100 spend

I'm trying to make:

row number id created amount event cumulative_sum
1 1 2022-01-01 -200 spend -200
2 1 2022-01-02 1000 loan 1000
3 1 2022-01-03 -200 spend 800
4 1 2022-01-04 -500 spend 300
5 1 2022-01-05 -500 loan 300
6 1 2022-01-06 100 spend 300
7 1 2022-01-07 -500 spend -200
8 1 2022-01-08 1000 loan 1000
9 1 2022-01-09 -100 spend 900

Required logic:

I want to get a special cumulative sum which sums the amount only when: (the amount is < 0 AND the event is spend) OR (when amount is > 0 AND event is loan) .

The thing is I want the cumulative sum to start when that first positive loan amount. I don't care about anything before the positive loan amount and if they are counted it will obscure the results. The requirement is trying to select the rows which the loan enabled (if the loan is 1000 then we want to select the rows that add up to -1000 but only when event is spend and amount < 0).

my attempt

WITH tmp AS (
  SELECT
  1 AS id, 
  '2021-01-01' AS created,
  -200 AS amount,
  'spend' AS scheme

  UNION ALL

  SELECT
  1 AS id, 
  '2022-01-02' AS created,
  1000 AS amount,
  'loan' AS scheme

  UNION ALL

  SELECT
  1 AS id, 
  '2022-01-03' AS created,
  -200 AS amount,
  'spend' AS scheme
  
  UNION ALL

  SELECT
  1 AS id, 
  '2022-01-04' AS created,
  -500 AS amount,
  'spend' AS scheme
  
  UNION ALL

  SELECT
  1 AS id, 
  '2022-01-05' AS created,
  -500 AS amount,
  'loan' AS scheme
  
  UNION ALL

  SELECT
  1 AS id, 
  '2022-01-06' AS created,
  100 AS amount,
  'spend' AS scheme

  UNION ALL

  SELECT
  1 AS id, 
  '2022-01-07' AS created,
  -500 AS amount,
  'spend' AS scheme

  UNION ALL

  SELECT
  1 AS id, 
  '2022-01-08' AS created,
  1000 AS amount,
  'loan' AS scheme

  UNION ALL

  SELECT
  1 AS id, 
  '2022-01-09' AS created,
  -100 AS amount,
  'spend' AS scheme

)

SELECT 
*,
SUM(CASE WHEN (scheme != 'loan' AND amount<0) OR (scheme = 'loan' AND amount > 0) THEN amount ELSE 0 END)
    OVER (PARTITION BY id ORDER BY created ASC) AS cumulative_sum_spend
 FROM tmp

Question

How do I make the cumulative sum reset at row 2?

CodePudding user response:

One option would be something like this:

SELECT 
*,
SUM(CASE WHEN cnt >= 1 AND ((scheme != 'loan' AND amount<0) OR (scheme = 'loan' AND amount > 0)) THEN amount ELSE 0 END)
    OVER (PARTITION BY id ORDER BY created ASC) AS cumulative_sum_spend
FROM (
  SELECT *, SUM(CASE WHEN amount > 0 THEN 1 ELSE 0 END) OVER (PARTITION BY id ORDER BY created) cnt
  FROM tmp
) a

The idea here is that the inner query's window function counts the number of previous positive values. Then the outer query can do an extra check cnt >= 1 as part of its window function, so it will only consider values after the first positive one.

CodePudding user response:

That's a gaps-and-islands problem if I am understanding this correctly.

Islands start with a positive loan ; within each island, you want to compute a running sum in a subset of rows.

We can identify the islands in a subquery with a window count of positive loans, then do the maths in each group with a conditional expression:

select id, created, amount, event, 
    sum(case when (event = 'loan' and amount > 0) or (event = 'spend' and amount < 0) then amount end)
        over(partition by id, grp order by created) as cumulative_sum
from (
    select t.*, 
        sum(case when  event = 'loan' and amount > 0 then 1 else 0 end)
            over(partition by id order by created) grp
    from tmp t
) t
order by id, created
  • Related