Sorry if the title is a bit vague please suggest a title if you think it can articulate the problem. I'll start with what data I have and the end result I'm trying to get and then the TLDR:
This is the table I have:
row number | id | created | amount | event |
---|---|---|---|---|
1 | 1 | 2022-01-01 | -200 | spend |
2 | 1 | 2022-01-02 | 1000 | loan |
3 | 1 | 2022-01-03 | -200 | spend |
4 | 1 | 2022-01-04 | -500 | spend |
5 | 1 | 2022-01-05 | -500 | loan |
6 | 1 | 2022-01-06 | 100 | spend |
7 | 1 | 2022-01-07 | -500 | spend |
8 | 1 | 2022-01-08 | 1000 | loan |
9 | 1 | 2022-01-09 | -100 | spend |
I'm trying to make:
row number | id | created | amount | event | cumulative_sum |
---|---|---|---|---|---|
1 | 1 | 2022-01-01 | -200 | spend | -200 |
2 | 1 | 2022-01-02 | 1000 | loan | 1000 |
3 | 1 | 2022-01-03 | -200 | spend | 800 |
4 | 1 | 2022-01-04 | -500 | spend | 300 |
5 | 1 | 2022-01-05 | -500 | loan | 300 |
6 | 1 | 2022-01-06 | 100 | spend | 300 |
7 | 1 | 2022-01-07 | -500 | spend | -200 |
8 | 1 | 2022-01-08 | 1000 | loan | 1000 |
9 | 1 | 2022-01-09 | -100 | spend | 900 |
Required logic:
I want to get a special cumulative sum which sums the amount only when:
(the amount is < 0 AND the event is spend) OR (when amount is > 0 AND event is loan)
.
The thing is I want the cumulative sum to start when that first positive loan amount. I don't care about anything before the positive loan amount and if they are counted it will obscure the results. The requirement is trying to select the rows which the loan enabled (if the loan is 1000 then we want to select the rows that add up to -1000 but only when event is spend and amount < 0).
my attempt
WITH tmp AS (
SELECT
1 AS id,
'2021-01-01' AS created,
-200 AS amount,
'spend' AS scheme
UNION ALL
SELECT
1 AS id,
'2022-01-02' AS created,
1000 AS amount,
'loan' AS scheme
UNION ALL
SELECT
1 AS id,
'2022-01-03' AS created,
-200 AS amount,
'spend' AS scheme
UNION ALL
SELECT
1 AS id,
'2022-01-04' AS created,
-500 AS amount,
'spend' AS scheme
UNION ALL
SELECT
1 AS id,
'2022-01-05' AS created,
-500 AS amount,
'loan' AS scheme
UNION ALL
SELECT
1 AS id,
'2022-01-06' AS created,
100 AS amount,
'spend' AS scheme
UNION ALL
SELECT
1 AS id,
'2022-01-07' AS created,
-500 AS amount,
'spend' AS scheme
UNION ALL
SELECT
1 AS id,
'2022-01-08' AS created,
1000 AS amount,
'loan' AS scheme
UNION ALL
SELECT
1 AS id,
'2022-01-09' AS created,
-100 AS amount,
'spend' AS scheme
)
SELECT
*,
SUM(CASE WHEN (scheme != 'loan' AND amount<0) OR (scheme = 'loan' AND amount > 0) THEN amount ELSE 0 END)
OVER (PARTITION BY id ORDER BY created ASC) AS cumulative_sum_spend
FROM tmp
Question
How do I make the cumulative sum reset at row 2?
CodePudding user response:
One option would be something like this:
SELECT
*,
SUM(CASE WHEN cnt >= 1 AND ((scheme != 'loan' AND amount<0) OR (scheme = 'loan' AND amount > 0)) THEN amount ELSE 0 END)
OVER (PARTITION BY id ORDER BY created ASC) AS cumulative_sum_spend
FROM (
SELECT *, SUM(CASE WHEN amount > 0 THEN 1 ELSE 0 END) OVER (PARTITION BY id ORDER BY created) cnt
FROM tmp
) a
The idea here is that the inner query's window function counts the number of previous positive values. Then the outer query can do an extra check cnt >= 1
as part of its window function, so it will only consider values after the first positive one.
CodePudding user response:
That's a gaps-and-islands problem if I am understanding this correctly.
Islands start with a positive loan ; within each island, you want to compute a running sum in a subset of rows.
We can identify the islands in a subquery with a window count of positive loans, then do the maths in each group with a conditional expression:
select id, created, amount, event,
sum(case when (event = 'loan' and amount > 0) or (event = 'spend' and amount < 0) then amount end)
over(partition by id, grp order by created) as cumulative_sum
from (
select t.*,
sum(case when event = 'loan' and amount > 0 then 1 else 0 end)
over(partition by id order by created) grp
from tmp t
) t
order by id, created