how to get "expected_output" column with cumsum?
groupa = ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C']
valuesa = [-5, 4, 3, -4, -2, -2, 2, -2, 5, -2, 3, 6, 7, -11]
expected_output = [0, 4, 7, 3, 1, 0, 2, 0, 5, 3, 3, 9, 16, 5]
dfa = pd.DataFrame(list(zip(groupa, valuesa)), columns=['group', 'val'])
dfa['expected_output'] = pd.Series(expected_output)
dfa['cumsum_output'] = dfa.groupby('group')['val'].cumsum()
nega = dfa['val'] < 0
dfa['output'] = dfa['val'].groupby(
[nega[::-1].cumsum(), dfa['group']]).cumsum().clip(0)
dfa
group val expected_output cumsum_output output
0 A -5 0 -5 0
1 A 4 4 -1 4
2 A 3 7 2 7
3 A -4 3 -2 3
4 A -2 1 -4 0
5 A -2 0 -6 0
6 A 2 2 -4 2
7 B -2 0 -2 0
8 B 5 5 3 5
9 B -2 3 1 3
10 C 3 3 3 3
11 C 6 9 9 9
12 C 7 16 16 16
13 C -11 5 5 5
Cumsum on Pandas DF with reset to zero for negative cumulative values
CodePudding user response:
An other solution is to use cummin :
groupa = ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C']
valuesa = [-5, 4, 3, -4, -2, -2, 2, -2, 5, -2, 3, 6, 7, -11]
expected_output = [0, 4, 7, 3, 1, 0, 2, 0, 5, 3, 3, 9, 16, 5]
dfa = pd.DataFrame(list(zip(groupa, valuesa)), columns=['group', 'val'])
dfa['cumsum_output'] = dfa.groupby('group')['val'].cumsum()
dfa['cummin_output'] = dfa.groupby('group')['cumsum_output'].cummin()
dfa['cummin_output']=dfa['cummin_output'].apply(lambda x: x if x<0 else 0)
dfa['result']=dfa['cumsum_output']-dfa['cummin_output']
dfa
Out[1]:
group val cumsum_output cummin_output result
0 A -5 -5 -5 0
1 A 4 -1 -5 4
2 A 3 2 -5 7
3 A -4 -2 -5 3
4 A -2 -4 -5 1
5 A -2 -6 -6 0
6 A 2 -4 -6 2
7 B -2 -2 -2 0
8 B 5 3 -2 5
9 B -2 1 -2 3
10 C 3 3 0 3
11 C 6 9 0 9
12 C 7 16 0 16
13 C -11 5 0 5
CodePudding user response:
Let's look at the cumsum for group A. The first entry of the series is negative. To bring it to 0, we need to boost it by 5. And since this is cumulative, all subsequent elements are also boosted by 5:
cumsum boost cumsum_after_boost
-5 5 0
-1 5 4
2 5 7
-2 5 3
-4 5 1
-6 5 -1
-4 5 1
There is still a negative element in cumsum_after_boost
. So let's boost it again:
cumsum_after_boost second_boost cumsum_after_second_boost
0 0
4 4
7 7
3 3
1 1
-1 1 0
1 1 2
Now no more cumsum is negative and we get the expected output. Let's combine the two boosts together:
cumsum combined_boost cumsum_after_combined_boost
-5 5 0
-1 5 4
2 5 7
-2 5 3
-4 5 1
-6 6 0
-4 6 2
Notice anything? The combined boost is the negative of the rolling min of the cumsum. However, there's one final twist: if the cumsum is positive, we don't need to boost it. For example:
val cumsum cummin output
(i.e. boost) (cumsum - cummin)
1 1 0 0
-5 -4 -4 0
4 0 -4 4
2 2 -4 6
-4 -2 -4 2
-2 0 -4 4
Code:
g = dfa["group"]
cumsum = dfa["val"].groupby(g).cumsum()
cummin = cumsum.mask(cumsum > 0, 0).groupby(g).cummin()
dfa["output"] = cumsum - cummin