Home > Back-end >  group cumsum on pandas, reset cumsum to 0 when cumsum is negative
group cumsum on pandas, reset cumsum to 0 when cumsum is negative

Time:10-19

how to get "expected_output" column with cumsum?

groupa = ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C']
valuesa = [-5, 4, 3, -4, -2, -2, 2, -2, 5, -2, 3, 6, 7, -11]
expected_output = [0, 4, 7, 3, 1, 0, 2, 0, 5, 3, 3, 9, 16, 5]
dfa = pd.DataFrame(list(zip(groupa, valuesa)), columns=['group', 'val'])
dfa['expected_output'] = pd.Series(expected_output)
dfa['cumsum_output'] = dfa.groupby('group')['val'].cumsum()
nega = dfa['val'] < 0
dfa['output'] = dfa['val'].groupby(
    [nega[::-1].cumsum(), dfa['group']]).cumsum().clip(0)
dfa
    group   val expected_output cumsum_output   output
0   A      -5   0              -5               0
1   A       4   4              -1               4
2   A       3   7               2               7
3   A      -4   3              -2               3
4   A      -2   1              -4               0
5   A      -2   0              -6               0
6   A       2   2              -4               2
7   B      -2   0              -2               0
8   B      5    5               3               5
9   B     -2    3               1               3
10  C     3     3               3               3
11  C     6     9               9               9
12  C     7     16              16              16
13  C     -11   5               5               5

Cumsum on Pandas DF with reset to zero for negative cumulative values

CodePudding user response:

An other solution is to use cummin :

groupa = ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C']
valuesa = [-5, 4, 3, -4, -2, -2, 2, -2, 5, -2, 3, 6, 7, -11]
expected_output = [0, 4, 7, 3, 1, 0, 2, 0, 5, 3, 3, 9, 16, 5]
dfa = pd.DataFrame(list(zip(groupa, valuesa)), columns=['group', 'val'])

dfa['cumsum_output'] = dfa.groupby('group')['val'].cumsum()
dfa['cummin_output'] = dfa.groupby('group')['cumsum_output'].cummin()
dfa['cummin_output']=dfa['cummin_output'].apply(lambda x: x if x<0 else 0)

dfa['result']=dfa['cumsum_output']-dfa['cummin_output']


dfa
Out[1]: 
   group  val  cumsum_output  cummin_output  result
0      A   -5             -5             -5       0
1      A    4             -1             -5       4
2      A    3              2             -5       7
3      A   -4             -2             -5       3
4      A   -2             -4             -5       1
5      A   -2             -6             -6       0
6      A    2             -4             -6       2
7      B   -2             -2             -2       0
8      B    5              3             -2       5
9      B   -2              1             -2       3
10     C    3              3              0       3
11     C    6              9              0       9
12     C    7             16              0      16
13     C  -11              5              0       5

CodePudding user response:

Let's look at the cumsum for group A. The first entry of the series is negative. To bring it to 0, we need to boost it by 5. And since this is cumulative, all subsequent elements are also boosted by 5:

cumsum  boost  cumsum_after_boost
-5      5      0
-1      5      4
 2      5      7
-2      5      3
-4      5      1
-6      5      -1
-4      5      1

There is still a negative element in cumsum_after_boost. So let's boost it again:

cumsum_after_boost  second_boost   cumsum_after_second_boost
0                                  0
4                                  4
7                                  7
3                                  3
1                                  1
-1                  1              0
1                   1              2

Now no more cumsum is negative and we get the expected output. Let's combine the two boosts together:

cumsum  combined_boost  cumsum_after_combined_boost
-5      5               0
-1      5               4
 2      5               7
-2      5               3
-4      5               1
-6      6               0
-4      6               2

Notice anything? The combined boost is the negative of the rolling min of the cumsum. However, there's one final twist: if the cumsum is positive, we don't need to boost it. For example:

val   cumsum   cummin         output
               (i.e. boost)   (cumsum - cummin)
 1     1        0             0
-5    -4       -4             0
 4     0       -4             4
 2     2       -4             6
-4    -2       -4             2
-2     0       -4             4

Code:

g = dfa["group"]
cumsum = dfa["val"].groupby(g).cumsum()
cummin = cumsum.mask(cumsum > 0, 0).groupby(g).cummin()

dfa["output"] = cumsum - cummin
  • Related