Home > other >  Conditional cumsum of pandas column
Conditional cumsum of pandas column

Time:09-02

I have the following dataframe:

indicator = ["buy"]   ["hold"]*3   ["sell"]   ["hold"]*4   ["buy"]   ["hold"] * 2
values = np.random.randn(len(indicator)) / 100
df = pd.DataFrame({"indicator": indicator, "values": values})
df

OUTPUT
    indicator   values
0   buy      0.001810
1   hold     0.011779
2   hold    -0.003350
3   hold    0.010311
4   sell    -0.010846
5   hold    -0.013635
6   hold    0.003794
7   hold    -0.003792
8   hold    0.006421
9   buy    -0.019779
10  hold    0.007123
11  hold    0.025983

I want to cumulate column values based on the following logic of column indicator:

  • when buy cumulate for all rows until sell
  • when sell cumulate for all rows until buy

Also, I want to length of the buy or sell periods.

The expected output is something like this:

    indicator   values  period  Cumsum
0   buy     0.004730    1   0.004730
1   hold    -0.006814   1   -0.002084
2   hold    0.002424    1   0.000340
3   hold    -0.017007   1   -0.016667
4   sell    0.007531    2   0.007531
5   hold    -0.015347   2   -0.007816
6   hold    0.000051    2   -0.007765
7   hold    -0.001202   2   -0.008967
8   hold    -0.008070   2   -0.017037
9   buy     0.028718    3   0.028718
10  hold    -0.005978   3   0.022740
11  hold    0.004725    3   0.027465

How can I do this conditional cumulation. Once I have the column period I can do .groupby("period"). But what is a pandas way to generate this column?

CodePudding user response:

You can first cumsum on the indicator not being hold:

df['period'] = df['indicator'].ne('hold').cumsum()
df['Cumsum'] = df.groupby('period')['values'].cumsum()

And you can get the sizes with:

df.groupby('period').size()

Output:

   indicator    values  period    Cumsum
0        buy  0.001810       1  0.001810
1       hold  0.011779       1  0.013589
2       hold -0.003350       1  0.010239
3       hold  0.010311       1  0.020550
4       sell -0.010846       2 -0.010846
5       hold -0.013635       2 -0.024481
6       hold  0.003794       2 -0.020687
7       hold -0.003792       2 -0.024479
8       hold  0.006421       2 -0.018058
9        buy -0.019779       3 -0.019779
10      hold  0.007123       3 -0.012656
11      hold  0.025983       3  0.013327

CodePudding user response:

To produce the column period you can do this:

df['indicator'].isin(['buy', 'sell']).cumsum()

produces:

0     1
1     1
2     1
3     1
4     2
5     2
6     2
7     2
8     2
9     3
10    3
11    3
Name: indicator, dtype: int64

CodePudding user response:

df['period']=df['indicator'].isin(['buy','sell']).cumsum()
df['cumsum']=df.groupby('period')['values'].cumsum()
df

indicator   values  period  cumsum
0   buy     -0.004842   1   -0.004842
1   hold    0.006564    1   0.001722
2   hold    0.009956    1   0.011678
3   hold    0.005634    1   0.017311
4   sell    0.000781    2   0.000781
5   hold    0.002300    2   0.003081
6   hold    0.003439    2   0.006520
7   hold    0.005631    2   0.012151
8   hold    -0.009300   2   0.002851
9   buy     -0.006020   3   -0.006020
10  hold    -0.008950   3   -0.014970
11  hold    0.007020    3   -0.007950

data used

    indicator   values
0   buy     -0.004842
1   hold    0.006564
2   hold    0.009956
3   hold    0.005634
4   sell    0.000781
5   hold    0.002300
6   hold    0.003439
7   hold    0.005631
8   hold    -0.009300
9   buy     -0.006020
10  hold    -0.008950
11  hold    0.007020
  • Related