I have the following dataframe:
indicator = ["buy"] ["hold"]*3 ["sell"] ["hold"]*4 ["buy"] ["hold"] * 2
values = np.random.randn(len(indicator)) / 100
df = pd.DataFrame({"indicator": indicator, "values": values})
df
OUTPUT
indicator values
0 buy 0.001810
1 hold 0.011779
2 hold -0.003350
3 hold 0.010311
4 sell -0.010846
5 hold -0.013635
6 hold 0.003794
7 hold -0.003792
8 hold 0.006421
9 buy -0.019779
10 hold 0.007123
11 hold 0.025983
I want to cumulate column values
based on the following logic of column indicator
:
- when
buy
cumulate for all rows untilsell
- when
sell
cumulate for all rows untilbuy
Also, I want to length of the buy
or sell
periods.
The expected output is something like this:
indicator values period Cumsum
0 buy 0.004730 1 0.004730
1 hold -0.006814 1 -0.002084
2 hold 0.002424 1 0.000340
3 hold -0.017007 1 -0.016667
4 sell 0.007531 2 0.007531
5 hold -0.015347 2 -0.007816
6 hold 0.000051 2 -0.007765
7 hold -0.001202 2 -0.008967
8 hold -0.008070 2 -0.017037
9 buy 0.028718 3 0.028718
10 hold -0.005978 3 0.022740
11 hold 0.004725 3 0.027465
How can I do this conditional cumulation. Once I have the column period
I can do .groupby("period")
. But what is a pandas way to generate this column?
CodePudding user response:
You can first cumsum on the indicator not being hold
:
df['period'] = df['indicator'].ne('hold').cumsum()
df['Cumsum'] = df.groupby('period')['values'].cumsum()
And you can get the sizes with:
df.groupby('period').size()
Output:
indicator values period Cumsum
0 buy 0.001810 1 0.001810
1 hold 0.011779 1 0.013589
2 hold -0.003350 1 0.010239
3 hold 0.010311 1 0.020550
4 sell -0.010846 2 -0.010846
5 hold -0.013635 2 -0.024481
6 hold 0.003794 2 -0.020687
7 hold -0.003792 2 -0.024479
8 hold 0.006421 2 -0.018058
9 buy -0.019779 3 -0.019779
10 hold 0.007123 3 -0.012656
11 hold 0.025983 3 0.013327
CodePudding user response:
To produce the column period
you can do this:
df['indicator'].isin(['buy', 'sell']).cumsum()
produces:
0 1
1 1
2 1
3 1
4 2
5 2
6 2
7 2
8 2
9 3
10 3
11 3
Name: indicator, dtype: int64
CodePudding user response:
df['period']=df['indicator'].isin(['buy','sell']).cumsum()
df['cumsum']=df.groupby('period')['values'].cumsum()
df
indicator values period cumsum
0 buy -0.004842 1 -0.004842
1 hold 0.006564 1 0.001722
2 hold 0.009956 1 0.011678
3 hold 0.005634 1 0.017311
4 sell 0.000781 2 0.000781
5 hold 0.002300 2 0.003081
6 hold 0.003439 2 0.006520
7 hold 0.005631 2 0.012151
8 hold -0.009300 2 0.002851
9 buy -0.006020 3 -0.006020
10 hold -0.008950 3 -0.014970
11 hold 0.007020 3 -0.007950
data used
indicator values
0 buy -0.004842
1 hold 0.006564
2 hold 0.009956
3 hold 0.005634
4 sell 0.000781
5 hold 0.002300
6 hold 0.003439
7 hold 0.005631
8 hold -0.009300
9 buy -0.006020
10 hold -0.008950
11 hold 0.007020