I need to calculate cumulative statistics (sum, mean, median, etc.) based on the values of `Indicator`.
The calculations should run over all the `False` rows and be written on the `True` row that follows them. The values should then reset, so the next window starts at that `True` row and runs until the next `True` appears in `Indicator`.
Input DataFrame:

| Amount | Indicator |
|---|---|
| 10 | False |
| 20 | False |
| 5 | True |
| 8 | False |
| 4 | False |
| 6 | True |

Expected output:

| Amount | Indicator | Sum | Mean |
|---|---|---|---|
| 10 | False | | |
| 20 | False | | |
| 5 | True | 30 | 15 |
| 8 | False | | |
| 4 | False | | |
| 6 | True | 17 | 5.67 |

I tried using `groupby` with `cumsum()`, with `Indicator` as the group key, but haven't had any luck yet.
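Both answers assume the input is a DataFrame named `df`. A minimal way to build the sample data from the table above, assuming `Indicator` holds real booleans rather than the strings "True"/"False":

```python
import pandas as pd

# Sample input from the question; Indicator must be bool dtype so that
# boolean masking (df.loc[df.Indicator]) and cumsum() behave as intended
df = pd.DataFrame({
    "Amount": [10, 20, 5, 8, 4, 6],
    "Indicator": [False, False, True, False, False, True],
})
```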
Answer:

In one go, using a single `groupby`:
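```python
import numpy as np

df[["Sum", "Mean"]] = np.nan  # np.NaN was removed in NumPy 2.0; use np.nan
# Window id increments at each True row; shift(1) hands each window's stats to the next True row
df.loc[df.Indicator, ["Sum", "Mean"]] = (df.groupby(df.Indicator.cumsum())
                                         .agg({"Amount": ["sum", "mean"]})
                                         .shift(1).iloc[1:].values)
```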
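The question also mentions median and other statistics. A minimal sketch of one way to generalise the approach above, assuming the sample `df` from the question: aggregate every window once, then, for each `True` row, look up the window whose id is one less than its own. Looking windows up by id rather than by position (`.shift(1).iloc[1:]`) also means that a frame whose first row is `True` simply gets `NaN` on that row instead of a length mismatch. The names `key`, `stats`, `prev_key`, `prev` and `result` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "Amount": [10, 20, 5, 8, 4, 6],
    "Indicator": [False, False, True, False, False, True],
})

# Window id: increments at each True row, so a window is a True row plus the
# False rows after it (leading False rows, if any, form window 0)
key = df["Indicator"].cumsum()

# Aggregate every window once; add any reducer groupby().agg accepts
stats = df.groupby(key)["Amount"].agg(["sum", "mean", "median"])

# Each True row reports the window just before it, i.e. window id - 1
prev_key = key[df["Indicator"]]
prev = stats.reindex((prev_key - 1).to_numpy())  # missing window -> NaN row
prev.index = prev_key.index                        # realign to the True rows

# Left join on the index: rows that are not True get NaN
result = df.join(prev.rename(columns=str.capitalize))
print(result)
```

Other reducers that `groupby(...).agg` accepts, such as `"std"`, `"min"` or `"max"`, can be added to the list in the same way.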
Answer:

Use the `cumsum` of the `Indicator` column as the group key to get the running sum and the group size of the `Amount` column:
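```python
# Window id: increments at each True row
m = df['Indicator'].cumsum()
# Running total per window, shifted down one row and kept only on True rows
df['Sum'] = df.groupby(m)['Amount'].cumsum().shift().where(df['Indicator'])
# Equivalent to Sum divided by the size of the window the row above belongs to
df['Mean'] = df['Sum'].shift(-1).div(df.groupby(m)['Amount'].transform('size')).shift()
print(df)
```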
```
   Amount  Indicator   Sum       Mean
0      10      False   NaN        NaN
1      20      False   NaN        NaN
2       5       True  30.0  15.000000
3       8      False   NaN        NaN
4       4      False   NaN        NaN
5       6       True  17.0   5.666667
```
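To see why the shifts in this answer line up, it can help to print the intermediate columns. A small sketch on the question's sample data (the `debug` frame and its column names are only for illustration). The last line shows that the `Mean` expression amounts to dividing each `True` row's `Sum` by the size of the window that ends on the row above it.

```python
import pandas as pd

df = pd.DataFrame({
    "Amount": [10, 20, 5, 8, 4, 6],
    "Indicator": [False, False, True, False, False, True],
})

m = df["Indicator"].cumsum()  # window id: [0, 0, 1, 1, 1, 2]
debug = pd.DataFrame({
    "m": m,
    "running": df.groupby(m)["Amount"].cumsum(),         # running total inside each window
    "size": df.groupby(m)["Amount"].transform("size"),   # number of rows in each window
})
# shift() moves every value down one row, so the last running total of a
# window lands on the next row, which is the True row that closes the window
debug["prev_total"] = debug["running"].shift()
debug["prev_size"] = debug["size"].shift()
# Keep the results only on True rows, as the answer does with .where()
debug["sum_at_true"] = debug["prev_total"].where(df["Indicator"])
debug["mean_at_true"] = (debug["prev_total"] / debug["prev_size"]).where(df["Indicator"])
print(debug)
```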