I am trying to create a new column CumReturn in a DataFrame df that holds the cumulative product of Return within each month. The cumulative product should reset to 1 at the end of each month (where EndMonth == 1) and start over from the next row.
df:
Date        EndMonth  ID1  Return
2023-01-30  0         A    0.95
2023-01-30  0         B    0.98
2023-01-31  1         A    1.01
2023-01-31  1         B    1.02
2023-02-01  0         A    1.05
2023-02-01  0         B    0.92
2023-02-02  0         A    0.97
2023-02-02  0         B    0.99
I tried:
df['CumReturn'] = np.where(df['EndMonth'] == 1, 1, df.groupby('ID1')['Return'].cumprod())
With that, the value for 2023-02-02 is the cumulative product over the whole period, not only since the start of February.
For reproducibility:
import numpy as np  # used by np.where in the attempt
import pandas as pd
df1 = pd.DataFrame({
    'Date': ['2023-01-30', '2023-01-30', '2023-01-31', '2023-01-31',
             '2023-02-01', '2023-02-01', '2023-02-02', '2023-02-02'],
    'EndMonth': [0, 0, 1, 1, 0, 0, 0, 0],
    'ID1': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B'],
    'Return': [0.95, 0.98, 1.01, 1.02, 1.05, 0.92, 0.97, 0.99]})
df1 = df1.set_index('Date')
Many thanks!
CodePudding user response:
Looks like you want:
# set up grouper per ID and month
g = df1.groupby(['ID1', pd.to_datetime(df1.index).month])['Return']
# get cumprod per month with starting value being 1
# (we divide by the first value)
df1['CumProd'] = g.cumprod().div(g.transform('first'))
Output:
            EndMonth ID1  Return  CumProd
Date
2023-01-30         0   A    0.95     1.00
2023-01-30         0   B    0.98     1.00
2023-01-31         1   A    1.01     1.01
2023-01-31         1   B    1.02     1.02
2023-02-01         0   A    1.05     1.00
2023-02-01         0   B    0.92     1.00
2023-02-02         0   A    0.97     0.97
2023-02-02         0   B    0.99     0.99
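Grouping on the calendar month alone would put January 2023 and January 2024 in the same group, and it does not use the EndMonth flag from the question. As a rough sketch, assuming rows are sorted by date within each ID1 and that EndMonth == 1 marks the last row of a month, the reset blocks could be derived from the flag instead. The Block column and the flag_count name are made up for illustration, and Date is kept as a regular column here:

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2023-01-30', '2023-01-30', '2023-01-31', '2023-01-31',
             '2023-02-01', '2023-02-01', '2023-02-02', '2023-02-02'],
    'EndMonth': [0, 0, 1, 1, 0, 0, 0, 0],
    'ID1': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B'],
    'Return': [0.95, 0.98, 1.01, 1.02, 1.05, 0.92, 0.97, 0.99]})

# Running count of month-end flags per ID (assumes rows are in date order).
flag_count = df.groupby('ID1')['EndMonth'].cumsum()

# Subtracting the flag keeps the month-end row in the block it closes,
# so the next block starts on the row after it.
df['Block'] = flag_count - df['EndMonth']

# Same normalisation as above: cumulative product divided by the block's first value.
g = df.groupby(['ID1', 'Block'])['Return']
df['CumProd'] = g.cumprod().div(g.transform('first'))
print(df)
```

On the sample data this should give the same CumProd values as the month-based grouper, since every flagged row coincides with the last date of its month.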
CodePudding user response:
IIUC use:
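df1.index = pd.to_datetime(df1.index)
# add(1)/sub(1) treats Return as a simple return (e.g. 0.05 for 5%)
# freq='M' is month end; on pandas >= 2.2 use 'ME', since 'M' is deprecated there
df1['CumReturn'] = (df1['Return'].add(1)
                    .groupby([df1['ID1'], pd.Grouper(freq='M')])
                    .cumprod()
                    .sub(1))
print(df1)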
            EndMonth ID1  Return  CumReturn
Date
2023-01-30         0   A    0.95     0.9500
2023-01-30         0   B    0.98     0.9800
2023-01-31         1   A    1.01     2.9195
2023-01-31         1   B    1.02     2.9996
2023-02-01         0   A    1.05     1.0500
2023-02-01         0   B    0.92     0.9200
2023-02-02         0   A    0.97     3.0385
2023-02-02         0   B    0.99     2.8208
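The add(1)/sub(1) steps treat Return as a simple return such as 0.05. If the column already holds growth factors, which values like 0.95 and 1.01 suggest but the question does not confirm, a plain per-month cumulative product may be closer to what was asked. A minimal sketch under that assumption, grouping on a year-month period so that the same month in different years stays separate (the name month is illustrative, and Date is kept as a regular column):

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2023-01-30', '2023-01-30', '2023-01-31', '2023-01-31',
             '2023-02-01', '2023-02-01', '2023-02-02', '2023-02-02'],
    'EndMonth': [0, 0, 1, 1, 0, 0, 0, 0],
    'ID1': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B'],
    'Return': [0.95, 0.98, 1.01, 1.02, 1.05, 0.92, 0.97, 0.99]})

# Year-month period per row, e.g. 2023-01; unlike .month it keeps the year.
month = pd.to_datetime(df['Date']).dt.to_period('M')

# Assumption: Return is already a growth factor, so it is compounded directly.
df['CumReturn'] = df.groupby(['ID1', month])['Return'].cumprod()
print(df)
```

For the sample data this gives about 0.9595 for A on 2023-01-31 (0.95 * 1.01) and about 1.0185 for A on 2023-02-02 (1.05 * 0.97), so the product restarts at the first row of February.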