I have a DataFrame df1 with daily returns (stored as 1 + r), and I am trying to calculate cumulative returns over 4 days for specific dates from a list called dates and assign them to a new DataFrame df2. Is there an efficient way to do that with pandas?
df1:
Return
Date
2022-01-01 1.03
2022-01-02 0.98
2022-01-03 1.01
2022-01-04 0.99
2022-01-05 1.01
2022-01-06 0.93
2022-01-07 0.96
2022-01-08 0.92
2022-01-09 1.03
2022-01-10 1.05
2022-01-11 1.08
2022-01-12 1.02
dates = ['2022-01-02', '2022-01-05', '2022-01-09', '2022-01-10']
I want to get df2 with the cumulative returns over 4 days, starting at each of the specific dates. For example, for 2022-01-02: 0.9897 = 0.98 * 1.01 * 0.99 * 1.01
CumReturn
Date
2022-01-02 0.9897
2022-01-05 0.8296
2022-01-09 1.1914
2022-01-10 1.1567
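To make the example reproducible, here is a minimal sketch that rebuilds the sample data and checks the worked figure for 2022-01-02. It assumes the index is a DatetimeIndex named Date (the question does not say whether the dates are strings or timestamps); the names window and example are only for illustration.

```python
import pandas as pd

# Sample data from the question; a DatetimeIndex is an assumption here.
df1 = pd.DataFrame(
    {"Return": [1.03, 0.98, 1.01, 0.99, 1.01, 0.93,
                0.96, 0.92, 1.03, 1.05, 1.08, 1.02]},
    index=pd.date_range("2022-01-01", periods=12, freq="D", name="Date"),
)

# Label slicing on a DatetimeIndex includes both endpoints,
# so this picks the 4 rows from 2022-01-02 through 2022-01-05.
window = df1.loc["2022-01-02":"2022-01-05", "Return"]
example = window.prod()
print(round(example, 4))  # 0.9897, as in the question
```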
Thanks a lot for your suggestion!
CodePudding user response:
You can use cut and GroupBy.prod:
# the far-future date closes the last bin
g = pd.cut(pd.to_datetime(df1.index),
           bins=pd.to_datetime(dates + ['2199-01-01']),
           labels=dates,
           right=False)
out = df1.groupby(g)['Return'].prod()
output:
2022-01-02 0.979902
2022-01-05 0.829590
2022-01-09 1.030000
2022-01-10 1.156680
Name: Return, dtype: float64
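Note that the bins above run from one listed date to the next, so the 2022-01-02 group covers three rows and the 2022-01-09 group only one, which is why those two values differ from the table in the question. If a fixed window of four rows per date is what is wanted, one possible sketch is an explicit positional slice per date. It assumes a unique, sorted DatetimeIndex; the names window and cum are illustrative and not part of the answer above.

```python
import pandas as pd

# Sample data from the question; a DatetimeIndex is an assumption here.
df1 = pd.DataFrame(
    {"Return": [1.03, 0.98, 1.01, 0.99, 1.01, 0.93,
                0.96, 0.92, 1.03, 1.05, 1.08, 1.02]},
    index=pd.date_range("2022-01-01", periods=12, freq="D", name="Date"),
)
dates = ["2022-01-02", "2022-01-05", "2022-01-09", "2022-01-10"]
window = 4  # number of rows to multiply, starting at each date

cum = {}
for d in dates:
    # Integer position of the start date in the (unique) index.
    start = df1.index.get_loc(pd.Timestamp(d))
    # iloc slicing is truncated at the end of the data, so the last
    # date simply uses the 3 rows that remain.
    cum[d] = df1["Return"].iloc[start:start + window].prod()

df2 = pd.DataFrame({"CumReturn": pd.Series(cum)}).rename_axis("Date")
print(df2.round(4))
```

This loops in Python, so it is only reasonable when the list of dates is short.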
CodePudding user response:
You can compute a forward-looking rolling product with a window size of 4 (or whatever you need) by reversing the series, applying rolling, and reversing the result back. With min_periods=1, it computes a value for every row even if fewer than 4 values are available, which seems to be necessary here (otherwise, 2022-01-10 would be NaN because only 3 values remain from that date on).
df2 = df1.Return[::-1].rolling(4, min_periods=1).agg(lambda x: x.prod())[::-1].loc[dates]
This results in this series:
>>> df2
Date
2022-01-02 0.989701
2022-01-05 0.829590
2022-01-09 1.191380
2022-01-10 1.156680
Name: Return, dtype: float64
To turn it into a DataFrame, use df2.to_frame('CumReturn').
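As a possible alternative to reversing the series twice, pandas also offers a forward-looking window indexer. The following is a sketch, assuming a pandas version that ships pandas.api.indexers.FixedForwardWindowIndexer (1.1 or later, as far as I know) and a DatetimeIndex on df1; the names indexer and forward are illustrative.

```python
import numpy as np
import pandas as pd
from pandas.api.indexers import FixedForwardWindowIndexer

# Sample data from the question; a DatetimeIndex is an assumption here.
df1 = pd.DataFrame(
    {"Return": [1.03, 0.98, 1.01, 0.99, 1.01, 0.93,
                0.96, 0.92, 1.03, 1.05, 1.08, 1.02]},
    index=pd.date_range("2022-01-01", periods=12, freq="D", name="Date"),
)
dates = ["2022-01-02", "2022-01-05", "2022-01-09", "2022-01-10"]

# Each window starts at the current row and looks 4 rows ahead.
indexer = FixedForwardWindowIndexer(window_size=4)

# min_periods=1 keeps the truncated windows at the end of the data.
forward = df1["Return"].rolling(indexer, min_periods=1).apply(np.prod, raw=True)

# Pick the requested dates and name the column as in the question.
df2 = forward.loc[pd.to_datetime(dates)].to_frame("CumReturn")
print(df2.round(4))
```

The result should match the reversed-rolling version; which one reads better is mostly a matter of taste.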