Original data
import pandas as pd
import datetime
import random
df = pd.DataFrame(columns=["date", "7 days return"])
date = datetime.datetime.strptime("2019-01-04", '%Y-%m-%d')
for i in range(365*3):
df = df.append({"date": date datetime.timedelta(i), "7 days return":random.uniform(0,1)}, ignore_index=True)
df
Out:
date | 7 days return |
---|---|
2019-01-04 | 0.780342 |
2019-01-05 | 0.7238 |
2019-01-06 | 0.02408 |
2019-01-07 | 0.188626 |
2019-01-08 | 0.255635 |
... | ... |
2021-12-26 | 0.833696 |
2021-12-27 | 0.806889 |
2021-12-28 | 0.902588 |
2021-12-29 | 0.833696 |
2021-12-30 | 0.806889 |
2021-12-31 | 0.902588 |
2022-01-01 | 0.182522 |
2022-01-02 | 0.612269 |
I need to calculate the standard deviations in the recent 365 rows but only use the last row of every 7 rows. For example, if I want to calculate the number for the date 2022-01-02, I only use the rows indexed with 2022-01-02, 2021-12-26, 2021-12-19, 2021-12-12, 2021-12-05 and so on in the recent 1 year. If calculate the number for the date 2022-01-01, I only need the rows indexed with 2022-01-01, 2021-12-25, 2021-12-18, 2021-12-11 and so on in the recent 1 year.
To achieve such results, I coded as followed:
a=[]
for i in range(0,len(df),7):
a.append(i)
Volatility1Y=df.iloc[a].transform(lambda x:x.rolling(52).std())
But it outputed the results also every 7 days, which is different from the results I need. I need to calculate the standard deviation everyday.
Could you please help me with this?
CodePudding user response:
Are you looking for:
volatility_1y = (
df.groupby(df["date"].dt.dayofweek)["7 days return"]
.rolling(52).std()
.droplevel(level=0).sort_index()
)
Essentially: Use .rolling(52)
on sub-series of df["7 days return"]
defined by the day of the week. (If so, you might want to look into the min_periods
keyword argument of .rolling
because otherwise you'll get a lot of NaN
s.)
Btw., you can create your dataframe much more efficient with something like (.append
is depreciated since version 1.4.0):
from random import random
df = pd.DataFrame({
"date": pd.date_range(start="2019-01-04", periods=365 * 3),
"7 days return": [random() for _ in range(365 * 3)]
})
CodePudding user response:
If I understand you correct:
a = []
for i in range(7):
# for each day of week
df_temp = df[i::7].copy()
# calculate rolling std
df_temp['var'] = df_temp['7 days return'].rolling(52).std()
a.append(
df_temp
)
result = pd.concat(a, ignore_index=True)
# sort in the end
result.sort_values('date')