so the dataframe called idio_vol looks like:
date PERMNO Mkt-RF SMB HML RMW CMA log_excess_ret
2000-1-6 10026 -0.0073 -0.0005 0.0126 0.0065 0.0122 -0.015918593
2000-1-7 10026 0.0321 -0.0094 -0.0143 -0.0088 -0.0095 0.0060892
2000-1-10 10026 0.0176 0.005 -0.0152 -0.0197 -0.0024 -0.025341776
2000-1-11 10026 -0.0171 0.0035 0.0081 0.0081 0.0108 0.002782364
2000-1-12 10026 -0.0069 -0.0023 0.0078 0.0042 0.0096 -0.022827866
2000-1-13 10026 0.0159 0.0049 -0.0091 -0.0177 -0.0105 0.002989781
2000-1-14 10026 0.0115 0.0027 -0.0065 -0.0031 -0.0064 -0.00021
2000-1-18 10026 -0.0026 0.023 -0.0049 -0.0148 -0.0012 -0.00021
2000-1-19 10026 0.0044 0.0064 -0.0036 -0.011 0.0047 0.015997939
2000-1-20 10026 -0.0037 0.0174 -0.0125 -0.0107 -0.0059 -0.0002209
2000-1-21 10026 0.0023 0.0125 -0.0052 -0.0131 0.0038 -0.020409169
2000-1-24 10026 -0.0259 0.014 0.0076 0.0049 0.009 0.021974042
2000-1-6 10027 -0.0073 -0.0005 0.0126 0.0065 0.0122 -0.015918593
2000-1-7 10027 0.0321 -0.0094 -0.0143 -0.0088 -0.0095 0.0060892
2000-1-10 10027 0.0176 0.005 -0.0152 -0.0197 -0.0024 -0.025341776
2000-1-11 10027 -0.0171 0.0035 0.0081 0.0081 0.0108 0.002782364
2000-1-12 10027 -0.0069 -0.0023 0.0078 0.0042 0.0096 -0.022827866
2000-1-13 10027 0.0159 0.0049 -0.0091 -0.0177 -0.0105 0.002989781
2000-1-14 10027 0.0115 0.0027 -0.0065 -0.0031 -0.0064 -0.00021
2000-1-18 10027 -0.0026 0.023 -0.0049 -0.0148 -0.0012 -0.00021
2000-1-19 10027 0.0044 0.0064 -0.0036 -0.011 0.0047 0.015997939
2000-1-20 10027 -0.0037 0.0174 -0.0125 -0.0107 -0.0059 -0.0002209
2000-1-21 10027 0.0023 0.0125 -0.0052 -0.0131 0.0038 -0.020409169
2000-1-24 10027 -0.0259 0.014 0.0076 0.0049 0.009 0.021974042
with daily data. i wish to calculate monthly idiosyncratic volatility by
def ols_residual_std(df):
x = np.matrix(df.iloc[:,2:7])
y=np.matrix(df.iloc[:,-1])
coef = np.linalg.pinv(x.T@x) @ [email protected]
return np.std(y-(coef.T @ x.T))
then
idio_vol.groupby('PERMNO').resample('M',on='date').transform(ols_residual_std)
and the error is IndexingError: Too many indexers apparently I cannot do iloc[:,:] then how do I get the data I need?
to make my question clearer, I want to do something like
for _, df in idio_vol.groupby('PERMNO'):
for month in df.date:
print(ols_residual_std(df))
but I cannot loop over groups nor dates.
CodePudding user response:
I could make this work, although I'm not sure if the result is correct.
I replace .transform
with .apply
, but I also had to put in a rather arbitrary call to df.columns
there. I can't tell yet whether this is a bug in Pandas (1.4.1) or there is something else I'm missing.
Here's my attempt:
def ols_residual_std(df):
df.columns # ?? IndexingError without this line
x = np.matrix(df.iloc[:,2:7])
y=np.matrix(df.iloc[:,-1])
coef = np.linalg.pinv(x.T@x) @ [email protected]
return np.std(y-(coef.T @ x.T))
df.groupby('PERMNO').resample('M',on='date').apply(ols_residual_std)
which results in
date 2000-01-31
PERMNO
10026 0.012563
10027 0.012563