Home > Net >  how to do groupby resample transform properly?
how to do groupby resample transform properly?

Time:04-03

so the dataframe called idio_vol looks like:

date       PERMNO   Mkt-RF  SMB     HML     RMW    CMA  log_excess_ret
2000-1-6    10026   -0.0073 -0.0005 0.0126  0.0065  0.0122  -0.015918593
2000-1-7    10026   0.0321  -0.0094 -0.0143 -0.0088 -0.0095 0.0060892
2000-1-10   10026   0.0176  0.005   -0.0152 -0.0197 -0.0024 -0.025341776
2000-1-11   10026   -0.0171 0.0035  0.0081  0.0081  0.0108  0.002782364
2000-1-12   10026   -0.0069 -0.0023 0.0078  0.0042  0.0096  -0.022827866
2000-1-13   10026   0.0159  0.0049  -0.0091 -0.0177 -0.0105 0.002989781
2000-1-14   10026   0.0115  0.0027  -0.0065 -0.0031 -0.0064 -0.00021
2000-1-18   10026   -0.0026 0.023   -0.0049 -0.0148 -0.0012 -0.00021
2000-1-19   10026   0.0044  0.0064  -0.0036 -0.011  0.0047  0.015997939
2000-1-20   10026   -0.0037 0.0174  -0.0125 -0.0107 -0.0059 -0.0002209
2000-1-21   10026   0.0023  0.0125  -0.0052 -0.0131 0.0038  -0.020409169
2000-1-24   10026   -0.0259 0.014   0.0076  0.0049  0.009   0.021974042
2000-1-6    10027   -0.0073 -0.0005 0.0126  0.0065  0.0122  -0.015918593
2000-1-7    10027   0.0321  -0.0094 -0.0143 -0.0088 -0.0095 0.0060892
2000-1-10   10027   0.0176  0.005   -0.0152 -0.0197 -0.0024 -0.025341776
2000-1-11   10027   -0.0171 0.0035  0.0081  0.0081  0.0108  0.002782364
2000-1-12   10027   -0.0069 -0.0023 0.0078  0.0042  0.0096  -0.022827866
2000-1-13   10027   0.0159  0.0049  -0.0091 -0.0177 -0.0105 0.002989781
2000-1-14   10027   0.0115  0.0027  -0.0065 -0.0031 -0.0064 -0.00021
2000-1-18   10027   -0.0026 0.023   -0.0049 -0.0148 -0.0012 -0.00021
2000-1-19   10027   0.0044  0.0064  -0.0036 -0.011  0.0047  0.015997939
2000-1-20   10027   -0.0037 0.0174  -0.0125 -0.0107 -0.0059 -0.0002209
2000-1-21   10027   0.0023  0.0125  -0.0052 -0.0131 0.0038  -0.020409169
2000-1-24   10027   -0.0259 0.014   0.0076  0.0049  0.009   0.021974042

with daily data. i wish to calculate monthly idiosyncratic volatility by

def ols_residual_std(df):
  x = np.matrix(df.iloc[:,2:7])
  y=np.matrix(df.iloc[:,-1])
  coef = np.linalg.pinv(x.T@x) @ [email protected]
return np.std(y-(coef.T @ x.T))

then

idio_vol.groupby('PERMNO').resample('M',on='date').transform(ols_residual_std)

and the error is IndexingError: Too many indexers apparently I cannot do iloc[:,:] then how do I get the data I need?

to make my question clearer, I want to do something like

for _, df in idio_vol.groupby('PERMNO'):
  for month in df.date:
    print(ols_residual_std(df))

but I cannot loop over groups nor dates.

CodePudding user response:

I could make this work, although I'm not sure if the result is correct. I replace .transform with .apply, but I also had to put in a rather arbitrary call to df.columns there. I can't tell yet whether this is a bug in Pandas (1.4.1) or there is something else I'm missing.

Here's my attempt:

def ols_residual_std(df):
    df.columns   # ?? IndexingError without this line
    x = np.matrix(df.iloc[:,2:7])
    y=np.matrix(df.iloc[:,-1])
    coef = np.linalg.pinv(x.T@x) @ [email protected]
    return np.std(y-(coef.T @ x.T))

df.groupby('PERMNO').resample('M',on='date').apply(ols_residual_std)

which results in

date    2000-01-31
PERMNO            
10026     0.012563
10027     0.012563
  • Related