I regress monthly stocks returns on a set of firm characteristics using the plm
package.
library(plm)
set.seed(1)
id=rep(1:10,each=10); t=rep(1:10,10); industry=rep(1:2,each=50); return=rnorm(100); x=rnorm(100)
data=data.frame(id,t,industry,return,x)
In a first step, I want to include time fixed effects. The following two formulas give the same coefficients for x
but different R-squares. The first model estimates the overall R-squared, while the second model gives the within R-squared.
reg1=plm(return~x factor(t),model="pooling",index=c("id","t"),data=data)
summary(reg1)$r.squared
reg2=plm(return~x,model="within",index=c("id","t"),data=data,effect="time")
summary(reg2)$r.squared
In a second step, I now want to include both time and industry fixed effects. I obtain coefficients by this formula:
reg3=plm(return~x factor(t) factor(industry),model="pooling",index=c("id","t"), data=data)
Unfortunately, I cannot use the "within"
model as in reg2
because industry
is not one of my index variables. Is there another way to calculate the within R-squared for reg3
?
CodePudding user response:
This is not a direct answer to your question, because I am not sure
plm
can do this. (It might, but I can’t figure it out.)
However, if you are mainly estimating fixed effects models, then I can
warmly recommend the fixest
package, which is super fast and
offers a convenient formula syntax to specify fixed effects and
interactions. Here’s a simple example:
library(fixest)
library(modelsummary)
dat = read.csv("https://vincentarelbundock.github.io/Rdatasets/csv/plm/EmplUK.csv")
models = list(
feols(wage ~ emp | year, data = dat),
feols(wage ~ emp | firm, data = dat),
feols(wage ~ emp | firm year, data = dat)
)
modelsummary(models)
Model 1 | Model 2 | Model 3 | |
---|---|---|---|
emp | -0.039 | -0.120 | -0.047 |
(0.003) | (0.064) | (0.042) | |
Num.Obs. | 1031 | 1031 | 1031 |
R2 | 0.039 | 0.868 | 0.896 |
R2 Adj. | 0.030 | 0.847 | 0.879 |
R2 Within | 0.012 | 0.016 | 0.003 |
R2 Pseudo | |||
AIC | 6474.0 | 4687.7 | 4455.6 |
BIC | 6523.4 | 5384.0 | 5191.4 |
Log.Lik. | -3226.988 | -2202.833 | -2078.818 |
Std.Errors | by: year | by: firm | by: firm |
FE: year | X | X | |
FE: firm | X | X |
CodePudding user response:
To include time and industry effects (next to individual effects), just use a two-ways within model and include any further fixed effects by factor(eff)
in the formula.
For your example, this would be:
reg3 <- plm(return ~ x factor(industry), model="within", effect = "twoways", index=c("id","t"), data = data)
summary(reg3)
# Twoways effects Within Model
#
# Call:
# plm(formula = return ~ x factor(industry), data = data, effect = "twoways",
# model = "within", index = c("id", "t"))
#
# Balanced Panel: n = 10, T = 10, N = 100
#
# Residuals:
# Min. 1st Qu. Median 3rd Qu. Max.
# -1.84660 -0.61135 0.06318 0.57474 2.06264
#
# Coefficients:
# Estimate Std. Error t-value Pr(>|t|)
# x 0.050906 0.112408 0.4529 0.6519
#
# Total Sum of Squares: 68.526
# Residual Sum of Squares: 68.35
# R-Squared: 0.0025571
# Adj. R-Squared: -0.23434
# F-statistic: 0.20509 on 1 and 80 DF, p-value: 0.65187
summary(reg3)$r.squared
# rsq adjrsq
# 0.002557064 -0.234335633
However, note that for your toy example data, the variable industry
is collinear after the fixed effects transformation and, thus, drops out of the estimation (see ?detect.lindep
for an explanation and another example). Check via, e.g.:
detect.lindep(reg3)
# [1] "Suspicious column number(s): 2"
# [1] "Suspicious column name(s): factor(industry)2"
Or via:
alias(reg3)
# Model :
# [1] "return ~ x factor(industry)"
#
# Complete :
# [,1]
# factor(industry)2 0