How to get within R squared from plm FE regression?-CodePudding

I regress monthly stocks returns on a set of firm characteristics using the plm package.

library(plm)
set.seed(1)
id=rep(1:10,each=10); t=rep(1:10,10); industry=rep(1:2,each=50); return=rnorm(100); x=rnorm(100)
data=data.frame(id,t,industry,return,x)

In a first step, I want to include time fixed effects. The following two formulas give the same coefficients for x but different R-squares. The first model estimates the overall R-squared, while the second model gives the within R-squared.

reg1=plm(return~x factor(t),model="pooling",index=c("id","t"),data=data)
summary(reg1)$r.squared

reg2=plm(return~x,model="within",index=c("id","t"),data=data,effect="time")
summary(reg2)$r.squared

In a second step, I now want to include both time and industry fixed effects. I obtain coefficients by this formula:

reg3=plm(return~x factor(t) factor(industry),model="pooling",index=c("id","t"), data=data)

Unfortunately, I cannot use the "within" model as in reg2 because industry is not one of my index variables. Is there another way to calculate the within R-squared for reg3?

CodePudding user response：

This is not a direct answer to your question, because I am not sure plm can do this. (It might, but I can’t figure it out.)

However, if you are mainly estimating fixed effects models, then I can warmly recommend the fixest package, which is super fast and offers a convenient formula syntax to specify fixed effects and interactions. Here’s a simple example:

library(fixest)
library(modelsummary)

dat = read.csv("https://vincentarelbundock.github.io/Rdatasets/csv/plm/EmplUK.csv")

models = list(
    feols(wage ~ emp | year, data = dat),
    feols(wage ~ emp | firm, data = dat),
    feols(wage ~ emp | firm   year, data = dat)
)

modelsummary(models)

	Model 1	Model 2	Model 3
emp	-0.039	-0.120	-0.047
	(0.003)	(0.064)	(0.042)
Num.Obs.	1031	1031	1031
R2	0.039	0.868	0.896
R2 Adj.	0.030	0.847	0.879
R2 Within	0.012	0.016	0.003
R2 Pseudo
AIC	6474.0	4687.7	4455.6
BIC	6523.4	5384.0	5191.4
Log.Lik.	-3226.988	-2202.833	-2078.818
Std.Errors	by: year	by: firm	by: firm
FE: year	X		X
FE: firm		X	X

CodePudding user response：

To include time and industry effects (next to individual effects), just use a two-ways within model and include any further fixed effects by factor(eff) in the formula.

For your example, this would be:

reg3 <- plm(return ~ x   factor(industry), model="within", effect = "twoways", index=c("id","t"), data = data)
summary(reg3)
# Twoways effects Within Model
# 
# Call:
# plm(formula = return ~ x   factor(industry), data = data, effect = "twoways", 
#     model = "within", index = c("id", "t"))
# 
# Balanced Panel: n = 10, T = 10, N = 100
# 
# Residuals:
#     Min.  1st Qu.   Median  3rd Qu.     Max. 
# -1.84660 -0.61135  0.06318  0.57474  2.06264 
# 
# Coefficients:
#   Estimate Std. Error t-value Pr(>|t|)
# x 0.050906   0.112408  0.4529   0.6519
# 
# Total Sum of Squares:    68.526
# Residual Sum of Squares: 68.35
# R-Squared:      0.0025571
# Adj. R-Squared: -0.23434
# F-statistic: 0.20509 on 1 and 80 DF, p-value: 0.65187

summary(reg3)$r.squared
#         rsq       adjrsq 
# 0.002557064 -0.234335633

However, note that for your toy example data, the variable industry is collinear after the fixed effects transformation and, thus, drops out of the estimation (see ?detect.lindep for an explanation and another example). Check via, e.g.:

detect.lindep(reg3)
# [1] "Suspicious column number(s): 2"
# [1] "Suspicious column name(s):   factor(industry)2"

Or via:

alias(reg3)
# Model :
# [1] "return ~ x   factor(industry)"
# 
# Complete :
#                   [,1]
# factor(industry)2 0