I'm hoping to know the R code for a logit model with fixed effects (year and industry) and a code to run robust standard error clustered by firm.
What I've done is ran the code: glm(formula = y ~ x control factor(year) factor(ind), family = binomial, data)
Could someone tell me:
- Whether the code is correct?
- How I should run robust standard error clustered by firm?
The regression model that I am using doesn’t have the variable "id" (id for the firm), so when I run summary(mod1, type="clustered", cluster=~i)
, it shows the error that object "id" doesn’t exist.
My dataset consists of variables: a dependent variable (a binary value that indicates whether a firm revised an earning forecast), an independent variable(a binary value that indicates whether the leader firm beat the earning forecast), and a set of controls. Should I also include the id in the regression model? If so, should it also be fixed effects?
CodePudding user response:
We cannot do LSDV regression for non-linear models (e.g. logit), since fixed effects assumptions break down1, i.e. don't use glm
.
We can use aplaca::feglm
. The formula specification is similar to lfe::felm
: y ~ x1 x2 | f1 f2
, where f
refers to the fixed effects factors.
Accordingly you may specify your model like so:
mod1 <- feglm(y ~ x1 x2 x3 | i t, data)
Clustered standard errors can be specified in the summary
with a formula ~i t
or ~i
.
summary(mod1, type="clustered", cluster=~i)
# binomial - logit link
#
# y ~ x1 x2 x3 | i t
#
# Estimates:
# Estimate Std. error z value Pr(> |z|)
# x1 1.09088 0.02482 43.95 <2e-16 ***
# x2 -1.10648 0.02444 -45.27 <2e-16 ***
# x3 1.12316 0.02620 42.87 <2e-16 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# residual deviance= 15970.75,
# null deviance= 27217.10,
# n= 19940, l= [997, 20]
#
# ( 60 observation(s) deleted due to perfect classification )
#
# Number of Fisher Scoring Iterations: 6
Where i
refers to your ind
s and t
to your year
s.
There's also bife::bife
, but we currently can't get sandwich standard errors from it.
Data:
set.seed(42)
data <- alpaca::simGLM(1000L, 20L, 1805L, model = "logit")