Logit model with fixed effects and robust standard error-CodePudding

I'm hoping to know the R code for a logit model with fixed effects (year and industry) and a code to run robust standard error clustered by firm.

What I've done is ran the code: glm(formula = y ~ x control factor(year) factor(ind), family = binomial, data)

Could someone tell me:

Whether the code is correct?
How I should run robust standard error clustered by firm?

The regression model that I am using doesn’t have the variable "id" (id for the firm), so when I run summary(mod1, type="clustered", cluster=~i), it shows the error that object "id" doesn’t exist.

My dataset consists of variables: a dependent variable (a binary value that indicates whether a firm revised an earning forecast), an independent variable(a binary value that indicates whether the leader firm beat the earning forecast), and a set of controls. Should I also include the id in the regression model? If so, should it also be fixed effects?

CodePudding user response：

We cannot do LSDV regression for non-linear models (e.g. logit), since fixed effects assumptions break down¹, i.e. don't use glm.

We can use aplaca::feglm. The formula specification is similar to lfe::felm: y ~ x1 x2 | f1 f2, where f refers to the fixed effects factors.

Accordingly you may specify your model like so:

mod1 <- feglm(y ~ x1   x2   x3 | i   t, data)

Clustered standard errors can be specified in the summary with a formula ~i t or ~i.

summary(mod1, type="clustered", cluster=~i)
# binomial - logit link
# 
# y ~ x1   x2   x3 | i   t
# 
# Estimates:
#    Estimate Std. error z value Pr(> |z|)    
# x1  1.09088    0.02482   43.95    <2e-16 ***
# x2 -1.10648    0.02444  -45.27    <2e-16 ***
# x3  1.12316    0.02620   42.87    <2e-16 ***
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# residual deviance= 15970.75,
# null deviance= 27217.10,
# n= 19940, l= [997, 20]
# 
# ( 60 observation(s) deleted due to perfect classification )
# 
# Number of Fisher Scoring Iterations: 6

Where i refers to your inds and t to your years.

There's also bife::bife, but we currently can't get sandwich standard errors from it.

Data:

set.seed(42)
data <- alpaca::simGLM(1000L, 20L, 1805L, model = "logit")