In the mixed model (or REWB) framework it is common to model within changes by subtracting the cluster mean (demeaning) from a time varying x-variable, see eg. (Bell, Fairbrother & Jones, 2018). This estimator is basically the same as a fixed effects (FE) estimator (shown below using the sleepstudy data).
The issue arises when trying to model polynomials using the same principle. The equality between the estimators break when we enter our demeaned variable as a polynomial. We can restore this equality by first squaring the variable and then demeaning (see. re_poly_fixed).
dt <- lme4::sleepstudy
dt$days_squared <- dt$Days * dt$Days
dt <- cbind(dt, datawizard::demean(dt, select = c("Days", "days_squared"), group = "Subject"))
re <- lme4::lmer(Reaction ~ Days_within (1 | Subject), data = dt, REML = FALSE)
fe <- fixest::feols(Reaction ~ Days | Subject, data = dt)
re_poly <- lme4::lmer(Reaction ~ poly(Days_within, 2, raw = TRUE) (1 | Subject),
data = dt, REML = FALSE)
fe_poly <- fixest::feols(Reaction ~ poly(Days, 2, raw = TRUE) | Subject, data = dt)
re_poly_fixed <- lme4::lmer(Reaction ~ Days_within days_squared_within (1 | Subject),
data = dt, REML = FALSE)
models <-
list("re" = re, "fe" = fe, "re_poly" = re_poly, "fe_poly" = fe_poly, "re_poly_fixed" = re_poly_fixed)
modelsummary::modelsummary(models)
The main issue with this strategy is that for postestimation, especially packages that calculate marginal effects (e.g. marginaleffects in R or margins in STATA) the variable needs to be entered as a polynomial term for the calculations to consider both x and x^2. That is using poly() or I() in R or factor notation c.x##c.x in STATA). The difference can be seen in the two calls below, where the FE-call returns one effect for "Days" and the manual call returns two separate terms.
(me_fe <- summary(marginaleffects::marginaleffects(fe_poly)))
(me_re <- summary(marginaleffects::marginaleffects(re_poly_fixed)))
I may be missing something obvious here, but is it possible to retain the equality between the estimators in FE and the Mixed model setups with polynomials, while still being able to use common packages for marginal effects?
CodePudding user response:
The problem is that when a transformed variable is hardcoded, the marginaleffects
package does not know that it should manipulate both the transformed and the original at the same time to compute the slope. One solution is to de-mean inside the formula with I()
. You should be aware that this may make the model fitting less efficient.
Here’s an example where I pre-compute the within-group means using data.table
, but you could achieve the same result with dplyr::group_by()
:
library(lme4)
library(data.table)
library(modelsummary)
library(marginaleffects)
dt <- data.table(lme4::sleepstudy)
dt[, `:=`(Days_mean = mean(Days),
Days_within = Days - mean(Days)),
by = "Subject"]
re_poly <- lmer(
Reaction ~ poly(Days_within, 2, raw = TRUE) (1 | Subject),
data = dt, REML = FALSE)
re_poly_2 <- lmer(
Reaction ~ poly(I(Days - Days_mean), 2, raw = TRUE) (1 | Subject),
data = dt, REML = FALSE)
models <- list(re_poly, re_poly_2)
modelsummary(models, output = "markdown")
Model 1 | Model 2 | |
---|---|---|
(Intercept) | 295.727 | 295.727 |
(9.173) | (9.173) | |
poly(Days_within, 2, raw = TRUE)1 | 10.467 | |
(0.799) | ||
poly(Days_within, 2, raw = TRUE)2 | 0.337 | |
(0.316) | ||
poly(I(Days - Days_mean), 2, raw = TRUE)1 | 10.467 | |
(0.799) | ||
poly(I(Days - Days_mean), 2, raw = TRUE)2 | 0.337 | |
(0.316) | ||
SD (Intercept Subject) | 36.021 | 36.021 |
SD (Observations) | 30.787 | 30.787 |
Num.Obs. | 180 | 180 |
R2 Marg. | 0.290 | 0.290 |
R2 Cond. | 0.700 | 0.700 |
AIC | 1795.8 | 1795.8 |
BIC | 1811.8 | 1811.8 |
ICC | 0.6 | 0.6 |
RMSE | 29.32 | 29.32 |
The estimated average marginal effects are – as expected – different:
marginaleffects(re_poly) |> summary()
#> Term Effect Std. Error z value Pr(>|z|) 2.5 % 97.5 %
#> 1 Days_within 10.47 0.7989 13.1 < 2.22e-16 8.902 12.03
#>
#> Model type: lmerMod
#> Prediction type: response
marginaleffects(re_poly_2) |> summary()
#> Term Effect Std. Error z value Pr(>|z|) 2.5 % 97.5 %
#> 1 Days 10.47 0.7989 13.1 < 2.22e-16 8.902 12.03
#>
#> Model type: lmerMod
#> Prediction type: response