I've noticed something very strange while doing some regression analysis. When I estimate a regression on its own, and then estimate the same regression inside a `purrr::map` call and extract that element, the two objects are not identical. My question is why this is the case, or whether it should be the case at all.
The main reason I ask is that some packages have trouble pulling information from models extracted from a `purrr::map` result, but not from models I estimate individually. Here is a small example with some nonsensical regressions:
```r
library(fixest)
library(tidyverse)

## creating a formula for a regression example: mpg ~ cyl | gear + carb
formula <- as.formula(paste0(
  "mpg", "~",
  paste("cyl", collapse = " "),
  "|", paste(c("gear", "carb"), collapse = " + ")))

## estimating the regression directly
mtcars_formula <- feols(formula, cluster = "gear", data = mtcars)

## estimating the same regression twice, but using map
mtcars_list_map <- map(list("gear", "gear"), ~ feols(formula, cluster = ., data = mtcars))

## extracting the first element of the list
is_identical_1 <- mtcars_list_map %>%
  pluck(1)

## THESE ARE NOT IDENTICAL
identical(mtcars_formula, is_identical_1)
```
I am tagging this with the `fixest` package as well, only because this may be package-specific...
Answer:
The differences largely come down to differing environments. For example, the third element of these lists (i.e. of `mtcars_formula` and `is_identical_1`) is the formula `mpg ~ cyl`, and in fact `mtcars_formula[[3]] == is_identical_1[[3]]` returns `TRUE`. However, the two formulas are attached to different environments:
```
> mtcars_formula[[3]] == is_identical_1[[3]]
[1] TRUE
> environment(mtcars_formula[[3]])
<environment: 0x560a2490ef40>
> environment(is_identical_1[[3]])
<environment: 0x560a2554d810>
```
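A minimal, fixest-free sketch of the same effect, assuming only base R: two formulas with identical text, each created inside a separate call to a hypothetical helper `make_formula()`, compare equal with `==` (which deparses language objects before comparing) but fail `identical()`, because each carries its own `.Environment` attribute. Pointing both at the same environment makes them identical.

```r
# make_formula() is a hypothetical helper; each call evaluates `~` in a fresh
# function frame, so each returned formula gets a different environment
make_formula <- function() mpg ~ cyl

f1 <- make_formula()
f2 <- make_formula()

same_text <- f1 == f2               # TRUE: `==` compares the deparsed text
same_object <- identical(f1, f2)    # FALSE: the environments differ

# give f2 the same environment as f1; the two objects now match exactly
environment(f2) <- environment(f1)
same_after_fix <- identical(f1, f2) # TRUE

c(same_text, same_object, same_after_fix)
```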
Whether you consider these differences "trivial" depends on your use case, but you can find the elements that differ like this:
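```r
## collect every top-level element that differs between the two model objects
differences <- list()
for (i in seq_along(mtcars_formula)) {
  if (!identical(mtcars_formula[[i]], is_identical_1[[i]])) {
    differences[[names(mtcars_formula)[i]]] <- list(mtcars_formula[[i]], is_identical_1[[i]])
  }
}
names(differences)
```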
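The loop above can also be packaged as a small reusable function. This is only a sketch: `diff_elements()` is a hypothetical helper, and the toy lists `a` and `b` are stand-ins for two model objects, so no fixest is needed to run it. It returns the names of the top-level elements that are present in both lists but not `identical()`.

```r
# diff_elements() is a hypothetical helper: names of top-level elements of two
# named lists that are present in both but not identical
diff_elements <- function(a, b) {
  common <- intersect(names(a), names(b))
  differs <- vapply(common, function(nm) !identical(a[[nm]], b[[nm]]), logical(1))
  common[differs]
}

make_formula <- function() mpg ~ cyl

# toy stand-ins for two fitted models: same coefficient and formula text,
# but different formula environments and different recorded calls
a <- list(coef = 1.5, fml = make_formula(), call = quote(feols(fml, cluster = "gear")))
b <- list(coef = 1.5, fml = make_formula(), call = quote(feols(fml, cluster = .)))

diff_elements(a, b)  # returns c("fml", "call")
```

Applied to the question's objects this would be `diff_elements(mtcars_formula, is_identical_1)`, since the loop above already relies on those objects having names.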
One element that really is different in content, not just in its environment, is the recorded call (the 4th element):
```
> mtcars_formula[[4]] == is_identical_1[[4]]
[1] FALSE
> c(mtcars_formula[[4]], is_identical_1[[4]])
[[1]]
feols(fml = formula, data = mtcars, cluster = "gear")

[[2]]
feols(fml = formula, data = mtcars, cluster = .)
```
This may have something to do with the error you report in the comments above, associated with `fwildclusterboot::boottest()`. Note that the call stored in the object created with `map()` records `cluster = .` instead of `cluster = "gear"`.
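To see where `cluster = .` comes from, here is a sketch with a hypothetical stand-in `fit()` that simply returns `match.call()`. Assuming `feols()` records its call in a similar way (the printed calls above are consistent with that), the recorded argument is the unevaluated expression the caller wrote, not its value. Inside a purrr `~` lambda that expression is the placeholder `.`, which only means something while the lambda is running.

```r
# fit() is a hypothetical stand-in for a model function; it only returns the
# call it was invoked with, with arguments matched to their names
fit <- function(fml, cluster, data) {
  match.call()
}

# called directly: the literal string is recorded
direct <- fit(mpg ~ cyl, cluster = "gear", data = mtcars)

# called from an anonymous function whose argument is named `.`, roughly
# mimicking purrr's `~` shorthand: the symbol `.` is recorded instead
via_lambda <- lapply(list("gear"), function(.) fit(mpg ~ cyl, cluster = ., data = mtcars))[[1]]

direct$cluster      # the string "gear"
via_lambda$cluster  # the symbol `.`
```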
One way to get around this would be to do something like this:
```r
mtcars_list_map <- map(list("gear", "gear"), function(x) {
  # create the model
  model <- feols(formula, cluster = x, data = mtcars)
  # overwrite the recorded cluster argument with its actual value
  model$call$cluster <- x
  # return the model
  model
})
```
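An alternative to patching `model$call` afterwards (a different technique, only sketched here) is to splice the value into the call before it is evaluated, using base R's `bquote()`: `.(x)` is replaced by the current value of `x`, so the function sees, and records, the literal `"gear"`. The sketch again uses the hypothetical `fit()` stand-in rather than `feols()`.

```r
# fit() is a hypothetical stand-in that returns its own matched call
fit <- function(fml, cluster, data) {
  match.call()
}

calls <- lapply(c("gear", "gear"), function(x) {
  # build fit(mpg ~ cyl, cluster = "gear", data = mtcars) with the value of x
  # inlined, then evaluate it in this function's frame
  eval(bquote(fit(mpg ~ cyl, cluster = .(x), data = mtcars)))
})

calls[[1]]$cluster  # the string "gear", not the symbol `x`
```

With fixest the analogous (untested) line inside `map()` would be `eval(bquote(feols(formula, cluster = .(x), data = mtcars)))`.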