Home > Software design >  An R function cannot work in local environment of other functions
An R function cannot work in local environment of other functions

Time:10-30

I use Matchit package for propensity score matching. It can generate a matched data after matching using get_matches() function. However, if I do not run the get_matches() function in the global environment but include it in any other function, the matched data cannot be found in the local environment. (These prove to be misleading information. There is nothing wrong with MatchIt's output. Answer by Noah explains my question better.)

For producing my data

dataGen <- function(b0,b1,n = 2000,cor = 0){
  # covariate
  sigma <- matrix(rep(cor,9),3,3)
  diag(sigma) <- rep(1,3)
  cov <- MASS::mvrnorm(n, rep(0,3), sigma)
  # error
  error <- rnorm(n,0,sqrt(18))
  # treatment variable
  logit <- b0 b1*cov[,1] 0.3*cov[,2] cov[,3]
  p <- 1/(1 exp(-logit))
  treat <- rbinom(n,1,p)
  # outcome variable
  y <- error treat cov[,1] cov[,2]
  data <- as.data.frame(cbind(cov,treat,y))
  return(data)
}
set.seed(1)
data <- dataGen(b0=-0.92, b1=0.8, 900) 

It is like the following works. The est.m.WLS() can use the m.data.

fm1 <- treat ~ V1 V2 V3

m.out <- MatchIt::matchit(data = data, formula = fm1, link = "logit", m.order = "random", caliper = 0.2)

m.data  <-  MatchIt::get_matches(m.out,data=data)

est.m.WLS <- function(m.data, fm2){
  model.1 <- lm(fm2, data = m.data, weights=(weights))
  est <- model.1$coefficients["treat"]
  ## regular robust standard error ignoring pair membership
  model.1.2 <- lmtest::coeftest(model.1,vcov. = sandwich::vcovHC)
  CI.r <- confint(model.1.2,"treat",level=0.95)
  ## cluster robust standard error accounting for pair membership
  model.2.2 <- lmtest::coeftest(model.1, vcov. = sandwich::vcovCL, cluster = ~subclass)
  CI.cr <- confint(model.2.2,"treat",level=0.95)
  return(c(est=est,CI.r,CI.cr))
}
fm2 <- y ~ treat V1 V2 V3
est.m.WLS(m.data,fm2)

But the next syntax does not work. It will report

"object 'm.data' not found"
rm(m.data)

m.out <- MatchIt::matchit(data = data, formula = fm1, link = "logit", m.order = "random", caliper = 0.2)

est.m.WLS <- function(m.out, fm2){
  m.data  <-  MatchIt::get_matches(m.out,data=data)
  model.1 <- lm(fm2, data = m.data, weights=(weights))
  est <- model.1$coefficients["treat"]
  ## regular robust standard error ignoring pair membership
  model.1.2 <- lmtest::coeftest(model.1,vcov. = sandwich::vcovHC)
  CI.r <- confint(model.1.2,"treat",level=0.95)
  ## cluster robust standard error accounting for pair membership
  model.2.2 <- lmtest::coeftest(model.1, vcov. = sandwich::vcovCL, cluster = ~subclass)
  CI.cr <- confint(model.2.2,"treat",level=0.95)
  return(c(est=est,CI.r,CI.cr))
}
est.m.WLS(m.out,fm2)

Since I want to run parallel loops using the groundhog library for simulation purpose, the get_matches function also cannot work in foreach()%dopar%{...} environment.

res=foreach(s = 1:7,.combine="rbind")%dopar%{

m.out <- MatchIt::matchit(data = data, formula = fm.p, distance = data$logit, m.order = "random", caliper = 0.2)

m.data  <-  MatchIt::get_matches(m.out,data=data)

...

}

How should I fix the problem?

Any help would be appreciated. Thank you!

Using for() loop directly will not run into any problem since it just works in the global environment, but it is too slow... I really hope to do the thousand time simulations at once. Help!

CodePudding user response:

This has nothing to do with MatchIt or get_matches(). Run debugonce(est.m.WLS) with your second implementation of est.m.WLS(). You will see that get_matches() works perfectly fine and returns m.data. The problem is when lmtest() runs with a formula argument for cluster.

This is due to a bug in R, outside any package, that I have already requested to be fixed. The problem is that expand.model.matrix(), a function that searches for the dataset that the variables supplied to cluster could be in, only searches the global environment for data, but m.data does not exist in the global environment. To get around this issue, don't supply a formula to cluster; use cluster = m.data["subclass"]. This should hopefully be resolved in an upcoming R release.

  • Related