Error in eval(substitute(subset), data, env)

Time:06-13

I am building a linear model that should be refit on a subset of the data selected by Cook's distance (dropping the observation with the largest value). However, it keeps producing the following error:

Error in eval(substitute(subset), data, env)

Unfortunately, I cannot build a reproducible example: whenever I copy the chunks that produce the error into a separate function, it works properly. I have tried these steps:

  1. Returned all arguments to see what is different; they all return the same output.
  2. Checked the order of the terms in the formula (I thought this might somehow cause the issue; it didn't).

From what I observe, the values stored in the variables of the two functions are EXACTLY the same, but I get an error from the first one.

perModel is a function I wrote. It is fairly long; it just builds linear models from combinations of predictors and returns them as a list. The function is here: [perModel function]

I am having no issues with the output of that function itself, but the two approaches below do not give the same result.
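Since perModel itself isn't shown, here is a hypothetical stand-in (`per_model_sketch`, not the author's code) that mimics the described behaviour under stated assumptions: it fits one lm() per non-empty combination of predictors and returns the fits as a list. It uses base R's `LifeCycleSavings`, which has the same columns (sr, pop15, pop75, dpi, ddpi) as faraway's `savings`. The detail worth noticing is that every formula is created inside the helper, so it carries the helper's environment rather than the caller's.

```r
# Hypothetical stand-in for perModel(): NOT the author's implementation.
# Fits one lm() per non-empty combination of predictors and returns a list.
per_model_sketch <- function(data, response) {
  predictors <- setdiff(names(data), response)
  # list of character vectors: every combination of 1..p predictors
  combos <- unlist(
    lapply(seq_along(predictors),
           function(k) combn(predictors, k, simplify = FALSE)),
    recursive = FALSE
  )
  lapply(combos, function(vars) {
    # reformulate() attaches the frame it is called from (this anonymous
    # function's environment), not the frame of whoever calls the helper
    form <- reformulate(vars, response = response)
    lm(form, data = data)
  })
}

models <- per_model_sketch(LifeCycleSavings, "sr")
full <- models[[length(models)]]  # the model with all four predictors
formula(full)
```

With four predictors this returns 15 fits; the last one uses all of them, playing the role of `last()` in the question.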

Here are the two functions I have tried:

library(faraway)  # provides the savings data set
library(dplyr)    # provides %>% and last()
test_func <- function(data, response){
  models <- perModel(data, response=response, predictors='all') %>% last()
  cook <- cooks.distance(models)
  form <- models %>% formula()
  lmodi <- lm(form, data, subset = (cook < max(cook)))
  return(lmodi)
}
test_func(savings, response='sr')

#form output
sr ~ pop75 + dpi + ddpi + pop15
<environment: 0x7fbfbe5457b0>

and

test_res <- function(data, response, predictors){
  model <- reformulate(predictors, response=response)
  lmod <- lm(model, data)
  cook <- cooks.distance(lmod)
  form <- lmod %>% formula()
  lmodi <- lm(form, data, subset = (cook < max(cook)))
  return(lmodi)
}
test_res(savings, response='sr', predictors=c("pop15", "pop75", 'dpi', 'ddpi'))

#form output
sr ~ pop15 + pop75 + dpi + ddpi
<environment: 0x7fbf8f508680>

The first one produces the error and the second one works.
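The `<environment: ...>` line printed under each formula is the environment attached to that formula object. A small sketch (the helper `make_formula` is hypothetical) showing that a formula built with reformulate() carries the frame of the function that built it:

```r
# Hypothetical helper: builds a formula and checks which environment it carries.
make_formula <- function() {
  f <- reformulate(c("pop15", "pop75"), response = "sr")
  # reformulate() defaults to env = parent.frame(), i.e. this function's frame
  identical(environment(f), environment())
}

make_formula()  # TRUE
```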

The cooks.distance() output is EXACTLY the same for both functions:

test_func(savings, response='sr') %>% data.frame(test_func =., test_res = test_res(savings, response='sr', predictors=c("pop75", "dpi", 'ddpi', 'pop15')))
                  test_func     test_res
Australia      8.035888e-04 8.035888e-04
Austria        8.175997e-04 8.175997e-04
Belgium        7.154674e-03 7.154674e-03
Bolivia        7.278744e-04 7.278744e-04
Brazil         1.402735e-02 1.402735e-02
Canada         3.106199e-04 3.106199e-04
Chile          3.781324e-02 3.781324e-02
China          8.156984e-03 8.156984e-03
Colombia       1.879460e-03 1.879460e-03
Costa Rica     3.207537e-02 3.207537e-02
Denmark        2.879580e-02 2.879580e-02
Ecuador        5.818699e-03 5.818699e-03
Finland        4.364051e-03 4.364051e-03
France         1.547176e-02 1.547176e-02
Germany        4.736572e-05 4.736572e-05
Greece         1.590102e-02 1.590102e-02
Guatamala      1.067111e-02 1.067111e-02
Honduras       4.741920e-04 4.741920e-04
Iceland        4.352902e-02 4.352902e-02
India          2.965778e-04 2.965778e-04
Ireland        5.439637e-02 5.439637e-02
Italy          3.919100e-03 3.919100e-03
Japan          1.428162e-01 1.428162e-01
Korea          3.555386e-02 3.555386e-02
Luxembourg     3.993882e-03 3.993882e-03
Malta          1.146827e-02 1.146827e-02
Norway         5.558570e-04 5.558570e-04
Netherlands    2.744377e-04 2.744377e-04
New Zealand    4.379219e-03 4.379219e-03
Nicaragua      3.226479e-04 3.226479e-04
Panama         6.333674e-03 6.333674e-03
Paraguay       4.157229e-02 4.157229e-02
Peru           4.401457e-02 4.401457e-02
Philippines    4.522120e-02 4.522120e-02
Portugal       9.733900e-04 9.733900e-04
South Africa   2.405063e-04 2.405063e-04
South Rhodesia 5.267290e-03 5.267290e-03
Spain          5.659085e-04 5.659085e-04
Sweden         4.055963e-02 4.055963e-02
Switzerland    7.334746e-03 7.334746e-03
Turkey         4.224370e-03 4.224370e-03
Tunisia        9.562447e-03 9.562447e-03
United Kingdom 1.496628e-02 1.496628e-02
United States  1.284481e-02 1.284481e-02
Venezuela      1.886141e-02 1.886141e-02
Zambia         9.663275e-02 9.663275e-02
Jamaica        2.402677e-02 2.402677e-02
Uruguay        8.532329e-03 8.532329e-03
Libya          2.680704e-01 2.680704e-01
Malaysia       9.113404e-03 9.113404e-03

Traceback:

8: eval(substitute(subset), data, env)
7: eval(substitute(subset), data, env)
6: model.frame.default(formula = form, data = data, subset = (cook < 
       max(cook)), drop.unused.levels = TRUE)
5: stats::model.frame(formula = form, data = data, subset = (cook < 
       max(cook)), drop.unused.levels = TRUE)
4: eval(mf, parent.frame())
3: eval(mf, parent.frame())
2: lm(form, data, subset = (cook < max(cook))) at #5
1: test_func(savings, response = "sr")
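One explanation consistent with this traceback (a sketch, not a confirmed diagnosis of perModel): model.frame.default() evaluates the subset expression with `eval(substitute(subset), data, env)`, where `env` is the formula's environment. `cook` is therefore looked up in the data first and then in the environment where the formula was created, not in the function that calls lm(). The hypothetical helpers below reproduce the error without perModel, using base R's `LifeCycleSavings` in place of `savings`:

```r
# Hypothetical helper: the formula is created here, so its environment
# is this function's frame, which has no `cook`.
fit_elsewhere <- function(data) {
  f <- sr ~ pop15 + pop75 + dpi + ddpi
  lm(f, data = data)
}

# Mirrors test_func(): `cook` exists only in this function's frame.
refit_without_max_cook <- function(fit, data) {
  cook <- cooks.distance(fit)
  form <- formula(fit)  # keeps fit_elsewhere()'s environment
  lm(form, data, subset = (cook < max(cook)))
}

msg <- tryCatch(
  refit_without_max_cook(fit_elsewhere(LifeCycleSavings), LifeCycleSavings),
  error = function(e) conditionMessage(e)
)
msg
```

In test_res(), by contrast, reformulate() is called inside the same function that defines `cook`, so the lookup succeeds.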

Answer:

Something is taking over a name (variable) you're using, so it refers to something else than you expect. There are a few strategies you can use here:

  • Use recover(): set options(error = recover) and rerun the code. After the error you'll be shown the call stack and asked which frame you want to enter, so you can check what the variable names refer to there.
  • Try library(conflicted). Then run the chunks and see if there are conflicts it asks you to resolve.
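Besides stepping through frames interactively, you can check from code whether a name is visible where model.frame() will look for it. A minimal sketch with hypothetical helpers (`visible_from_formula`, `build_formula`, `check`), assuming the lookup starts from the formula's environment:

```r
# Hypothetical diagnostic: can `name` be found starting from the formula's
# environment (the enclosure model.frame() uses for `subset`)?
visible_from_formula <- function(form, name) {
  exists(name, envir = environment(form), inherits = TRUE)
}

build_formula <- function() sr ~ pop15  # environment: build_formula()'s frame

check <- function() {
  cook <- 1
  here  <- sr ~ pop15       # environment: check()'s frame, which has `cook`
  there <- build_formula()  # environment: a frame that cannot see `cook`
  c(here = visible_from_formula(here, "cook"),
    there = visible_from_formula(there, "cook"))
}

check()
```

In a fresh session (no global `cook`), this should report TRUE for `here` and FALSE for `there`.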

Answer:

Whilst I still could not figure out what is causing this issue, I found a temporary solution by imitating test_res, since that one at least works. What puzzles me is that form and cook should be the same in both functions, and they are; apart from the data set, they are the only variables passed to lmodi.

Anyway, here's my temporary solution:

test_func <- function(data, response){
  models <-
    perModel(data, response = response, predictors = 'all')  %>% last()
  form <- models %>% formula()
  # term labels of the formula, e.g. "pop75" "dpi" "ddpi" "pop15"
  predictors <- attr(terms(form), "term.labels")
  re_form <- reformulate(predictors, response)
  new_lm <- lm(re_form, data)
  cook <- cooks.distance(new_lm)
  lmodi <- lm(re_form, data, subset = (cook < max(cook)))
  return(lmodi)
}

Call:
lm(formula = re_form, data = data, subset = (cook < max(cook)))

Coefficients:
(Intercept)        pop75          dpi         ddpi        pop15  
 24.5240460   -1.2808669   -0.0003189    0.6102790   -0.3914401  
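A shorter alternative to rebuilding the formula, sketched under the assumption that the formula's environment is the culprit (hypothetical helper names; base R's `LifeCycleSavings` stands in for `savings`): either attach the current frame to the formula before calling lm(), or subset the data up front so no `subset` expression has to be evaluated.

```r
# Hypothetical helper: formula created here, so it carries this frame.
fit_elsewhere <- function(data) {
  f <- sr ~ pop15 + pop75 + dpi + ddpi
  lm(f, data = data)
}

refit_env <- function(fit, data) {
  cook <- cooks.distance(fit)
  form <- formula(fit)
  environment(form) <- environment()  # make `cook` visible to model.frame()
  lm(form, data, subset = (cook < max(cook)))
}

refit_rows <- function(fit, data) {
  cook <- cooks.distance(fit)
  # drop rows before calling lm(); nothing is looked up via `subset`
  lm(formula(fit), data = data[cook < max(cook), ])
}

fit <- fit_elsewhere(LifeCycleSavings)
a <- refit_env(fit, LifeCycleSavings)
b <- refit_rows(fit, LifeCycleSavings)
all.equal(coef(a), coef(b))
```

Reassigning the environment also changes where any other non-data variables in the formula are found, so indexing the data directly is the less surprising of the two.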