Subset with list in linear model

Time:06-13

I am having trouble using the subset argument of lm when the rows to exclude come from a list. One of two things happens: when I use lapply I get object 'x' not found, and when I use mapply I get object 'y' not found. For some reason neither variable is visible inside subset.

Here are my two approaches:

library(faraway)
library(magrittr)  # provides %>% used below
data(savings)

form <- "sr ~ pop75 + dpi + ddpi + pop15"
#approach one
lapply(label_res, function(x){
     lm(form, savings, subset = (!row.names(savings) %in% x))
})
#output is identical for every linear model

#approach two
form_list<-rep(form, 4) %>% as.list()
mapply(function(x, y)
    {lm(x, savings, subset=(!row.names(savings) %in% y))}, form_list, label_res)
>Error in row.names(savings) %in% y : object 'y' not found

However this does work:

library(faraway)
data(savings)

form <- "sr ~ pop75 + dpi + ddpi + pop15"
#subsetting the data before calling lm
lapply(label_res, function(x){subset(savings, !row.names(savings) %in% x) %>%
    lm(form, .)
})

Here's label_res:

list(pop75 = c("Ireland", "Japan"), dpi = c("Ireland", "Sweden", 
"United States"), ddpi = "Libya", pop15 = c("Japan", "Libya"))

CodePudding user response:

This is due to the way that lm internally calls model.frame. The subset expression is not evaluated inside your anonymous function. Because form is a character string rather than a formula created in that function, the lookup for x (or y) ends up in the global environment, where no such object exists. There are various ways round this. The easiest is to subset the data frame you pass to lm in the first place, which actually requires less typing than using the subset argument anyway:

lapply(label_res, function(x) {
  lm(form, savings[!row.names(savings) %in% x,])
})
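
As a quick sanity check that each fit really drops the intended rows, the sketch below counts the observations used by every model. It assumes the built-in LifeCycleSavings data frame (same variables, with countries as row names) as a stand-in for faraway::savings so that it runs without extra packages; fits is a name introduced only for this illustration.

```r
# Stand-in for faraway::savings (assumption: same variables and row names)
savings <- datasets::LifeCycleSavings

form <- "sr ~ pop75 + dpi + ddpi + pop15"
label_res <- list(pop75 = c("Ireland", "Japan"),
                  dpi   = c("Ireland", "Sweden", "United States"),
                  ddpi  = "Libya",
                  pop15 = c("Japan", "Libya"))

# Fit each model on a data frame with the listed countries already removed
fits <- lapply(label_res, function(x) {
  lm(form, savings[!row.names(savings) %in% x, ])
})

# Number of rows each model was fitted on; each count should equal
# nrow(savings) minus the number of countries excluded for that model
sapply(fits, nobs)
```

If every count came back equal to nrow(savings), the exclusion would not be taking effect.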

Another, less satisfactory way, if you really want to use the subset argument, is to write x into the global environment with <<- on each iteration:

lapply(label_res, function(x) {
  x <<- x
  lm(form, savings, subset = !row.names(savings) %in% x)
})

I prefer the first option since it doesn't have any side effects in the calling frame, but they both produce the same output:

#> $pop75
#> 
#> Call:
#> lm(formula = form, data = savings[!row.names(savings) %in% x, 
#>     ])
#> 
#> Coefficients:
#> (Intercept)        pop75          dpi         ddpi        pop15  
#>  26.0381140   -1.4257571   -0.0002792    0.3547081   -0.4078521  
#> 
#> 
#> $dpi
#> 
#> Call:
#> lm(formula = form, data = savings[!row.names(savings) %in% x, 
#>     ])
#> 
#> Coefficients:
#> (Intercept)        pop75          dpi         ddpi        pop15  
#>   29.009000    -2.216356     0.000589     0.443409    -0.469873  
#> 
#> 
#> $ddpi
#> 
#> Call:
#> lm(formula = form, data = savings[!row.names(savings) %in% x, 
#>     ])
#> 
#> Coefficients:
#> (Intercept)        pop75          dpi         ddpi        pop15  
#>  24.5240460   -1.2808669   -0.0003189    0.6102790   -0.3914401  
#> 
#> 
#> $pop15
#> 
#> Call:
#> lm(formula = form, data = savings[!row.names(savings) %in% x, 
#>     ])
#> 
#> Coefficients:
#> (Intercept)        pop75          dpi         ddpi        pop15  
#>   20.580204    -0.644662    -0.000448     0.516628    -0.310588

Created on 2022-06-12 by the reprex package (v2.0.1)
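
If you would rather keep the subset argument without writing to the global environment, one further option is to turn the string into a formula inside the function, so that the formula's environment is the function's own frame and x can be found there. This is only a sketch and not part of the answer above: it again assumes the built-in LifeCycleSavings data as a stand-in for faraway::savings, and fits_subset and fits_presub are illustrative names.

```r
# Stand-in for faraway::savings (assumption: same variables and row names)
savings <- datasets::LifeCycleSavings

form <- "sr ~ pop75 + dpi + ddpi + pop15"
label_res <- list(pop75 = c("Ireland", "Japan"),
                  dpi   = c("Ireland", "Sweden", "United States"),
                  ddpi  = "Libya",
                  pop15 = c("Japan", "Libya"))

fits_subset <- lapply(label_res, function(x) {
  # as.formula() is called from this function, so the resulting formula
  # carries this function's frame as its environment; the subset
  # expression is then evaluated there and x is visible
  lm(as.formula(form), savings, subset = !row.names(savings) %in% x)
})

# Reference: the pre-subsetting approach
fits_presub <- lapply(label_res, function(x) {
  lm(form, savings[!row.names(savings) %in% x, ])
})

# Compare the coefficients of the two approaches, model by model
all.equal(lapply(fits_subset, coef), lapply(fits_presub, coef))
```

The only change from the failing attempt is where the string becomes a formula, so there are no side effects outside the function.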
