I am attempting to generate prediction intervals using the function predict() for a new set of data, but across more than one model that I have generated for a dataset. I am relatively inexperienced at using lapply, but figure it should be helpful in this process:
#Calling in my libraries:
library(dplyr)
#Creating dataset:
DNase <- DNase
#Generating models, one for each "Run" in DNAse:
model_dna <- DNase %>%
group_by(Run) %>%
do(model_dna_group = lm(log(density) ~ log(conc), data = .)) %>% ungroup()
#Creating a new data set to be used to generate predictions:
new_dna <- as.data.frame(DNase$conc) %>%
mutate(conc = DNase$conc * 2) %>% select(conc)
#Attempting to apply predict to these models for a new data frame:
new_dna_w_predictions <- lapply(
X = model_dna,
FUN = predict,
newdata = new_dna,
interval = "prediction",
level = 0.9
)
However, this draws the following error:
Error in get(as.character(FUN), mode = "function", envir = envir) : object 'model_dna' of mode 'function' was not found
I am not sure how best to structure this lapply function, especially when being used across more than one model. Is there a generally cleaner way to approach this?
CodePudding user response:
Your object model_dna
is a data.frame (more precisely: a tibble) which contains the lm
-objects in its second column "model_dna_group".
In the lapply
-call you should refer to that column, not the whole data frame. Lapply tries to use predict
on the columns of the data frame instead of the lm
-objects within the second column.
So edit your call as follows and it works:
new_dna_w_predictions <- lapply(
X = model_dna$model_dna_group,
FUN = predict,
newdata = new_dna,
interval = "prediction",
level = 0.9
)
CodePudding user response:
Here you have a full tidyverse
solution:
# Calling in my libraries:
library(dplyr)
library(purrr)
# Creating dataset:
DNase <- DNase
# Creating a new data set to be used to generate predictions:
new_dna <- DNase %>% transmute(conc = conc * 2) # simplified
# Generating models, one for each "Run" in DNAse:
model_dna <- DNase %>%
group_by(Run) %>%
summarise(model_dna_group = list(lm(log(density) ~ log(conc))))
model_dna
#> # A tibble: 11 x 2
#> Run model_dna_group
#> <ord> <list>
#> 1 10 <lm>
#> 2 11 <lm>
#> 3 9 <lm>
#> 4 1 <lm>
#> 5 4 <lm>
#> 6 8 <lm>
#> 7 5 <lm>
#> 8 7 <lm>
#> 9 6 <lm>
#> 10 2 <lm>
#> 11 3 <lm>
# Run predictions
model_dna %>%
group_by(Run) %>%
summarise(map(model_dna_group, predict, newdata = new_dna, interval = "prediction", level = 0.9) %>% map_dfr(as_tibble),
.groups = "drop")
#> # A tibble: 1,936 x 4
#> Run fit lwr upr
#> <ord> <dbl> <dbl> <dbl>
#> 1 10 -2.16 -2.48 -1.85
#> 2 10 -2.16 -2.48 -1.85
#> 3 10 -1.33 -1.64 -1.03
#> 4 10 -1.33 -1.64 -1.03
#> 5 10 -0.918 -1.22 -0.617
#> 6 10 -0.918 -1.22 -0.617
#> 7 10 -0.503 -0.804 -0.201
#> 8 10 -0.503 -0.804 -0.201
#> 9 10 -0.0873 -0.392 0.217
#> 10 10 -0.0873 -0.392 0.217
#> # ... with 1,926 more rows
Created on 2021-11-19 by the reprex package (v2.0.0)
Notice:
- after
dplyr
1.0 you don't need to usedo
anymore for this kind of cases - with
map
andmap_dfr
you can calculate your predictions and fit them nicely in yourtibble