I'm trying to add columns to lists in a for loop. I've tried `$` notation and the following reprex, with no success. What's the approach for base R here?
# selected
month = c(5, 6)
# list of models
models = lapply(setNames(month, paste0("m", month)), function(x) {
  lm(Temp ~ Ozone, subset(airquality, Month == x))
})
# adding residuals of every model to airquality dataset
airquality$residuals = NA
for (model in names(models)) {
  temp = sub("m", "", model)
  data = within(airquality, {
    residuals = ifelse(
      Month == temp,
      models[model][, "residuals"],
      residuals
    )
  })
}
#> Error in models[model][, "residuals"]: incorrect number of dimensions
Created on 2022-08-26 by the reprex package (v2.0.1)
CodePudding user response:
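I would refrain from using within in a programmatic approach and stick to standard evaluation. Note also that models[model] returns a one-element list, so the model itself has to be extracted with models[[model]]: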
month = c(5, 6)
# list of models
models = lapply(setNames(month, paste0("m", month)), function(x) {
  lm(Temp ~ Ozone, subset(airquality, Month == x))
})
airquality$residuals = NA
for (model in names(models)) {
  temp = as.integer(sub("m", "", model))
  airquality[airquality$Month == temp & !is.na(airquality$Ozone),]$residuals <- models[[model]]$residuals
}
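For what it's worth, the error in the question comes from single-bracket indexing of the list. Here is a minimal sketch of the difference, using a one-model list called fits (a hypothetical stand-in for models, not part of the original code):

```r
# `fits` is a hypothetical stand-in for the `models` list in the question
fits <- list(m5 = lm(Temp ~ Ozone, subset(airquality, Month == 5)))

# Single brackets return a sub-list, which has no dimensions,
# so fits["m5"][, "residuals"] fails
class(fits["m5"])

# Double brackets return the fitted model itself
class(fits[["m5"]])

# lm() drops rows where Ozone is NA, so there are fewer residuals
# than rows in that month
n_resid <- length(fits[["m5"]]$residuals)
n_rows <- sum(airquality$Month == 5)
c(residuals = n_resid, rows = n_rows)
```

That length mismatch is why the loop also filters on !is.na(airquality$Ozone) before assigning.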
CodePudding user response:
You could do this less cumbersomely using by. First, rather than looping over the months, subset to a smaller data frame:
airquality2 <- subset(airquality, Month %in% c(5, 6))
Then run the regression on the data split by the selected months and extract the residuals. They are named by their row numbers (the row names of the data), which allows you to add them at the correct positions using the bracket function `[<-`. Finally, just rbind the pieces.
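# split the subset (not the full data) by month
airquality2 <- by(airquality2, airquality2$Month, \(x) {
  resid <- lm(Temp ~ Ozone, x)$residuals
  # residuals are named by row name, so they land on the rows that were fitted
  x[names(resid), 'residuals'] <- resid
  x
}) |> do.call(what=rbind)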
head(airquality2)
#     Ozone Solar.R Wind Temp Month Day residuals
# 5.1    41     190  7.4   67     5   1 -2.562432
# 5.2    36     118  8.0   72     5   2  3.251984
# 5.3    12     149 12.6   74     5   3  9.161183
# 5.4    18     313 11.5   62     5   4 -3.816117
# 5.5    NA      NA 14.3   56     5   5        NA
# 5.6    28      NA 14.9   66     5   6 -1.444950
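To see why the row names make this safe, here is a small check on the built-in airquality data. The name may is only for illustration:

```r
# `may` is an illustrative name for the rows of one month
may <- subset(airquality, Month == 5)
res <- lm(Temp ~ Ozone, may)$residuals

# Residuals carry the row names of the rows that were actually used
head(names(res))

# Row 5 has a missing Ozone value, so it has no residual
"5" %in% names(res)

# Character indexing therefore fills only the rows that were fitted;
# the remaining rows of the new column stay NA
may[names(res), "residuals"] <- res
is.na(may["5", "residuals"])
```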
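If for some reason you need it, you may merge airquality2 back into the rest of the data. Use all.x = TRUE so that the rows of the other months are kept (with NA residuals); this assumes airquality does not already have a residuals column.
airquality <- merge(airquality, airquality2, all.x = TRUE)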
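If the goal is only a residuals column on the full data, the merge can probably be skipped altogether. Here is a minimal sketch, assuming a fresh copy of the built-in data in a variable called aq (a name chosen here for illustration) and the same two months:

```r
# Work on a copy; `aq` is an illustrative name
aq <- datasets::airquality
aq$residuals <- NA_real_

for (m in c(5, 6)) {
  # Fit on one month; rows with missing Ozone are dropped by lm()
  fit <- lm(Temp ~ Ozone, data = aq[aq$Month == m, ])
  res <- residuals(fit)
  # Names of `res` are row names of `aq`, so this fills the right rows
  aq[names(res), "residuals"] <- res
}

# Rows outside months 5 and 6 keep NA
table(is.na(aq$residuals), aq$Month)
```

This keeps the original row order, which merge does not guarantee.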