Programmatically add residuals by group as new column

Time:08-27

I'm trying to add the residuals from a list of models as a new column inside a for loop. I've tried $ notation and the following reprex, with no success. What's the base R approach here?

# selected
month = c(5, 6)

# list of models
models = lapply(setNames(month, paste0("m", month)), function(x) {
    lm(Temp ~ Ozone, subset(airquality, Month == x))
})

# adding residuals of every model to airquality dataset
airquality$residuals = NA

for (model in names(models)) {
    temp = sub("m", "", model)
    data = within(airquality, {
        residuals = ifelse(
            Month == temp,
            models[model][, "residuals"],
            residuals
        )
    })
}
#> Error in models[model][, "residuals"]: incorrect number of dimensions

Created on 2022-08-26 by the reprex package (v2.0.1)

CodePudding user response:

I would refrain from using within in a programmatic approach and stick to standard evaluation:

month = c(5, 6)

# list of models
models = lapply(setNames(month, paste0("m", month)), function(x) {
    lm(Temp ~ Ozone, subset(airquality, Month == x))
})

airquality$residuals = NA

for (model in names(models)) {

    temp = as.integer(sub("m", "", model))
    
    airquality[airquality$Month == temp & !is.na(airquality$Ozone),]$residuals <- models[[model]]$residuals

}
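
The error in the question comes from `models[model]`: single brackets return a one-element list rather than the `lm` object, so `[, "residuals"]` has no dimensions to index. As a minimal sketch of a variant of the loop above, the residuals can also be placed by their names, which `lm` takes from the row names of the data it actually used. The copy `aq` is a name introduced here for illustration only.

```r
month <- c(5, 6)

# list of models, one per selected month (same setup as above)
models <- lapply(setNames(month, paste0("m", month)), function(x) {
    lm(Temp ~ Ozone, subset(airquality, Month == x))
})

# `[` keeps the list wrapper, `[[` extracts the fitted model itself
is.list(models["m5"])          # a list of length one
inherits(models[["m5"]], "lm") # the lm object

aq <- airquality               # hypothetical working copy
aq$residuals <- NA_real_

for (model in names(models)) {
    res <- residuals(models[[model]])  # named by the original row names
    aq[names(res), "residuals"] <- res # rows dropped by lm() stay NA
}
```

Because the assignment goes by row name, the month does not have to be parsed back out of the model name, and rows with a missing Ozone value keep their NA.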

CodePudding user response:

You could do this less cumbersomely using by. First, rather than looping over the months, subset to a smaller data frame:

airquality2 <- subset(airquality, Month %in% c(5, 6))

Then run the regression on the data split by the selected months and extract the residuals. They are named by their row names (here the original row numbers), which allows you to put them in the correct positions using the bracket function `[<-`; rows that lm dropped because of a missing Ozone value stay NA. Finally, just rbind the pieces.

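airquality2 <- by(airquality2, airquality2$Month, \(x) {
  # residuals are named by the row names of the rows lm() actually used
  resid <- lm(Temp ~ Ozone, x)$residuals
  # fill only those rows; the new column is NA everywhere else
  x[names(resid), 'residuals'] <- resid
  x
}) |> do.call(what=rbind)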

head(airquality2)
#     Ozone Solar.R Wind Temp Month Day residuals
# 5.1    41     190  7.4   67     5   1 -2.562432
# 5.2    36     118  8.0   72     5   2  3.251984
# 5.3    12     149 12.6   74     5   3  9.161183
# 5.4    18     313 11.5   62     5   4 -3.816117
# 5.5    NA      NA 14.3   56     5   5        NA
# 5.6    28      NA 14.9   66     5   6 -1.444950
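
A possible alternative, sketched here under the assumption that fitting with `na.action = na.exclude` is acceptable: with that setting `residuals()` pads its result with NA for the rows dropped because of missing values, so each group yields one value per row and `unsplit()` can put the pieces back in the original row order. The name `aq2` is introduced here for illustration only.

```r
# hypothetical copy restricted to the selected months
aq2 <- subset(airquality, Month %in% c(5, 6))

# one padded residual vector per month
res_by_month <- lapply(split(aq2, aq2$Month), function(d) {
    fit <- lm(Temp ~ Ozone, data = d, na.action = na.exclude)
    residuals(fit)  # same length as nrow(d), NA where Ozone is missing
})

# reassemble the per-month vectors in the row order of aq2
aq2$residuals <- unname(unsplit(res_by_month, aq2$Month))

head(aq2)
```

This keeps the original row names and avoids the final rbind, at the cost of relying on the extractor `residuals()` rather than the `$residuals` component, which is not padded.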

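If for some reason you need it, you can merge the result back into the rest of the data. Use all.x=TRUE so that rows from the other months are kept; their residuals will be NA.

# assumes airquality has no residuals column yet; note that merge() reorders the rows
airquality <- merge(airquality, airquality2, all.x=TRUE)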