Home > Back-end >  Store slope, intercept, and R2 for multiple regressions in one batch calculation in R
Store slope, intercept, and R2 for multiple regressions in one batch calculation in R

Time:02-10

I have a data frame that has a long list of data:

Well.ID | Year | Ave.GWE
1       | 2005 | 525
1       | 2006 | 524
1       | 2004 | 523
2       | 2005 | 552
2       | 2006 | 551
2       | 2007 | 550
.
.
.
10       | 2005 | 582
10       | 2006 | 581
10       | 2007 | 580

I've been able to make a batch plot of the linear regression of Years vs GWE for each Well.ID using ggplot, facet_rep_wrap, geom_smooth, and stat_regline_equation. Now i'd like to create a data frame with the following for each regression:


Well.ID | m (slope) | b (intercept) | R2

Does anyone know if i can run the lm function through a forloop and store all this information automatically?

Thanks

CodePudding user response:

a human readable solution;

modelfun <- function(x){
    model <- lm(Ave.GWE ~ Year,x)
    coefs <- coefficients(model)
    
    intercept <- coefs[1]
    slope <- coefs[2]
    rsq <- summary(model)$r.squared
    
    list(intercept = intercept,slope = slope,rsq = rsq)
}

newdf <- data.frame()

for(i in unique(df[['Well.ID']])){
    subset_df <- subset(df,Well.ID == i)
    
    modelstored <- modelfun(subset_df)
    
    newrow <- data.frame(Well.ID = i,
                         m = modelstored$slope,
                         b = modelstored$intercept,
                         R2 = modelstored$rsq)
    rownames(newrow) <- NULL
    newdf <- rbind(newdf,newrow)
}

newdf

output ;

  Well.ID      m     b    R2
    <dbl>  <dbl> <dbl> <dbl>
1       1  0.500 -479. 0.250
2       2 -1.00  2557. 1    
3      10 -1.00  2587. 1    

CodePudding user response:

We need a bit more data to demonstrate. The following should match approximately your structure:

set.seed(1)

df <- data.frame(Well.ID = rep(1:5, each = 5),
                 Year = rep(2005:2009, 5),
                 Ave.GWE = round(runif(25, 400, 500)))
head(df)
#>   Well.ID Year Ave.GWE
#> 1       1 2005     427
#> 2       1 2006     437
#> 3       1 2007     457
#> 4       1 2008     491
#> 5       1 2009     420
#> 6       2 2005     490

We can get your result by doing

do.call(rbind, lapply(unique(df$Well.ID), function(d) {
  model <- lm(Ave.GWE ~ Year, data = df[df$Well.ID == d,])
  data.frame(Well.ID = d, Intercept = coef(model)[1],
             Slope = coef(model)[2], r_squared = summary(model)$r.squared,
             row.names = NULL)
}))
#>   Well.ID Intercept Slope  r_squared
#> 1       1   -7581.6   4.0 0.04903163
#> 2       2   40403.1 -19.9 0.80086151
#> 3       3  -26047.8  13.2 0.59000406
#> 4       4   -3948.0   2.2 0.02105080
#> 5       5   28541.8 -14.0 0.42416898

Created on 2022-02-09 by the reprex package (v2.0.1)

  • Related