Creating a linear regression model for each group in a column

I refer to this answer:


linear_model <- function(TIME) lm(Education ~ poly(TIME,2), data=table2)

m <- lapply(split(table2,table2$LOCATION),linear_model)

new_df <- data.frame(TIME=c(2019))

my_predict <- function(TIME) predict(m,new_df)

sapply(m,my_predict)   #error here

CodePudding user response:

Are you looking for such a solution?

library(tidyverse)
library(broom)
df %>% 
  mutate(LOCATION = as_factor(LOCATION)) %>% 
  group_by(LOCATION) %>% 
  group_split() %>% 
  map_dfr(.f = function(df){
    lm(Education ~ TIME, data = df) %>% 
      glance() %>% 
      add_column(LOCATION = unique(df$LOCATION), .before=1)
  })

  LOCATION r.squared adj.r.squared sigma statistic p.value    df logLik   AIC   BIC deviance df.residual  nobs
  <fct>        <dbl>         <dbl> <dbl>     <dbl>   <dbl> <dbl>  <dbl> <dbl> <dbl>    <dbl>       <int> <int>
1 AUT         0.367         0.261   4.88     3.47    0.112     1  -22.9  51.8  52.0    143.            6     8
2 BEL         0.0225       -0.173   3.90     0.115   0.748     1  -18.3  42.6  42.4     76.0           5     7
3 CZE         0.0843       -0.0683  3.22     0.552   0.485     1  -19.6  45.1  45.3     62.2           6     8

CodePudding user response:

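Your functions declare an argument but never use it. linear_model takes TIME, yet its body always fits on data=table2, so every group ends up with the same model, fitted to the whole table. A function is normally written as function(x) with a body that uses x, so that whatever you pass in is the data it works on.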

For example, in the linear_model function you defined, if you were to use it alone you would write:

linear_model(data)

However, because you call it inside lapply, this is harder to see. lapply simply loops over the list returned by split(table2,table2$LOCATION) and applies linear_model to each data frame in that list.

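The same thing happens with my_predict: it ignores its argument and calls predict(m,new_df) on the whole list m instead of on a single model, which is why sapply(m,my_predict) fails.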
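To see what lapply actually hands to your function, here is a minimal sketch. The toy data frame and its values are made up for illustration and only stand in for table2; pieces and rows_per_group are hypothetical names.

```r
# Hypothetical toy data standing in for table2 (values are made up)
toy <- data.frame(
  LOCATION  = rep(c("AUT", "BEL"), each = 4),
  TIME      = rep(2015:2018, times = 2),
  Education = c(10, 12, 13, 15, 20, 19, 21, 22)
)

# split() returns a named list with one data frame per LOCATION
pieces <- split(toy, toy$LOCATION)
names(pieces)

# lapply() calls the function once per list element; x is that element
rows_per_group <- lapply(pieces, function(x) nrow(x))
rows_per_group
```

Each call receives one group's data frame as x, so a function that never touches x cannot produce a per-group result.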

Anyway, this should work for you:

linear_model <- function(x) lm(Education ~ TIME, x)

m <- lapply(split(table2,table2$LOCATION),linear_model)

new_df <- data.frame(TIME=c(2019))

my_predict <- function(x) predict(x,new_df)

sapply(m,my_predict)
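
Note that the code above fits a straight line in TIME, whereas your question used poly(TIME,2). If you want to keep the quadratic term, the same pattern should carry over. Below is a rough sketch under that assumption; the toy data is made up, quad_model, fits and preds are hypothetical names, and I use vapply instead of sapply only so that R checks each model returns a single number.

```r
# Hypothetical toy data standing in for table2 (values are made up)
toy <- data.frame(
  LOCATION  = rep(c("AUT", "BEL"), each = 5),
  TIME      = rep(2014:2018, times = 2),
  Education = c(10, 12, 13, 15, 18, 20, 19, 21, 22, 26)
)

# One quadratic fit per LOCATION; x is the subset for that group
quad_model <- function(x) lm(Education ~ poly(TIME, 2), data = x)
fits <- lapply(split(toy, toy$LOCATION), quad_model)

new_df <- data.frame(TIME = 2019)

# vapply() enforces exactly one numeric prediction per model
preds <- vapply(fits, function(fit) unname(predict(fit, new_df)), numeric(1))
preds
```

The result is a numeric vector named by LOCATION. Each group needs at least three distinct TIME values for a degree-2 polynomial, and predicting for 2019 is an extrapolation beyond the fitted years, so treat the numbers with care.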