r - How to fit a lm() function on a list of data sets?

Time:11-09

After simulating 100,000 observations from a data-generating process (DGP) and splitting them into a list of 1,000 data frames with 100 observations each, I would like to fit the same equation to each data frame separately. How can I get separate coefficients for each data frame?

α <- 6
ß_1 <- 0.5
ß_2 <- 0.1
X_i <- rnorm(n = 100000, mean = 5, sd = 2)
X_i_squared <- X_i^2
e_i <- rnorm(n = 100000, mean = 0, sd = 1)
Y_i <- α + ß_1*X_i + ß_2*X_i^2 + e_i

df <- data.frame(Y_i, X_i, X_i_squared, e_i)

Splitted_df <- split(df, rep(1:1000, each = 100))

I used split() to divide the original data frame into a list of 1,000 new data frames, but I am not sure how to proceed. Do I need one of the functions from the apply family? If anyone could help, I would really appreciate it!

Answer:

Using lapply you could create a list of models like so:

mods <- lapply(Splitted_df, function(x) lm(Y_i ~ X_i + X_i_squared, data = x))
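
If only the point estimates are needed, base R may be enough. The sketch below re-creates the simulation so that it runs on its own; the ASCII names alpha, beta_1 and beta_2 stand in for the question's α, ß_1 and ß_2, and both set.seed(42) and the name coef_mat are assumptions added for illustration, not part of the original post.

```r
# Re-create the simulated data from the question
# (ASCII names replace the Greek identifiers; the seed is an assumption)
set.seed(42)
alpha  <- 6
beta_1 <- 0.5
beta_2 <- 0.1
X_i <- rnorm(n = 100000, mean = 5, sd = 2)
X_i_squared <- X_i^2
e_i <- rnorm(n = 100000, mean = 0, sd = 1)
Y_i <- alpha + beta_1 * X_i + beta_2 * X_i^2 + e_i

df <- data.frame(Y_i, X_i, X_i_squared, e_i)
Splitted_df <- split(df, rep(1:1000, each = 100))

# One lm() fit per data frame, as in the answer
mods <- lapply(Splitted_df, function(x) lm(Y_i ~ X_i + X_i_squared, data = x))

# sapply() simplifies the coefficient vectors to a 3 x 1000 matrix
# (one column per model); t() turns it into one row per model
coef_mat <- t(sapply(mods, coef))

dim(coef_mat)      # rows = models, columns = coefficients
head(coef_mat, 3)  # first three sets of estimates
```

Each row of coef_mat holds the intercept and the two slopes for one data frame, with the row names taken from the names that split() assigned.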

And using purrr::map_df for convenience and broom::tidy you could get the coefficients as a dataframe like so:

mods_tidy <- purrr::map_df(mods, broom::tidy, .id = "model")

head(mods_tidy)
#> # A tibble: 6 × 6
#>   model term        estimate std.error statistic  p.value
#>   <chr> <chr>          <dbl>     <dbl>     <dbl>    <dbl>
#> 1 1     (Intercept)   5.79      0.475      12.2  2.83e-21
#> 2 1     X_i           0.591     0.170       3.48 7.51e- 4
#> 3 1     X_i_squared   0.0942    0.0147      6.39 5.84e- 9
#> 4 2     (Intercept)   6.38      0.521      12.3  2.07e-21
#> 5 2     X_i           0.410     0.220       1.86 6.53e- 2
#> 6 2     X_i_squared   0.107     0.0220      4.86 4.55e- 6
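
A similar long table can be built without purrr or broom, and then used to check how the estimates vary across the 1,000 samples. This is only a sketch under stated assumptions: it re-simulates the data so that it runs on its own, the seed and the names n_sets, n_obs, coef_long, est_mean and est_sd are illustrative choices, and the true values 6, 0.5 and 0.1 are taken from the question.

```r
# Re-create the data (seed and helper names are assumptions)
set.seed(1)
n_sets <- 1000
n_obs  <- 100
X_i <- rnorm(n_sets * n_obs, mean = 5, sd = 2)
Y_i <- 6 + 0.5 * X_i + 0.1 * X_i^2 + rnorm(n_sets * n_obs, mean = 0, sd = 1)
df  <- data.frame(Y_i, X_i, X_i_squared = X_i^2)

Splitted_df <- split(df, rep(seq_len(n_sets), each = n_obs))
mods <- lapply(Splitted_df, function(x) lm(Y_i ~ X_i + X_i_squared, data = x))

# Long table with one row per model and term, taken from summary.lm()
coef_long <- do.call(rbind, lapply(names(mods), function(id) {
  cf <- summary(mods[[id]])$coefficients
  data.frame(model     = id,
             term      = rownames(cf),
             estimate  = cf[, "Estimate"],
             std.error = cf[, "Std. Error"],
             row.names = NULL)
}))

# Mean and standard deviation of each coefficient across the models
est_mean <- tapply(coef_long$estimate, coef_long$term, mean)
est_sd   <- tapply(coef_long$estimate, coef_long$term, sd)

round(cbind(mean = est_mean, sd = est_sd), 4)
```

If the model is specified correctly, the means should sit close to the true parameters from the question, while the standard deviations show how much a single 100-observation fit can move around them.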