After simulating 100,000 observations from a DGP and splitting them into a list of 1000 data frames with 100 observations each, I would like to fit the same equation to each data frame separately. How can I get separate coefficients for each data frame?
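# True parameters of the DGP
α <- 6
ß_1 <- 0.5
ß_2 <- 0.1
# Regressor and error term, 100,000 draws each
X_i <- rnorm(n = 100000, mean = 5, sd = 2)
X_i_squared <- X_i^2
e_i <- rnorm(n = 100000, mean = 0, sd = 1)
# Outcome: Y = α + ß_1*X + ß_2*X^2 + e
Y_i <- α + ß_1*X_i + ß_2*X_i^2 + e_i
df <- data.frame(Y_i, X_i, X_i_squared, e_i)
# Rows 1-100 go to data frame 1, rows 101-200 to data frame 2, and so on
Splitted_df <- split(df, rep(1:1000, each = 100))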
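Before fitting anything, a quick way to confirm that split() did what was intended is to count the list elements and the rows in each one. The sketch below regenerates the data with set.seed(1) for reproducibility and uses the ASCII names alpha, beta_1 and beta_2 in place of α, ß_1 and ß_2 (non-ASCII identifiers can fail to parse in non-UTF-8 locales); both choices are assumptions, not part of the original code.

```r
# Assumptions: set.seed(1) and ASCII names alpha/beta_1/beta_2 stand in for α/ß_1/ß_2
set.seed(1)
alpha  <- 6
beta_1 <- 0.5
beta_2 <- 0.1
X_i <- rnorm(n = 100000, mean = 5, sd = 2)
X_i_squared <- X_i^2
e_i <- rnorm(n = 100000, mean = 0, sd = 1)
Y_i <- alpha + beta_1 * X_i + beta_2 * X_i_squared + e_i
df <- data.frame(Y_i, X_i, X_i_squared, e_i)

# rep(1:1000, each = 100) is the grouping vector: 1 repeated 100 times, then 2, ...
Splitted_df <- split(df, rep(1:1000, each = 100))

length(Splitted_df)                            # 1000
unique(vapply(Splitted_df, nrow, integer(1)))  # 100
names(Splitted_df)[1:3]                        # "1" "2" "3"
```

Because the list is named "1" to "1000", each element can later be looked up by its group label, e.g. Splitted_df[["2"]] holds original rows 101-200.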
I used split() to divide the original data frame into a list of 1000 new data frames, but I am not sure how to proceed. Do I need one of the functions from the apply family? Any help would be much appreciated!
Answer:
Using lapply() you could create a list of models, one per data frame, like so (note the + between the two predictors in the formula):
mods <- lapply(Splitted_df, function(x) lm(Y_i ~ X_i + X_i_squared, data = x))
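If only the point estimates are needed, base R can collect them without extra packages: coef() returns a named vector for each model, and sapply() binds those vectors into a matrix. This is a sketch that regenerates the data under the same assumptions as before (set.seed(1), ASCII names alpha, beta_1, beta_2 for α, ß_1, ß_2).

```r
# Assumptions: set.seed(1) and ASCII names alpha/beta_1/beta_2 stand in for α/ß_1/ß_2
set.seed(1)
alpha  <- 6
beta_1 <- 0.5
beta_2 <- 0.1
X_i <- rnorm(n = 100000, mean = 5, sd = 2)
X_i_squared <- X_i^2
e_i <- rnorm(n = 100000, mean = 0, sd = 1)
Y_i <- alpha + beta_1 * X_i + beta_2 * X_i_squared + e_i
df <- data.frame(Y_i, X_i, X_i_squared, e_i)
Splitted_df <- split(df, rep(1:1000, each = 100))

# One lm() fit per data frame
mods <- lapply(Splitted_df, function(x) lm(Y_i ~ X_i + X_i_squared, data = x))

# coef() gives a named length-3 vector per model; sapply() binds these as
# columns (3 x 1000), so t() yields one row per data frame
coef_mat <- t(sapply(mods, coef))
dim(coef_mat)      # 1000 3
head(coef_mat, 3)
```

The row names carry over from the list names, so coef_mat["42", ] returns the three estimates for the 42nd data frame.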
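And using purrr::map_df() for convenience and broom::tidy() you could get the coefficients as a single data frame like so (.id = "model" stores each list element's name, i.e. the number of the data frame, in a column called model):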
mods_tidy <- purrr::map_df(mods, broom::tidy, .id = "model")
head(mods_tidy)
#> # A tibble: 6 × 6
#> model term estimate std.error statistic p.value
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 1 (Intercept) 5.79 0.475 12.2 2.83e-21
#> 2 1 X_i 0.591 0.170 3.48 7.51e- 4
#> 3 1 X_i_squared 0.0942 0.0147 6.39 5.84e- 9
#> 4 2 (Intercept) 6.38 0.521 12.3 2.07e-21
#> 5 2 X_i 0.410 0.220 1.86 6.53e- 2
#> 6 2 X_i_squared 0.107 0.0220 4.86 4.55e- 6
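With 1000 replications you can also compare the estimates with the true DGP parameters. Since e_i is drawn independently of X_i, OLS is unbiased here, so the average estimate across the 1000 fits should sit close to 6, 0.5 and 0.1, and the standard deviation across fits should be of the same order as the std.error values shown above. A base-R sketch, again assuming set.seed(1) and ASCII names alpha, beta_1, beta_2 for α, ß_1, ß_2:

```r
# Assumptions: set.seed(1) and ASCII names alpha/beta_1/beta_2 stand in for α/ß_1/ß_2
set.seed(1)
alpha  <- 6
beta_1 <- 0.5
beta_2 <- 0.1
X_i <- rnorm(n = 100000, mean = 5, sd = 2)
X_i_squared <- X_i^2
e_i <- rnorm(n = 100000, mean = 0, sd = 1)
Y_i <- alpha + beta_1 * X_i + beta_2 * X_i_squared + e_i
df <- data.frame(Y_i, X_i, X_i_squared, e_i)
Splitted_df <- split(df, rep(1:1000, each = 100))

mods <- lapply(Splitted_df, function(x) lm(Y_i ~ X_i + X_i_squared, data = x))
coef_mat <- t(sapply(mods, coef))  # one row per data frame

# True values, named to match the column order of coef_mat
true_vals <- c("(Intercept)" = alpha, X_i = beta_1, X_i_squared = beta_2)

# Row 1: true parameter; row 2: mean estimate; row 3: spread across the 1000 fits
summary_tab <- rbind(
  true     = true_vals,
  mean_est = colMeans(coef_mat),
  sd_est   = apply(coef_mat, 2, sd)
)
round(summary_tab, 3)
```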