I want to do a equation-by-equation instrumental variable (IV) regression with a control function in R (using tidyverse
and broom
). I want to implement this based on a grouped data frame with a dependent variable, y
, an endogenous variable, x
, an instrument for this endogenous variable, z1
, and an exogeneous variable, z2
. Following a Two Stage Least Squares (2SLS) approach, I would run: (1) Regress x
on z1
and z2
and (2) Regress y
on x
, z2
and v
(the residuals from (1)). For more details for this approach see: https://www.irp.wisc.edu/newsevents/workshops/appliedmicroeconometrics/participants/slides/Slides_14.pdf. Unfortunately, I am not able to run the second regression without an error (see below).
My data looks like this:
df <- data.frame(
id = sort(rep(seq(1, 20, 1), 5)),
group = rep(seq(1, 4, 1), 25),
y = runif(100),
x = runif(100),
z1 = runif(100),
z2 = runif(100)
)
where id
is an identifier for the observations, group
is an identifier for the groups and the rest is defined above.
library(tidyverse)
library(broom)
# Nest the data frame
df_nested <- df %>%
group_by(group) %>%
nest()
# Run first stage regression and retrieve residuals
df_fit <- df_nested %>%
mutate(
fit1 = map(data, ~ lm(x ~ z1 z2, data = .x)),
resids = map(fit1, residuals)
)
Now, I want to run the second stage regression. I've tried two things.
First:
df_fit %>%
group_by(group) %>%
unnest(c(data, resids)) %>%
do(lm(y ~ x z2, data = .x))
This produces Error in is.data.frame(data) : object '.x' not found
.
Second:
df_fit %>%
mutate(
fit2 = map2(data, resids, ~ lm(y ~ x z2, data = .x))
)
df_fit %>% unnest(fit2)
This produces: Error: Must subset columns with a valid subscript vector. x Subscript has the wrong type `grouped_df<
. If you would work with a larger data set, the second approach would even run into storage problems.
How is this done correctly?
CodePudding user response:
The broom
package is loaded but there was no tidy
applied to the lm
output. In addition, the OP's code had some typos i.e. after mutate
ing to create the fit2
, the object 'df_fit' was not updated (<-
), thus df_fit %>% unnest(fit2)
wouldn't work as the column is not found
library(dplyr)
library(purrr)
library(broom)
library(tidyr)
df_fit %>%
ungroup %>%
mutate(
fit2 = map2(data, resids, ~ tidy(lm(y ~ x z2, data = .x))
)) %>%
unnest(fit2)
-output
# A tibble: 12 × 9
group data fit1 resids term estimate std.error statistic p.value
<dbl> <list> <list> <list> <chr> <dbl> <dbl> <dbl> <dbl>
1 1 <tibble [25 × 5]> <lm> <dbl [25]> (Intercept) 0.357 0.126 2.82 0.00987
2 1 <tibble [25 × 5]> <lm> <dbl [25]> x -0.0290 0.173 -0.168 0.868
3 1 <tibble [25 × 5]> <lm> <dbl [25]> z2 0.204 0.183 1.11 0.278
4 2 <tibble [25 × 5]> <lm> <dbl [25]> (Intercept) 0.470 0.139 3.38 0.00272
5 2 <tibble [25 × 5]> <lm> <dbl [25]> x 0.168 0.206 0.816 0.423
6 2 <tibble [25 × 5]> <lm> <dbl [25]> z2 0.00615 0.176 0.0350 0.972
7 3 <tibble [25 × 5]> <lm> <dbl [25]> (Intercept) 0.625 0.147 4.25 0.000325
8 3 <tibble [25 × 5]> <lm> <dbl [25]> x 0.209 0.255 0.818 0.422
9 3 <tibble [25 × 5]> <lm> <dbl [25]> z2 -0.398 0.183 -2.18 0.0406
10 4 <tibble [25 × 5]> <lm> <dbl [25]> (Intercept) 0.511 0.235 2.17 0.0407
11 4 <tibble [25 × 5]> <lm> <dbl [25]> x 0.0468 0.247 0.189 0.851
12 4 <tibble [25 × 5]> <lm> <dbl [25]> z2 -0.0246 0.271 -0.0908 0.929