DATA BELOW
library(tibble)

analysis <- tibble(
  off_race = c("hispanic", "hispanic", "white", "white", "hispanic", "white",
               "hispanic", "white", "white", "white", "hispanic"),
  any_black_uof = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
                    FALSE, FALSE, FALSE),
  any_black_arrest = c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
                       FALSE, FALSE, FALSE),
  prop_white_scale = c(0.866619646027524, -1.14647499712298, 1.33793539994219,
                       0.593565300512359, -0.712819809606193, 0.3473585867755,
                       -1.37025501425243, 1.16596624239715, 0.104521426674564,
                       0.104521426674564, -1.53728347122581),
  prop_hisp_scale = c(-0.347382203637802, 1.54966785579018, -0.833021026477168,
                      -0.211470492567308, 1.48353691981021, 0.421968013870802,
                      2.63739845069911, -0.61002505397242, 0.66674880256898,
                      0.66674880256898, 2.93190487813111)
)
I would like to run a series of regressions that iterate over these vectors:
officer_race <- c("black", "white", "hispanic")
primary_ind <- c("prop_white_scale", "prop_hisp_scale", "prop_black_scale")
outcome <- c("any_black_uof", "any_white_uof", "any_hisp_uof",
             "any_black_arrest", "any_white_arrest", "any_hisp_arrest",
             "any_black_stop", "any_white_stop", "any_hisp_stop")
I would also like to use the fixest package, so the regressions would look like this:
feols(any_black_uof ~ prop_white_scale, data = analysis[analysis$off_race == "black", ])
feols(any_black_uof ~ prop_white_scale, data = analysis[analysis$off_race == "white", ])
feols(any_black_uof ~ prop_white_scale, data = analysis[analysis$off_race == "hispanic", ])
feols(any_black_uof ~ prop_hisp_scale, data = analysis[analysis$off_race == "black", ])
feols(any_black_uof ~ prop_hisp_scale, data = analysis[analysis$off_race == "white", ])
and so on, iterating through all possible combinations and collecting the fitted models in a list.
Is this possible?
CodePudding user response:
Since you did not provide sample data, I am using the mtcars dataset as an example, with the cyl variable standing in for race in your example.
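primary_ind <- c("mpg", "gear", "disp")
outcome <- c("hp", "wt")

# Split the data by cyl, then fit one model per (primary_ind, outcome) pair
result <- lapply(split(mtcars, mtcars$cyl), function(x) {
  sapply(primary_ind, function(y) {
    sapply(outcome, function(z) {
      # outcome goes on the left-hand side, primary_ind on the right
      lm(as.formula(paste(z, y, sep = " ~ ")), data = x)
    }, simplify = FALSE)
  }, simplify = FALSE)
})

result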
First we split the data by cyl values, which gives a list of three data frames, one for each unique value (4, 6 and 8). Then, for each of those data frames, we loop over the primary_ind and outcome values and apply lm to each combination.
Using sapply with simplify = FALSE keeps the primary_ind and outcome values as the names of the list elements, which makes it easy to identify each model.
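Because every level of the nested list is named, a single model can be pulled out by subset, regressor, and outcome. The following is a minimal sketch on mtcars, assuming the outcome belongs on the left-hand side of the formula; the names fit and flat are illustrative only.

```r
primary_ind <- c("mpg", "gear", "disp")
outcome <- c("hp", "wt")

# Rebuild the nested list: subset (cyl) -> regressor -> outcome
# Assumption: outcome is the response, primary_ind the single regressor
result <- lapply(split(mtcars, mtcars$cyl), function(x) {
  sapply(primary_ind, function(y) {
    sapply(outcome, function(z) {
      lm(reformulate(y, response = z), data = x)
    }, simplify = FALSE)
  }, simplify = FALSE)
})

# Names at each level identify the model
names(result)          # the cyl subsets
names(result[["4"]])   # the regressors

# Pull out one model: cyl == 4, regressor mpg, outcome hp
fit <- result[["4"]][["mpg"]][["hp"]]
coef(fit)

# Flatten two levels to get one flat list of lm objects,
# with names of the form "<cyl>.<regressor>.<outcome>"
flat <- unlist(unlist(result, recursive = FALSE), recursive = FALSE)
names(flat)
```

A flat list like this is often easier to pass to functions that expect one model per element, while the nested form is easier to browse interactively.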
CodePudding user response:
You can use the built-in multiple estimation tools: see the dedicated vignette. You also need to understand the formula expansion tools, presented here.
It seems you want to iterate over subsets of the data for different explanatory and dependent variables.
Use the split argument for the subsets, sw() for the explanatory variables, and c() for the dependent variables.
Here is a reproducible example:
library(fixest)
base = setNames(iris, c("y1", "y2", "x1", "x2", "species"))
lhs = c("y1", "y2")
rhs = c("x1", "x2")
mult_est = feols(.[lhs] ~ sw(.[, rhs]), base, split = ~ species)
etable(mult_est)
#>                         mult_est.1        mult_est.2        mult_est.3        mult_est.4
#> Sample (species)            setosa            setosa            setosa            setosa
#> Dependent Var.:                 y1                y1                y2                y2
#>
#> Constant         4.213*** (0.4156) 4.777*** (0.1239) 2.861*** (0.4564) 3.222*** (0.1349)
#> x1                0.5423. (0.2823)                     0.3879 (0.3100)
#> x2                                  0.9302. (0.4637)                     0.8372 (0.5049)
#> ________________ _________________ _________________ _________________ _________________
#> S.E. type                      IID               IID               IID               IID
#> Observations                    50                50                50                50
#> R2                         0.07138           0.07734           0.03158           0.05417
#> Adj. R2                    0.05204           0.05812           0.01140           0.03447
#>
#>                          mult_est.5        mult_est.6         mult_est.7        mult_est.8
#> Sample (species)         versicolor        versicolor         versicolor        versicolor
#> Dependent Var.:                  y1                y1                 y2                y2
#>
#> Constant          2.408*** (0.4463) 4.045*** (0.4229)   1.175** (0.3421) 1.373*** (0.2296)
#> x1               0.8283*** (0.1041)                   0.3743*** (0.0798)
#> x2                                  1.426*** (0.3155)                    1.054*** (0.1713)
#> ________________ __________________ _________________ __________________ _________________
#> S.E. type                       IID               IID                IID               IID
#> Observations                     50                50                 50                50
#> R2                          0.56859           0.29862            0.31419           0.44089
#> Adj. R2                     0.55960           0.28401            0.29990           0.42925
#>
#>                          mult_est.9       mult_est.10       mult_est.11        mult_est.12
#> Sample (species)          virginica         virginica         virginica          virginica
#> Dependent Var.:                  y1                y1                y2                 y2
#>
#> Constant            1.060* (0.4668) 5.269*** (0.6556) 1.673*** (0.4310)  1.695*** (0.2921)
#> x1               0.9957*** (0.0837)                   0.2343** (0.0773)
#> x2                                   0.6508* (0.3207)                   0.6314*** (0.1429)
#> ________________ __________________ _________________ _________________ __________________
#> S.E. type                       IID               IID               IID                IID
#> Observations                     50                50                50                 50
#> R2                          0.74688           0.07902           0.16084            0.28915
#> Adj. R2                     0.74161           0.05983           0.14335            0.27434
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
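The call above builds its formula from the character vectors lhs and rhs. As a rough sketch, assuming the fixest package is installed, the same estimation can be written with the variable names spelled out, which makes the roles of c(), sw() and split easier to see. The names est_explicit and models are illustrative only.

```r
library(fixest)  # assumption: fixest is installed

base <- setNames(iris, c("y1", "y2", "x1", "x2", "species"))

# c() lists the dependent variables, sw() steps through the regressors
# one at a time, and split refits every model within each species.
est_explicit <- feols(c(y1, y2) ~ sw(x1, x2), base, split = ~species)

# Convert the multiple-estimation object into a plain list of fitted models
# (3 species x 2 dependent variables x 2 regressors).
models <- as.list(est_explicit)
length(models)

# Each element can then be handled like a single feols fit
coef(models[[1]])
```

Converting to a plain list is one way to get the list of models the question asks for; the multiple-estimation object can also be passed directly to etable, as shown above.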