I am trying to write a function that will run multiple regressions and then store the outputs in a vector. What I want is for the function to pick the dependent variables from a list that I will provide, and then run the regressions on the same right hand-side variables. Not sure how to go about doing this. Any hints will be appreciated.
my_data <- data.frame(x1=(1:10) rnorm(10, 3, 1.5), x2=25/3 rnorm(10, 0, 1),
dep.var1=seq(5, 28, 2.5), dep.var2=seq(100, -20, -12.5),
dep.var3=seq(1, 25, 2.5))
## The following is a list that tells the function
dep.var <- list(dep.var1=my_data$dep.var1, dep.var2=my_data$dep.var2)
## which dependent variables to use from my_data
all_models <- function(dep.var){lm(dep.var ~ x1 x2, data=my_data)}
model <- sapply(dep.var, all_models) ## The "sapply" here tells the function to
## take the dependent variables from the list dep.var.
I want the "model" list to have two objects: model1 with dep.var1 and model2 with dep.var2. Then as required, I will use summary(model#) to see the regression output.
I know that this in theory works when a vector is used (i.e., p):
p <- seq(0.25, 0.95, 0.05)
s <- function(p) {1 - pnorm(35, p*1*44, sqrt(44)*sqrt(p*(1 - p)))}
f <- sapply(p, s)
But I can't get the whole thing to work as required for my regression models. It works somewhat because you can run and check "model" and it will show you the two regression outputs - but it is horrible. And the "model" does not show the regression specification, i.e., dep.var1 ~ x1 x2.
CodePudding user response:
Consider reformulate
to dynamically change model formulas using character values for lm
calls:
# VECTOR OF COLUMN NAMES (NOT VALUES)
dep.vars <- c("dep.var1", "dep.var2")
# USER-DEFINED METHOD TO PROCESS DIFFERENT DEP VAR
run_model <- function(dep.var) {
fml <- reformulate(c("x1", "x2"), dep.var)
lm(fml, data=data)
}
# NAMED LIST OF MODELS
all_models <- sapply(dep.vars, run_model, simplify = FALSE)
# OUTPUT RESULTS
all_models$dep.var1
all_models$dep.var2
...
From there, you can run further extractions or processes across model objects:
# NAMED LIST OF MODEL SUMMARIES
all_summaries <- lapply(all_models, summary)
all_summaries$dep.var1
all_summaries$dep.var2
...
# NAMED LIST OF MODEL COEFFICIENTS
all_coefficients <- lapply(all_models, `[`, "coefficients")
all_coefficients$dep.var1
all_coefficients$dep.var2
...
CodePudding user response:
You could sapply
over the names of the dependent vars which you could nicely identify with grep
. In lm
use reformulate
to build the formula.
sapply(grep('^dep', names(my_data), value=TRUE), \(x)
lm(reformulate(c('x1', 'x2'), x), my_data))
# dep.var1 dep.var2 dep.var3
# coefficients numeric,3 numeric,3 numeric,3
# residuals numeric,10 numeric,10 numeric,10
# effects numeric,10 numeric,10 numeric,10
# rank 3 3 3
# fitted.values numeric,10 numeric,10 numeric,10
# assign integer,3 integer,3 integer,3
# qr qr,5 qr,5 qr,5
# df.residual 7 7 7
# xlevels list,0 list,0 list,0
# call expression expression expression
# terms dep.var1 ~ x1 x2 dep.var2 ~ x1 x2 dep.var3 ~ x1 x2
# model data.frame,3 data.frame,3 data.frame,3
The dep.var*
appear nicely in the result.
However, you probably want to use lapply
and pipe it into setNames()
to get the list elements named. Instead of grep
you may of course define the dependent variables manually. To get a clean formular call stored, we use a trick once @g-grothendieck taught me using do.call
.
dv <- as.list(grep('^dep', names(my_data), value=TRUE)[1:2])
res <- lapply(dv, \(x) {
f <- reformulate(c('x1', 'x2'), x)
do.call('lm', list(f, quote(my_data)))
}) |>
setNames(dv)
res
# $dep.var1
#
# Call:
# lm(formula = dep.var1 ~ x1 x2, data = my_data)
#
# Coefficients:
# (Intercept) x1 x2
# -4.7450 2.3398 0.2747
#
#
# $dep.var2
#
# Call:
# lm(formula = dep.var2 ~ x1 x2, data = my_data)
#
# Coefficients:
# (Intercept) x1 x2
# 148.725 -11.699 -1.373
This allows you to get the summary()
of the objects, which probably is what you want.
summary(res$dep.var1)
# Call:
# lm(formula = dep.var1 ~ x1 x2, data = my_data)
#
# Residuals:
# Min 1Q Median 3Q Max
# -2.8830 -1.8345 -0.2326 1.4335 4.2452
#
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) -4.7450 7.2884 -0.651 0.536
# x1 2.3398 0.2836 8.251 7.48e-05 ***
# x2 0.2747 0.7526 0.365 0.726
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 2.55 on 7 degrees of freedom
# Multiple R-squared: 0.9117, Adjusted R-squared: 0.8865
# F-statistic: 36.14 on 2 and 7 DF, p-value: 0.0002046
Finally you could wrap it in a function
calc_models <- \(dv) {
lapply(dv, \(x) {
f <- reformulate(c('x1', 'x2'), x)
do.call('lm', list(f, quote(my_data)))
}) |>
setNames(dv)
}
calc_models(list('dep.var1', 'dep.var2'))
CodePudding user response:
Here is a way how you could iterate through your dataframe and apply the function to the group you define (here dep.var
) and save the different models in a dataframe:
library(tidyverse)
library(broom)
my_data %>%
pivot_longer(
starts_with("dep"),
names_to = "group",
values_to = "dep.var"
) %>%
mutate(group = as.factor(group)) %>%
group_by(group) %>%
group_split() %>%
map_dfr(.f = function(df) {
lm(dep.var ~ x1 x2, data = df) %>%
tidy() %>% # first output
#glance() %>% # second output
add_column(group = unique(df$group), .before=1)
})
dataframe output:
# A tibble: 9 x 6
group term estimate std.error statistic p.value
<fct> <chr> <dbl> <dbl> <dbl> <dbl>
1 dep.var1 (Intercept) -5.29 11.6 -0.456 0.662
2 dep.var1 x1 2.11 0.268 7.87 0.000101
3 dep.var1 x2 0.538 1.23 0.437 0.675
4 dep.var2 (Intercept) 151. 57.9 2.61 0.0347
5 dep.var2 x1 -10.6 1.34 -7.87 0.000101
6 dep.var2 x2 -2.69 6.15 -0.437 0.675
7 dep.var3 (Intercept) -9.29 11.6 -0.802 0.449
8 dep.var3 x1 2.11 0.268 7.87 0.000101
9 dep.var3 x2 0.538 1.23 0.437 0.675
list output:
[[1]]
# A tibble: 3 x 6
group term estimate std.error statistic p.value
<fct> <chr> <dbl> <dbl> <dbl> <dbl>
1 dep.var1 (Intercept) -5.29 11.6 -0.456 0.662
2 dep.var1 x1 2.11 0.268 7.87 0.000101
3 dep.var1 x2 0.538 1.23 0.437 0.675
[[2]]
# A tibble: 3 x 6
group term estimate std.error statistic p.value
<fct> <chr> <dbl> <dbl> <dbl> <dbl>
1 dep.var2 (Intercept) 151. 57.9 2.61 0.0347
2 dep.var2 x1 -10.6 1.34 -7.87 0.000101
3 dep.var2 x2 -2.69 6.15 -0.437 0.675
[[3]]
# A tibble: 3 x 6
group term estimate std.error statistic p.value
<fct> <chr> <dbl> <dbl> <dbl> <dbl>
1 dep.var3 (Intercept) -9.29 11.6 -0.802 0.449
2 dep.var3 x1 2.11 0.268 7.87 0.000101
3 dep.var3 x2 0.538 1.23 0.437 0.675
glance output:
group r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC deviance df.residual nobs
<fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <int>
1 dep.var1 0.927 0.906 2.32 44.3 0.000106 2 -20.8 49.7 50.9 37.8 7 10
2 dep.var2 0.927 0.906 11.6 44.3 0.000106 2 -36.9 81.9 83.1 944. 7 10
3 dep.var3 0.927 0.906 2.32 44.3 0.000106 2 -20.8 49.7 50.9 37.8 7 10