I have a dataset with a rather large number of variables. In the dataset I have a predictor and an outcome variable I want to investigate. I want to find covariates that have either a significant effect on the outcome variable, or a significant interaction with the predictor in their effect on the outcome variable.
It would therefore be convenient to regress the dependent variable on the desired predictor together with each covariate in turn, and to create a table of the main effects and interaction effects of the covariates with their respective p-values.
I want to do something like this:
library(dplyr)
# Generating sample data
set.seed(5)
df <- data.frame(matrix(round(abs(2*rnorm(100*100)), digits = 0), ncol=100))
# Selecting covariates
covar <- names(df)[! names(df) %in% c("X1", "X2")]
# Running the lm function over the list of covariates. I should get the covariate coefficients from each regression, but I get an error when I try to run this step.
coeff <- lapply(covar, function(x){
# Retrieve coefficient matrix
summary(lm(X1 ~ X2 + x + X2*x, df))$coefficients %>%
# Coerce into dataframe and filter for covariates and interaction effects
as.data.frame(.) %>%
filter(row.names(.) %in% grep(x, rownames(.), value =
TRUE))}) %>%
# Finally I want to join all data frames into one
bind_rows(.)
I could use some help with the syntax. I get the following error when I try to run the function:
Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'summary': variable lengths differ (found for 'x')
CodePudding user response:
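When you use x (from lapply) inside the function, lm does not substitute the column name that x holds: it looks for a variable literally called x, finds the length-one character string, and reports that the variable lengths differ. It is therefore better to build the model formula as a string with paste (or paste0) than to write the formula out directly.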
lapply(covar, function(x){
modd <- paste0("X1 ~ X2 + ", x, " + X2 * ", x)
summary(lm(modd, df))$coefficients %>%
as.data.frame(.) %>%
filter(row.names(.) %in% grep(x, rownames(.), value =
TRUE))}) %>%
bind_rows(.)
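As a variation on the same idea, here is a minimal sketch that builds the formula with reformulate() and keeps the term names in an ordinary column, so they do not depend on how row names are handled when the pieces are combined. The helper name fit_one and the column names covariate and term are my own choices, not anything from the question, and X2 * x is used as shorthand for X2 + x + X2:x. It uses base R only.

```r
# Sample data as in the question
set.seed(5)
df <- data.frame(matrix(round(abs(2 * rnorm(100 * 100)), digits = 0), ncol = 100))
covar <- setdiff(names(df), c("X1", "X2"))

# Hypothetical helper: fit X1 ~ X2 * <covariate> and keep the covariate rows
fit_one <- function(x, data) {
  # reformulate() turns the string into a formula; X2 * x expands to X2 + x + X2:x
  form <- reformulate(paste("X2 *", x), response = "X1")
  cf <- summary(lm(form, data = data))$coefficients
  # Match the main effect and the interaction by exact name, not by pattern
  keep <- rownames(cf) %in% c(x, paste0("X2:", x))
  out <- data.frame(covariate = x,
                    term = rownames(cf)[keep],
                    cf[keep, , drop = FALSE],
                    check.names = FALSE)
  rownames(out) <- NULL
  out
}

# One row per covariate term, stacked into a single data frame
coeff <- do.call(rbind, lapply(covar, fit_one, data = df))
head(coeff)
```

Matching on exact term names avoids relying on grep, which would also pick up any other term whose name happens to contain the same text. Keep in mind that screening this many covariates by p-value will flag some by chance alone, so a multiple-testing adjustment such as p.adjust() on the last column may be worth considering.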