I want to loop lm() models over each variable i (the response) against a single explanatory variable, in a list of data frames that are split by a factor. Lastly, I want to create two data frames that show the lm coefficients: the first will show the slope and the second the p.value, with the response variables tested in the models as columns and the factor levels as rows.
I managed to run and print the output of the summary of the lm models, but I am not sure how to create the appropriate slope and p.value data frames.
Here is what I've done:
data (iris)
iris_split = split (iris,f=iris$Species) ### Split the data by factor "Species"
I want to run lm models for each of the following variables (treated as responses for the sake of the question) with Petal.Width as the explanatory variable:
vars = as.vector (unique (colnames (subset (iris, select = -c(Species, Petal.Width )))))
#Output:
#> vars
#[1] "Sepal.Length" "Sepal.Width" "Petal.Length"
iris_lm = for (i in vars) { # loop across vars; note that for() returns NULL, so iris_lm ends up NULL
  lm_summary = lapply (iris_split, FUN = function(x)
    summary(lm (x[,i] ~ x[,"Petal.Width"]))) # where x is the data for one level of "Species"
  print(i) # so I could see which variable is tested in the model
  print(lm_summary)
}
How do I create slop.df and p.val.df?
They need to look like this:
#> slop.df
# Species Sepal.Length Sepal.Width Petal.Length
#1 setosa slope? slope? slope?
#2 versicolor slope? slope? slope?
#3 virginica slope? slope? slope?
The actual slopes need to be shown instead of the "slope?" placeholder, and the same goes for p.val.df.
CodePudding user response:
Packages from the [tidyverse][1] make this fairly convenient:
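library(dplyr)
library(tidyr)

coefs <- iris %>%
  pivot_longer(-c(Species, Petal.Width),
               names_to = 'variable',
               values_to = 'value'
  ) %>%
  group_by(Species, variable) %>%
  ## mind to return the model results as a list!
  ## value is the response and Petal.Width the predictor, as in the question
  summarise(model_summary = list(summary(lm(value ~ Petal.Width))),
            .groups = 'drop') %>%
  rowwise() %>%
  mutate(slope = model_summary$coefficients[2, 'Estimate'],
         p = model_summary$coefficients[2, 'Pr(>|t|)']
  ) %>%
  ungroup()

## one wide table per statistic: species in rows, responses in columns
slop.df <- pivot_wider(coefs, id_cols = Species,
                       names_from = 'variable',
                       values_from = 'slope')
p.val.df <- pivot_wider(coefs, id_cols = Species,
                        names_from = 'variable',
                        values_from = 'p')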
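
For comparison, the same two tables can probably be built in base R without extra packages. This is only a minimal sketch, assuming the setup from the question (iris split by Species, each remaining numeric column as the response, Petal.Width as the explanatory variable). The helpers get_stat and make_table are hypothetical names introduced here for illustration, not functions from any package.

```r
# Same setup as in the question
iris_split <- split(iris, iris$Species)
vars <- setdiff(colnames(iris), c("Species", "Petal.Width"))

# Hypothetical helper: fit response ~ Petal.Width for one species and
# return one statistic from the Petal.Width row of the coefficient table
get_stat <- function(df, response, stat) {
  fit <- lm(reformulate("Petal.Width", response = response), data = df)
  coef(summary(fit))["Petal.Width", stat]
}

# Hypothetical helper: build a table with species in rows and
# response variables in columns for the requested statistic
make_table <- function(stat) {
  m <- sapply(vars, function(v)
    sapply(iris_split, get_stat, response = v, stat = stat))
  data.frame(Species = rownames(m), m, row.names = NULL)
}

slop.df  <- make_table("Estimate")   # slopes
p.val.df <- make_table("Pr(>|t|)")   # p-values of the slopes
```

Using reformulate() with a data argument, rather than x[,i] ~ x[,"Petal.Width"], keeps the coefficient row named Petal.Width, so the slope can be looked up by name instead of by position.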