I have a data sheet with 40 data columns (40 different nutrients), with additional columns for plot numbers and factors. I would like to automatically loop through each column name and produce a linear model and summary for each. The data columns begin at column 10.
for(i in 10:ncol(df)) { # for-loop over columns
  mod2<-aov(i~block+tillage*residue+Error(subblock),data=df)
summary(mod2)
}
This currently produces the error:
Error in model.frame.default(formula = i ~ subblock, data = df, drop.unused.levels = TRUE) : variable lengths differ (found for 'subblock')
The variable lengths are consistent, so I imagine I am looping incorrectly.
The data looks similar to below (with more categorical columns at the start), with the nutrient columns beginning at column 10.
block | tillage | residue | subblock | nutrient 1 | nutrient 2 | etc. |
---|---|---|---|---|---|---|
b1 | NT | NR | s1 | 0.5 | 0.6 |
CodePudding user response:
In general it is helpful to post a sample of your data using dput(). In the absence of that, I am going to use the built-in dataset mtcars to show how you can do this with formula():
head(mtcars)
# mpg cyl disp hp drat wt qsec vs am gear carb
# Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
# Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
# Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
# Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
# Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
# Select columns
desired_columns <- names(mtcars)[!names(mtcars)=="mpg"]
for (column in desired_columns){
this_formula = formula(paste("mpg ~ ", column))
print(summary(lm(this_formula, data = mtcars)))
}
This will print the summary of lm(mpg ~ var) for each var in the data. The key is the paste() statement, which builds the expression as a string; formula() then turns that string into a formula object. Hopefully you can see how this can be applied to your data.
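As a rough sketch of how the same idea might carry over to data shaped like the question's, the block below builds a small made-up data frame and loops over its nutrient columns. The data frame `dat`, the column names `nutrient1` and `nutrient2`, and the way `subblock` is constructed are all assumptions for illustration, not the asker's real data.

```r
# Synthetic stand-in for the asker's data; every name and value here is made up.
set.seed(1)
dat <- expand.grid(block   = paste0("b", 1:4),
                   tillage = c("NT", "CT"),
                   residue = c("NR", "R"),
                   stringsAsFactors = TRUE)
dat$subblock  <- interaction(dat$block, dat$tillage)  # assumed nesting
dat$nutrient1 <- rnorm(nrow(dat))
dat$nutrient2 <- rnorm(nrow(dat))

nutrients <- c("nutrient1", "nutrient2")
fits <- list()
for (column in nutrients) {
  # paste() builds the formula text, formula() turns it into a formula object
  this_formula <- formula(paste(column, "~ block + tillage * residue + Error(subblock)"))
  fits[[column]] <- aov(this_formula, data = dat)
  print(summary(fits[[column]]))
}
```

Storing the fits in a named list keeps each model available after the loop, for example `fits[["nutrient1"]]`.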
CodePudding user response:
Here is a simple base R solution:
model <- list()
model_summary <- list()
for (i in 10:ncol(df)) { # loop over the nutrient columns
  col <- colnames(df)[i]
  # build the formula from the column name, not from the index i
  formula <- as.formula(paste0(col, " ~ block + tillage*residue + Error(subblock)"))
  model[[i - 9]] <- aov(formula, data = df)
  model_summary[[i - 9]] <- summary(model[[i - 9]])
}
Just create a new formula at each iteration using the name of the i-th column.
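A likely reason the original loop fails is that `i` inside a formula is treated as a variable name. Since no column is called `i`, R falls back to the loop index itself, a length-1 vector, and complains that its length differs from the other variables. The sketch below reproduces this on mtcars (chosen here only for illustration) and shows one way to build the formula from the column name with `reformulate()` from the stats package.

```r
i <- 1  # index of the mpg column in mtcars

# `i` is not a column of mtcars, so the response becomes the length-1 index.
msg <- tryCatch(
  {
    lm(i ~ wt, data = mtcars)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Using the column *name* makes mpg the response instead.
f <- reformulate("wt", response = names(mtcars)[i])
fit <- lm(f, data = mtcars)
print(f)
```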
CodePudding user response:
You do not need a loop. You can just pass a matrix to the LHS of the formula:
dep <- names(iris)[names(iris) != "Species"]
f <- as.formula(sprintf("cbind(%s) ~ Species", paste(dep, collapse = ",")))
summary(lm(f, data = iris))
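With a matrix response, `summary()` returns one summary per response column, so individual statistics can be pulled out afterwards. A minimal sketch, assuming the same iris setup as above (the names `fit`, `summaries` and `r2` are introduced here for illustration):

```r
dep <- names(iris)[names(iris) != "Species"]
f <- as.formula(sprintf("cbind(%s) ~ Species", paste(dep, collapse = ",")))
fit <- lm(f, data = iris)

# One "summary.lm" object per response column
summaries <- summary(fit)

# Collect R-squared for each response
r2 <- sapply(summaries, function(s) s$r.squared)
print(r2)
```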
CodePudding user response:
purrr solution:
Without a MWE it is difficult to help you. My approach would be to split your dataset into one dependent and one independent variable dataset. Then put each dependent variable into a list and append the independent dataset. Then you can "loop" through each list and apply the regression you like.
library(dplyr)
library(purrr)

df <- mtcars
df_independent <- df %>%
  as_tibble() %>%
  # select independent variables
  select(9:10)
df_dependent <- df %>%
  as_tibble() %>%
  # select all dependent variables and store each column in a list
  select(1:8) %>%
  as.list() %>%
  map(as_tibble) %>%
  # attach the independent variables to each single-column tibble
  map(~ cbind(.x, df_independent))
df_dependent %>%
  # df_independent %>% colnames() %>% paste0(".x$",., collapse =" + ")
  map(~ lm(.x$value ~ .x$am + .x$gear)) %>%
  map(summary)
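A somewhat shorter variant of the same idea, offered only as a sketch, is to map over the response names and build each formula with `reformulate()`. This keeps the real column names in the model output instead of `.x$value`. The names `responses`, `fits` and `summaries` are introduced here for illustration.

```r
library(purrr)

responses <- names(mtcars)[1:8]  # same dependent columns as above

# set_names() makes the result a list named by response column
fits <- map(
  set_names(responses),
  ~ lm(reformulate(c("am", "gear"), response = .x), data = mtcars)
)
summaries <- map(fits, summary)
```

Each model can then be retrieved by name, for example `summaries[["mpg"]]`.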
CodePudding user response:
If you want the statistics in a table (which might come in handy), you can use the purrr and broom packages. Here's an example using the dataset mtcars:
Code
library(tidyr)
library(purrr)
library(broom)
formula <- lapply(colnames(mtcars)[3:ncol(mtcars)], function(x) as.formula(paste0(x, " ~ cyl")))
names(formula) <- format(formula)
table <- formula %>% map(~aov(.x, mtcars)) %>% map_dfr(tidy, .id="model")
Output
> head(table)
# A tibble: 6 x 7
model term df sumsq meansq statistic p.value
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 disp ~ cyl cyl 1 387454. 387454. 131. 1.80e-12
2 disp ~ cyl Residuals 30 88731. 2958. NA NA
3 hp ~ cyl cyl 1 100984. 100984. 67.7 3.48e- 9
4 hp ~ cyl Residuals 30 44743. 1491. NA NA
5 drat ~ cyl cyl 1 4.34 4.34 28.8 8.24e- 6
6 drat ~ cyl Residuals 30 4.52 0.151 NA NA
Try
formula <- lapply(colnames(df)[10:ncol(df)], function(x) as.formula(paste0(x, " ~ block + tillage * residue + Error(subblock)")))
names(formula) <- format(formula)
table <- formula %>% map(~aov(.x, df)) %>% map_dfr(tidy, .id="model")