I have a data frame as such:
Feature ID Sub Value
A T1 B1 5.87
B T1 B2 3.99
C T1 B3 12.57
A T1 B2 9.22
B T1 B3 7.89
C T1 B1 4.76
A T2 B1 4.56
B T2 B2 9.26
C T2 B2 7.44
What I want to do is run a one-factor ANOVA on this dataset, with "Sub" as the factor. I want to loop through each feature and each ID. Basically, I am calculating the variance of each feature within an ID, between the levels of "Sub".
I have generated the below code, but it doesn't seem to be working.
datalist = list()
for (i in unique(data1$Feature)) {
  for (j in unique(data1$ID)) {
    A1 <- summary(aov(data1$value ~ as.factor(data1$Sub), data = data1))
    datalist[[j]] <- A1
  }
}
big_data = do.call(rbind, datalist)
I end up getting big_data which is a matrix of 36 lists. I am unable to access the Anova output. It doesn't have to necessarily be a data frame. Even if it's a "write.csv()" within the loop that will generate the different outputs. Ultimately, I'll just be needing the "between" factor parameter of the Anova output to generate a plot so if this can also be incorporated in the code that'd be of great help.
I am still a beginner in R, so any help is very much appreciated.
Thank you!
CodePudding user response:
Several issues with the current setup:

1. You do not actually use i and j in your aov call, so every iteration of the nested for loops returns the exact same results, run on the entire data frame. Quick fix: subset the data frame by the i-th and j-th values.

aov(value ~ Sub, data = subset(data1, Feature == i & ID == j))

2. You save list elements only under the j values and not under both i and j, so each pass of i overwrites the previous entries and only the last pass over the j items is kept. Quick fix: name the elements by both the i-th and j-th values.

datalist[[paste0(i, "_", j)]] <- A1

3. You are attempting to rbind list objects, not matrices or data frames, since summary.aov returns a list of results. For your use case, calling str shows that the result is a list of 1:

str(summary(aov(data1$value ~ as.factor(data1$Sub), data = data1)))
List of 1
 $ :Classes ‘anova’ and 'data.frame': 2 obs. of 5 variables:
  ..$ Df     : num [1:2] ...
  ..$ Sum Sq : num [1:2] ...
  ..$ Mean Sq: num [1:2] ...
  ..$ F value: num [1:2] ...
  ..$ Pr(>F) : num [1:2] ...
 - attr(*, "class")= chr [1:2] "summary.aov" "listof"

Quick fix: index the first item.

summary(aov(...))[[1]]
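The quick fixes above can be combined into one corrected loop. The following is only a minimal sketch: the toy data frame is made up for illustration (it is not the asker's data), and it uses aov as in the question's code, with a lowercase value column as the question's code assumes.

```r
# Hypothetical toy data in the question's layout; the values are random.
set.seed(1)
data1 <- expand.grid(
  Feature = c("A", "B"),
  ID = c("T1", "T2"),
  Sub = c("B1", "B2", "B3"),
  rep = 1:2,
  stringsAsFactors = FALSE
)
data1$value <- round(runif(nrow(data1), 3, 13), 2)

datalist <- list()
for (i in unique(data1$Feature)) {
  for (j in unique(data1$ID)) {
    # Fit only the rows belonging to this Feature/ID pair.
    sub_df <- subset(data1, Feature == i & ID == j)
    # summary.aov returns a list of 1; keep the ANOVA table inside it.
    A1 <- summary(aov(value ~ Sub, data = sub_df))[[1]]
    # Key the result by both i and j so nothing is overwritten.
    datalist[[paste0(i, "_", j)]] <- A1
  }
}

names(datalist)
```

Each element of datalist is then an ANOVA table with one row for Sub and one for the residuals. Note that aov stops with a contrasts error when a subset contains only one level of Sub, which happens for some Feature/ID pairs in the nine-row sample shown in the question, so such groups would need to be skipped or handled separately.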
However, consider an apply-family solution with by (an object-oriented wrapper to tapply) and avoid the bookkeeping of initializing a list and assigning to it iteratively in nested for loops. Specifically, by can split a data frame by one or more grouping columns and run an operation on each subset, returning a list with one element for every possible combination of the groups' unique values. Also, consider defining a function to encapsulate all processing on each subset.
# USER-DEFINED METHOD
run_anova <- function(sub_df) {
  # RAW RESULTS
  anova_raw <- summary(aov(value ~ Sub, data = sub_df))[[1]]

  # CLEAN UP DATA WITH IDENTIFIERS
  anova_df <- data.frame(
    within(anova_raw, {Feature <- sub_df$Feature[1]; ID <- sub_df$ID[1]}),
    row.names = NULL,
    check.names = FALSE
  )

  return(anova_df)
}

datalist <- by(data1, data1[c("Feature", "ID")], run_anova)
big_data <- do.call(rbind, unname(datalist))
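Since the question ultimately needs only the "between" row of each ANOVA table for plotting, the same by approach can be narrowed to return just the Sub row per group. This is a hedged sketch rather than part of the original answer: run_anova_between and the toy data are hypothetical names and values introduced for illustration.

```r
# Hypothetical toy data in the question's layout; the values are random.
set.seed(1)
data1 <- expand.grid(
  Feature = c("A", "B"),
  ID = c("T1", "T2"),
  Sub = c("B1", "B2", "B3"),
  rep = 1:2,
  stringsAsFactors = FALSE
)
data1$value <- round(runif(nrow(data1), 3, 13), 2)

# Illustrative helper: keep only the between-groups (Sub) row.
run_anova_between <- function(sub_df) {
  tab <- summary(aov(value ~ Sub, data = sub_df))[[1]]
  # summary.aov pads the term labels with spaces, so trim before matching.
  terms <- trimws(rownames(tab))
  between <- tab[terms == "Sub", , drop = FALSE]
  data.frame(
    Feature = sub_df$Feature[1],
    ID = sub_df$ID[1],
    between,
    row.names = NULL,
    check.names = FALSE
  )
}

between_list <- by(data1, data1[c("Feature", "ID")], run_anova_between)
between_df <- do.call(rbind, unname(between_list))
between_df
```

The result has one row per Feature/ID combination, with the Df, Sum Sq, Mean Sq, F value and Pr(>F) columns for Sub, which can be passed to a plotting function or written out with write.csv.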