I have two groups of matrices. The first group holds data for 95 subjects in files named FC_sub-*.xlsx, and the second group also holds data for 95 subjects, in files named SC_sub-*.xlsx. I need to compute the correlation between matching subjects across the two groups. I know how to do this one subject at a time, for example:
SC_FC_sub_0152 <- corr.test(SC_sub_0152,FC_sub_0152, method="spearman", adjust="none")
SC_FC_sub_0225 <- corr.test(SC_sub_0225,FC_sub_0225, method="spearman", adjust="none")
but I couldn't write code that loops through the two groups and runs the correlation on matching subjects. Is there a way to do this in R (RStudio)?
Any help would be appreciated. Thank you!
CodePudding user response:
You specified that there are exactly 95 subjects in each group. This answer assumes that matching subjects sit in the same position in both data sets. (For example, if SC_sub_0152 is column 10 of the data frame df1, then FC_sub_0152 is column 10 of the data frame df2.) If that's not the case, let me know.
First I created sample data. I'm including it so you can see how the data is arranged.
library(tidyverse) # provides %>%, map_dfr(), and str_extract()
# create enough data for 100 values for each of the 95 subjects
# for each of the two "SC" and "FC"
set.seed(3525) # make rnorm repeatable
df1 = rnorm(95*100, 60, 5) %>%
  matrix(ncol = 95, nrow = 100) %>%
  as.data.frame()
df2 = rnorm(95*100, 98, 7) %>%
  matrix(ncol = 95, nrow = 100) %>%
  as.data.frame()
# rename with subjects' names
names(df1) <- paste0("SC_sub_",1001:1095)
names(df2) <- paste0("FC_sub_",1001:1095)
head(df1[ ,1:5])
# SC_sub_1001 SC_sub_1002 SC_sub_1003 SC_sub_1004 SC_sub_1005
# 1 54.97210 55.71496 56.77082 61.78804 58.47530
# 2 59.56508 62.20298 64.72606 58.04266 64.10494
# 3 56.59274 59.87084 50.21309 57.48015 50.34556
# 4 56.91834 61.88379 59.12483 54.84310 66.01470
# 5 56.25455 51.98541 67.23616 57.86956 62.93199
# 6 62.91731 47.86165 66.02651 58.31986 59.51732
head(df2[ ,1:5])
# FC_sub_1001 FC_sub_1002 FC_sub_1003 FC_sub_1004 FC_sub_1005
# 1 91.35180 93.69772 109.92090 81.96129 97.38721
# 2 87.34593 94.23049 95.68794 96.63895 102.92409
# 3 98.54663 91.52573 98.23197 107.08319 95.23934
# 4 95.99381 102.91114 92.83983 103.88144 103.91662
# 5 97.29054 81.85647 118.66778 108.90409 110.02502
# 6 95.07343 89.82221 97.14673 104.53310 92.81907
Then I mapped over the columns to compute Spearman's rank correlation for each pair of subjects. This creates a data frame with the combined subject label, the test statistic S, the p-value, and rho.
# if they are 1:1 in order already this works:
# (cor.test() runs one test per call, so it takes no adjust argument)
results = map_dfr(1:95,
                  .f = function(x){
                    test = cor.test(df1[, x],
                                    df2[, x],
                                    method = "spearman")
                    # pull the digits of the subject ID out of the column name
                    data.frame(subjects = paste0("SC_FC_sub_",
                                                 str_extract(names(df1)[x],
                                                             "\\d+")),
                               S = test$statistic,
                               p.value = test$p.value,
                               rho = test$estimate,
                               row.names = NULL)
                  })
head(results)
# subjects S p.value rho
# 1 SC_FC_sub_1001 141784 0.13826617 0.149210921
# 2 SC_FC_sub_1002 171962 0.75251341 -0.031875188
# 3 SC_FC_sub_1003 154588 0.47358280 0.072379238
# 4 SC_FC_sub_1004 203578 0.02690552 -0.221590159
# 5 SC_FC_sub_1005 167504 0.95961617 -0.005124512
# 6 SC_FC_sub_1006 183856 0.30613358 -0.103246325
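If the columns are not already in the same order in both data frames, one option is to line them up by subject ID before correlating. The following is only a minimal sketch in base R: the small data frames `sc` and `fc`, their shuffled column order, and the three subject IDs are made up for illustration, and it returns only rho rather than the full test output.

```r
# Hypothetical toy data: three subjects, with the FC columns deliberately shuffled
set.seed(1)
sc <- as.data.frame(matrix(rnorm(30), nrow = 10))
fc <- as.data.frame(matrix(rnorm(30), nrow = 10))
names(sc) <- paste0("SC_sub_", c("0152", "0225", "0301"))
names(fc) <- paste0("FC_sub_", c("0301", "0152", "0225"))

# Strip the group prefix to get the bare subject IDs
sc_ids <- sub("^SC_sub_", "", names(sc))
fc_ids <- sub("^FC_sub_", "", names(fc))

# Keep only the IDs that appear in both groups
ids <- intersect(sc_ids, fc_ids)

# Reorder the columns of both data frames to follow the shared ID order
sc_aligned <- sc[, match(ids, sc_ids), drop = FALSE]
fc_aligned <- fc[, match(ids, fc_ids), drop = FALSE]

# Spearman's rho for each matched pair of columns
rho <- vapply(seq_along(ids), function(i) {
  cor(sc_aligned[[i]], fc_aligned[[i]], method = "spearman")
}, numeric(1))
names(rho) <- paste0("SC_FC_sub_", ids)
```

After this step, `sc_aligned` and `fc_aligned` can be passed to the `map_dfr()` call in place of `df1` and `df2` (with `1:95` replaced by `seq_along(ids)`).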
CodePudding user response:
Consider Map (a wrapper to mapply) after building lists (not separate data frames) of your subject data:
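library(psych) # provides corr.test()
# list.files() returns paths in sorted order, so the two lists pair up
# by position as long as both folders hold the same subjects
SC_lst <- list.files(path="/path/to/folder", pattern="SC_sub.*xlsx", full.names=TRUE)
SC_dfs <- lapply(SC_lst, readxl::read_excel)
FC_lst <- list.files(path="/path/to/myfolder", pattern="FC_sub.*xlsx", full.names=TRUE)
FC_dfs <- lapply(FC_lst, readxl::read_excel)
# run the correlation elementwise over the two lists
corr_list <- Map(
  function(sc, fc) corr.test(sc, fc, method="spearman", adjust="none"),
  sc = SC_dfs,
  fc = FC_dfs
)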
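Pairing by position only works if both folders contain exactly the same subjects. A more defensive variant names each list element by its subject ID and pairs on the IDs the two groups share. The sketch below is self-contained and rests on several assumptions: the file names are hypothetical, small random matrices stand in for the `readxl::read_excel()` results, and base `cor()` stands in for `psych::corr.test()` so that it runs without extra packages.

```r
# Hypothetical file names standing in for the list.files() results
SC_lst <- c("SC_sub-0152.xlsx", "SC_sub-0225.xlsx")
FC_lst <- c("FC_sub-0225.xlsx", "FC_sub-0152.xlsx")  # deliberately in a different order

# Random 10 x 4 matrices standing in for the data read from each file
set.seed(42)
SC_dfs <- lapply(SC_lst, function(f) matrix(rnorm(40), nrow = 10))
FC_dfs <- lapply(FC_lst, function(f) matrix(rnorm(40), nrow = 10))

# Name each list element by the subject ID taken from its file name
names(SC_dfs) <- sub("^SC_sub-(.*)\\.xlsx$", "\\1", basename(SC_lst))
names(FC_dfs) <- sub("^FC_sub-(.*)\\.xlsx$", "\\1", basename(FC_lst))

# Pair only the subjects present in both groups
ids <- intersect(names(SC_dfs), names(FC_dfs))

# cor() is a stand-in here; swap in corr.test(sc, fc, ...) for the full test output
corr_list <- Map(
  function(sc, fc) cor(sc, fc, method = "spearman"),
  SC_dfs[ids],
  FC_dfs[ids]
)
names(corr_list) <- paste0("SC_FC_sub_", ids)
```

Each element of `corr_list` can then be looked up by name, for example `corr_list[["SC_FC_sub_0152"]]`.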