Is there a way to loop this in R?

Time:02-11

I have two groups of matrices. The first group holds data for 95 subjects in files named FC_sub-*.xlsx, and the second group also holds data for 95 subjects, in files named SC_sub-*.xlsx. I need to compute the correlation between the matching subjects of the two groups. I know how to do this one subject at a time, for example:

SC_FC_sub_0152 <- corr.test(SC_sub_0152,FC_sub_0152, method="spearman", adjust="none")
SC_FC_sub_0225 <- corr.test(SC_sub_0225,FC_sub_0225, method="spearman", adjust="none")

but I couldn't write code that loops through the two groups and runs the correlation for each matching pair of subjects. Is there a way to do this in RStudio?

Any help would be appreciated. Thank you!

CodePudding user response:

You specified that there are exactly 95 subjects in each group. This answer assumes that matching subjects sit in the same column position in both data frames. (For example, if SC_sub_0152 is column 10 of the data frame df1, then FC_sub_0152 is column 10 of the data frame df2.) If that's not the case, let me know.

First I created sample data. I've included it so you can see how the data is arranged.

library(purrr)   # map_dfr(); also re-exports %>%
library(stringr) # str_extract()

# create enough data for 100 values for each of the 95 subjects
#   for each of the two "SC" and "FC"
set.seed(3525) # make rnorm repeatable
df1 = rnorm(95*100, 60, 5) %>% 
  matrix(ncol = 95, nrow = 100) %>% 
  as.data.frame()
df2 = rnorm(95*100, 98, 7) %>% 
  matrix(ncol = 95, nrow = 100) %>% 
  as.data.frame()

# rename with subjects' names
names(df1) <- paste0("SC_sub_",1001:1095)

names(df2) <- paste0("FC_sub_",1001:1095)


head(df1[ ,1:5])
#   SC_sub_1001 SC_sub_1002 SC_sub_1003 SC_sub_1004 SC_sub_1005
# 1    54.97210    55.71496    56.77082    61.78804    58.47530
# 2    59.56508    62.20298    64.72606    58.04266    64.10494
# 3    56.59274    59.87084    50.21309    57.48015    50.34556
# 4    56.91834    61.88379    59.12483    54.84310    66.01470
# 5    56.25455    51.98541    67.23616    57.86956    62.93199
# 6    62.91731    47.86165    66.02651    58.31986    59.51732 
head(df2[ ,1:5])
#   FC_sub_1001 FC_sub_1002 FC_sub_1003 FC_sub_1004 FC_sub_1005
# 1    91.35180    93.69772   109.92090    81.96129    97.38721
# 2    87.34593    94.23049    95.68794    96.63895   102.92409
# 3    98.54663    91.52573    98.23197   107.08319    95.23934
# 4    95.99381   102.91114    92.83983   103.88144   103.91662
# 5    97.29054    81.85647   118.66778   108.90409   110.02502
# 6    95.07343    89.82221    97.14673   104.53310    92.81907 

Then I mapped over the columns to run Spearman's rank correlation test on each pair. This creates a data frame with the combined subject name, the test statistic S, the p-value, and the rho value.

# if they are 1:1 in order already this works:
results = map_dfr(1:95, 
                  .f = function(x){
                  # cor.test() has no `adjust` argument (that belongs to
                  # psych::corr.test), so it is left out here
                  test = cor.test(df1[, x],
                                  df2[, x],
                                  method = "spearman")
                  # "\\d+" pulls the run of digits (the subject ID)
                  # out of the column name
                  data.frame(subjects = paste0("SC_FC_sub_", 
                                               str_extract(names(df1)[x],
                                                           "\\d+")),
                             S = test$statistic,
                             p.value = test$p.value,
                             rho = test$estimate,
                             row.names = NULL)
                  })

head(results)
#         subjects      S    p.value          rho
# 1 SC_FC_sub_1001 141784 0.13826617  0.149210921
# 2 SC_FC_sub_1002 171962 0.75251341 -0.031875188
# 3 SC_FC_sub_1003 154588 0.47358280  0.072379238
# 4 SC_FC_sub_1004 203578 0.02690552 -0.221590159
# 5 SC_FC_sub_1005 167504 0.95961617 -0.005124512
# 6 SC_FC_sub_1006 183856 0.30613358 -0.103246325 
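
If the columns are not guaranteed to be in the same order, the pairs can be matched by subject ID instead of by position. The following is a minimal base R sketch of that idea. The toy data, the three subject IDs and the helper names (sc, fc, ids) are made up for illustration, and it assumes the ID is whatever follows the "SC_sub_" / "FC_sub_" prefix in the column name.

```r
# Hypothetical toy data: 3 subjects, 20 observations each (not the asker's data)
set.seed(1)
sc <- as.data.frame(matrix(rnorm(60), ncol = 3))
fc <- as.data.frame(matrix(rnorm(60), ncol = 3))
names(sc) <- paste0("SC_sub_", c("0152", "0225", "0301"))
names(fc) <- paste0("FC_sub_", c("0301", "0152", "0225"))  # deliberately shuffled

# Extract the subject ID by stripping the (assumed) prefix from each column name
sc_id <- sub("^SC_sub_", "", names(sc))
fc_id <- sub("^FC_sub_", "", names(fc))

# Keep only the IDs that appear in both groups
ids <- intersect(sc_id, fc_id)

# One Spearman test per subject, matched by ID rather than by position
results <- do.call(rbind, lapply(ids, function(id) {
  test <- cor.test(sc[[paste0("SC_sub_", id)]],
                   fc[[paste0("FC_sub_", id)]],
                   method = "spearman")
  data.frame(subjects = paste0("SC_FC_sub_", id),
             S = unname(test$statistic),
             p.value = test$p.value,
             rho = unname(test$estimate))
}))
results
```

Because the lookup goes through the column name, a subject that is missing from one group is simply skipped instead of silently being paired with the wrong column.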

CodePudding user response:

Consider Map (wrapper to mapply) after building lists (not separate data frames) of your subject data:

SC_lst <- list.files(path="/path/to/folder", pattern="SC_sub.*xlsx", full.names=TRUE)
SC_dfs <- lapply(SC_lst, readxl::read_excel) 

FC_lst <- list.files(path="/path/to/myfolder", pattern="FC_sub.*xlsx", full.names=TRUE)
FC_dfs <- lapply(FC_lst, readxl::read_excel) 

# corr.test() comes from the psych package
corr_list <- Map(
   function(sc, fc) psych::corr.test(sc, fc, method="spearman", adjust="none"),
   sc = SC_dfs, 
   fc = FC_dfs
)
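
Map pairs the two lists by position, so it relies on list.files returning the SC and FC files in the same subject order, with no subject missing from either group. As a hedged sketch, the file names could be paired by subject ID before reading them. The file names below are hypothetical, and get_id assumes the ID is the text between "sub-" and ".xlsx", as in the question's naming scheme.

```r
# Hypothetical file names; in practice these would come from list.files()
sc_files <- c("SC_sub-0152.xlsx", "SC_sub-0225.xlsx", "SC_sub-0301.xlsx")
fc_files <- c("FC_sub-0225.xlsx", "FC_sub-0152.xlsx")

# Subject ID = text between "sub-" and ".xlsx" (assumed naming scheme)
get_id <- function(f) sub("^.*sub-(.*)\\.xlsx$", "\\1", basename(f))

names(sc_files) <- get_id(sc_files)
names(fc_files) <- get_id(fc_files)

# Keep only subjects present in both groups, in one common order
ids <- intersect(names(sc_files), names(fc_files))
pairs <- data.frame(id = ids,
                    sc = unname(sc_files[ids]),
                    fc = unname(fc_files[ids]),
                    stringsAsFactors = FALSE)
pairs
```

The pairs$sc and pairs$fc columns can then be read with lapply and passed to Map as above. Setting the names of the resulting list to pairs$id would let each result be looked up by subject.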