R find correlations between dataframes of different sizes

Time:04-24

I have two data frames with the same number of columns but different numbers of rows. I am trying to run a correlation test for every pair of rows (each row of the first data frame against each row of the second). I can do this with a nested for loop, but the dataset is too large for that to be feasible. I have found solutions for correlating data frames of equal size, but I am not sure how to adapt them.

Here is my for loop solution that works for smaller datasets.

c.mg.spearmanB = data.frame()
for (i in 1:nrow(brainMicroRNAs)) {
  for (j in 1:nrow(brainGenes)) {
    miRNA = brainMicroRNAs[i,]
    gene = brainGenes[j,]
    #calculate correlations and add to dataframe
    c.mg.spearmanB[i,j] = cor.test(miRNA, gene, method="spearman", exact=F)$p.value
  }
}

CodePudding user response:

Using loops is not recommended. Use pandas' df.corr instead: merge the two data frames and then run corr. If possible, provide samples of the datasets.

CodePudding user response:

R expects observations to be rows, and variables to be columns. You appear to have flipped this around. No worries, we can transpose and use the normal cor function.

Some example data:

set.seed(1234)
df1 <- as.data.frame(matrix(rnorm(12), ncol = 3))
df2 <- as.data.frame(matrix(rnorm(15), ncol = 3))

Now calculate the correlations:

cors <- t(cor(t(df1), t(df2)))

If you need p-values, we can do that manually, using vectorized functions:

df <- ncol(df1) - 2
t_vals <- cors * sqrt(df) / sqrt(1 - cors ^ 2)
p_vals <- 2 * pmin(pt(t_vals, df), pt(t_vals, df, lower.tail = FALSE))  # pmin is element-wise; min would collapse the matrix to a single number

Those p-values are two-sided.
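
Since the question uses method = "spearman" with exact = F, here is a minimal sketch of the same idea with rank correlations. The names mirna and genes are hypothetical stand-ins for brainMicroRNAs and brainGenes, and the sketch assumes there are no missing values. As far as I know, cor.test falls back to the same t approximation when exact = FALSE, so the p-values should agree. The result is not transposed here, so rows are miRNAs and columns are genes, as in the loop from the question.

```r
# Hypothetical stand-ins for brainMicroRNAs and brainGenes:
# rows are features, columns are the shared samples.
set.seed(42)
mirna <- matrix(rnorm(4 * 10), nrow = 4)
genes <- matrix(rnorm(5 * 10), nrow = 5)

# Spearman correlation for every (miRNA row, gene row) pair.
# cor() correlates columns, hence the transposes.
rho <- cor(t(mirna), t(genes), method = "spearman")

# Two-sided p-values from the t approximation with n - 2 degrees of freedom,
# where n is the number of shared samples (columns).
n <- ncol(mirna)
t_stat <- rho * sqrt(n - 2) / sqrt(1 - rho^2)
p_spearman <- 2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)

# Spot check one pair against cor.test().
p_check <- cor.test(mirna[2, ], genes[3, ],
                    method = "spearman", exact = FALSE)$p.value
all.equal(p_spearman[2, 3], p_check)
```

If the real data contain NAs, cor() and cor.test() handle them differently, so the two approaches may no longer match without extra care.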

And to check that it all worked:

cor.test(unlist(df1[3, ]), unlist(df2[3, ]))
# Pearson's product-moment correlation
# 
# data:  unlist(df1[3, ]) and unlist(df2[3, ])
# t = -0.015874, df = 1, p-value = 0.9899
# alternative hypothesis: true correlation is not equal to 0
# sample estimates:
#         cor 
# -0.01587196 
cors[3, 3]
# [1] -0.01587196
p_vals[3, 3]
# [1] 0.9898952
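
The check above looks at a single pair. To compare every pair, something like the following sketch should do. It rebuilds the example so that it runs on its own, and uses dof instead of df for the degrees of freedom to avoid masking the df function; p_ref is just a name chosen for illustration. It loops over cor.test, so it is only meant for small test data.

```r
# Same example data as above.
set.seed(1234)
df1 <- as.data.frame(matrix(rnorm(12), ncol = 3))
df2 <- as.data.frame(matrix(rnorm(15), ncol = 3))

# Vectorized Pearson correlations and two-sided p-values.
# Rows index df2, columns index df1, matching the transposed layout above.
cors <- t(cor(t(df1), t(df2)))
dof <- ncol(df1) - 2
t_vals <- cors * sqrt(dof) / sqrt(1 - cors^2)
p_vals <- 2 * pmin(pt(t_vals, dof), pt(t_vals, dof, lower.tail = FALSE))

# Reference p-values from cor.test(), one pair at a time (slow).
# The outer sapply fills columns (rows of df1), the inner one fills rows (rows of df2).
p_ref <- sapply(seq_len(nrow(df1)), function(i) {
  sapply(seq_len(nrow(df2)), function(j) {
    cor.test(unlist(df1[i, ]), unlist(df2[j, ]))$p.value
  })
})

all.equal(unname(p_vals), p_ref)
```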