Column by column correlation & significance between two data sets

Time:11-23

I would like to use the corr.test function from the psych package in order to calculate the correlation and the significance between corresponding columns of two dataframes. A simplified example of the dataframes Df1 and Df2 I am working with is this:

set.seed(42)
Df1 <- data.frame(matrix(runif(50), 10, 5))
Df2 <- data.frame(matrix(runif(50), 10, 5))

Please note that this question has already been answered here:

Column by column correlation between two data sets with R?

but only for the correlation part, i.e., it lacks the significance I am looking for, since it uses the cor function rather than corr.test.

Any help would be greatly appreciated.

CodePudding user response:

Maybe something like this: map over the column indices and collect the correlation and p-value of each pair into a data frame (this uses cor.test from base R rather than corr.test):

library(tidyverse)

# One row per column pair: correlation estimate and p-value from cor.test()
map_dfr(seq_len(ncol(Df1)), \(i) {
  cr_tst <- cor.test(Df1[, i], Df2[, i])
  tibble(var = colnames(Df1)[i],
         cor = cr_tst$estimate,
         p.value = cr_tst$p.value)
})
#> # A tibble: 5 x 3
#>   var       cor p.value
#>   <chr>   <dbl>   <dbl>
#> 1 X1     0.249    0.488
#> 2 X2    -0.408    0.242
#> 3 X3     0.0372   0.919
#> 4 X4    -0.0997   0.784
#> 5 X5     0.466    0.174
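
Since the question asks for corr.test specifically, the same column-by-column result can probably also be obtained with psych directly. The sketch below assumes that corr.test accepts a second data set through its y argument and returns matrices r and p covering every Df1/Df2 column pair, so that the matched pairs sit on the diagonal. It sets adjust = "none" on the assumption that the default would otherwise apply a Holm correction to the p-values. The names ct and res_psych are only illustrative.

```r
library(psych)

set.seed(42)
Df1 <- data.frame(matrix(runif(50), 10, 5))
Df2 <- data.frame(matrix(runif(50), 10, 5))

# Correlate every column of Df1 with every column of Df2.
# adjust = "none" asks for raw (unadjusted) p-values.
ct <- corr.test(Df1, Df2, adjust = "none")

# The matched pairs (X1 with X1, X2 with X2, ...) lie on the diagonal
# of the 5 x 5 result matrices.
res_psych <- data.frame(var     = colnames(Df1),
                        cor     = unname(diag(ct$r)),
                        p.value = unname(diag(ct$p)))
res_psych
```

If the assumptions hold, the cor and p.value columns should agree with the cor.test results above.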

CodePudding user response:

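The rcorr() function (from the Hmisc package) computes the p-values of the correlation test for several pairs of variables at once. Applied to the two data frames bound side by side, it gives a 10 x 10 matrix of p-values; the column-by-column values are the entries pairing Xi of Df1 (rows 1 to 5) with Xi of Df2 (columns 6 to 10):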

library("Hmisc")
res2 <- rcorr(as.matrix(cbind(Df1, Df2)))
res2
P
   X1     X2     X3     X4     X5     X1     X2     X3     X4     X5    
X1        0.8552 0.3306 0.2765 0.6174 0.4885 0.8445 0.3510 0.4739 0.8592
X2 0.8552        0.4264 0.9639 0.7081 0.2472 0.2417 0.7335 0.9291 0.1414
X3 0.3306 0.4264        0.6919 0.7151 0.4481 0.6139 0.9188 0.9900 0.7766
X4 0.2765 0.9639 0.6919        0.5230 0.4341 0.8599 0.2467 0.7841 0.9047
X5 0.6174 0.7081 0.7151 0.5230        0.1280 0.1076 0.1151 0.8081 0.1744
X1 0.4885 0.2472 0.4481 0.4341 0.1280        0.0130 0.5283 0.6915 0.9308
X2 0.8445 0.2417 0.6139 0.8599 0.1076 0.0130        0.8044 0.7331 0.4809
X3 0.3510 0.7335 0.9188 0.2467 0.1151 0.5283 0.8044        0.8020 0.2286
X4 0.4739 0.9291 0.9900 0.7841 0.8081 0.6915 0.7331 0.8020        0.0595
X5 0.8592 0.1414 0.7766 0.9047 0.1744 0.9308 0.4809 0.2286 0.0595
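
Only five entries of this matrix answer the question: those pairing column i of Df1 with column i of Df2, which in the table above are 0.4885, 0.2417, 0.9188, 0.7841 and 0.1744. A minimal sketch of pulling them out by position follows. It assumes the object returned by rcorr exposes the correlations as r and the p-values as P; k, idx and matched are illustrative names.

```r
library(Hmisc)

set.seed(42)
Df1 <- data.frame(matrix(runif(50), 10, 5))
Df2 <- data.frame(matrix(runif(50), 10, 5))

k    <- ncol(Df1)
res2 <- rcorr(as.matrix(cbind(Df1, Df2)))

# Row i is column i of Df1; column k + i is column i of Df2.
# A two-column index matrix selects exactly those k cells.
idx <- cbind(seq_len(k), k + seq_len(k))

matched <- data.frame(var     = colnames(Df1),
                      cor     = res2$r[idx],
                      p.value = res2$P[idx])
matched
```

Indexing by position rather than by name matters here, because cbind(Df1, Df2) leaves the column names X1 to X5 duplicated.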

CodePudding user response:

Use cor.test inside mapply and subset the desired statistics, where estimate is the correlation:

mapply(\(x, y) cor.test(x, y)[c('estimate', 'p.value')], Df1, Df2)
#                 X1        X2        X3         X4          X5       
# estimate 0.2486405 -0.408098 0.03718413 -0.09967868 0.4662738
# p.value  0.4884952 0.2416943 0.9187721  0.7841065   0.1743502
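
Because subsetting the cor.test result with [ returns a list, the mapply call above yields a matrix whose cells are list elements, which can be awkward to work with. A possible variant, sketched here in base R, returns a plain numeric vector per column pair and reshapes the result into one row per variable; res and res_df are illustrative names.

```r
set.seed(42)
Df1 <- data.frame(matrix(runif(50), 10, 5))
Df2 <- data.frame(matrix(runif(50), 10, 5))

# Returning a named numeric vector lets mapply simplify to a numeric
# 2 x 5 matrix (rows: cor, p.value; columns: X1..X5).
res <- mapply(function(x, y) {
  ct <- cor.test(x, y)
  c(cor = unname(ct$estimate), p.value = ct$p.value)
}, Df1, Df2)

# Transpose so that each column pair becomes one row.
res_df <- data.frame(var = colnames(res), t(res), row.names = NULL)
res_df
```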