I would like to use the corr.test function from the psych package in order to calculate the correlation and the significance between corresponding columns of two dataframes.
A simplified example of the data frames Df1 and Df2 I am working with is this:
set.seed(42)
Df1 <- data.frame(matrix(runif(50), 10, 5))
Df2 <- data.frame(matrix(runif(50), 10, 5))
Please note that this question has already been answered here:
Column by column correlation between two data sets with R?
but only for the correlation part: it lacks the significance I am looking for, since it uses the cor function and not corr.test.
Any help would be greatly appreciated.
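For reference, here is a minimal sketch of the psych route the question asks about, assuming the psych package is installed. When corr.test() receives two data frames it correlates every column of the first with every column of the second, so the corresponding pairs sit on the diagonal of the returned r and p matrices. adjust = "none" is passed explicitly (the argument's default is "holm") so that the p-values are unadjusted and directly comparable to cor.test().

```r
# Sketch: correlation and p-value for corresponding columns via psych::corr.test().
# Assumes the psych package is installed.
library(psych)

set.seed(42)
Df1 <- data.frame(matrix(runif(50), 10, 5))
Df2 <- data.frame(matrix(runif(50), 10, 5))

# With both x and y given, ct$r and ct$p are 5 x 5 matrices:
# rows are the columns of Df1, columns are the columns of Df2.
ct <- corr.test(Df1, Df2, adjust = "none")

# X1 vs X1, X2 vs X2, ... lie on the diagonal of each matrix.
res <- data.frame(
  var     = colnames(Df1),
  cor     = diag(ct$r),
  p.value = diag(ct$p)
)
res
```

Passing the two data frames separately (rather than cbind()-ing them) keeps the result to the Df1-by-Df2 block only.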
Answer:
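Maybe something like this: map over the column indices, run cor.test() (base R's test, not psych::corr.test()) on each pair of corresponding columns, and collect the correlations and p-values in a data frame: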
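library(tidyverse)  # provides map_dfr() (purrr) and tibble()
# \(i) is the shorthand lambda syntax available from R 4.1
map_dfr(1:ncol(Df1), \(i) {
  cr_tst <- cor.test(Df1[, i], Df2[, i])  # column i of Df1 vs column i of Df2
  tibble(var = colnames(Df1)[i],
         cor = cr_tst$estimate,
         p.value = cr_tst$p.value)
})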
#> # A tibble: 5 x 3
#> var cor p.value
#> <chr> <dbl> <dbl>
#> 1 X1 0.249 0.488
#> 2 X2 -0.408 0.242
#> 3 X3 0.0372 0.919
#> 4 X4 -0.0997 0.784
#> 5 X5 0.466 0.174
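If you would rather not load the tidyverse, the same table can be built in base R. This is a sketch of the same idea (one cor.test() per column pair), with lapply() plus do.call(rbind, ...) standing in for map_dfr():

```r
set.seed(42)
Df1 <- data.frame(matrix(runif(50), 10, 5))
Df2 <- data.frame(matrix(runif(50), 10, 5))

# One cor.test() per pair of corresponding columns; each call yields a one-row data frame.
rows <- lapply(seq_along(Df1), function(i) {
  ct <- cor.test(Df1[[i]], Df2[[i]])
  data.frame(var     = names(Df1)[i],
             cor     = unname(ct$estimate),
             p.value = ct$p.value)
})

# Stack the one-row data frames into a single result.
res <- do.call(rbind, rows)
res
```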
Answer:
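The rcorr() function (from the Hmisc package) computes the p-values of the correlation test for many pairs of variables at once. Applied to the two data frames combined with cbind(), it returns the full 10 x 10 matrix (only the P part of the printout is shown below). The p-values for corresponding columns are on the diagonal of the block where the first five rows (Df1) meet the last five columns (Df2): 0.4885, 0.2417, 0.9188, 0.7841 and 0.1744, which match the other answers.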
library("Hmisc")
res2 <- rcorr(as.matrix(cbind(Df1, Df2)))
res2
P
   X1     X2     X3     X4     X5     X1     X2     X3     X4     X5
X1        0.8552 0.3306 0.2765 0.6174 0.4885 0.8445 0.3510 0.4739 0.8592
X2 0.8552        0.4264 0.9639 0.7081 0.2472 0.2417 0.7335 0.9291 0.1414
X3 0.3306 0.4264        0.6919 0.7151 0.4481 0.6139 0.9188 0.9900 0.7766
X4 0.2765 0.9639 0.6919        0.5230 0.4341 0.8599 0.2467 0.7841 0.9047
X5 0.6174 0.7081 0.7151 0.5230        0.1280 0.1076 0.1151 0.8081 0.1744
X1 0.4885 0.2472 0.4481 0.4341 0.1280        0.0130 0.5283 0.6915 0.9308
X2 0.8445 0.2417 0.6139 0.8599 0.1076 0.0130        0.8044 0.7331 0.4809
X3 0.3510 0.7335 0.9188 0.2467 0.1151 0.5283 0.8044        0.8020 0.2286
X4 0.4739 0.9291 0.9900 0.7841 0.8081 0.6915 0.7331 0.8020        0.0595
X5 0.8592 0.1414 0.7766 0.9047 0.1744 0.9308 0.4809 0.2286 0.0595
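To keep only the pairs of interest, the diagonal of the Df1-by-Df2 block can be taken from the r and P matrices that rcorr() returns. A sketch, assuming Hmisc is installed and both data frames have the same number of columns (the helper names k, rows_df1 and cols_df2 are just illustrative):

```r
library(Hmisc)

set.seed(42)
Df1 <- data.frame(matrix(runif(50), 10, 5))
Df2 <- data.frame(matrix(runif(50), 10, 5))

res2 <- rcorr(as.matrix(cbind(Df1, Df2)))

k <- ncol(Df1)
rows_df1 <- seq_len(k)         # positions of Df1's columns in the combined matrix
cols_df2 <- seq(k + 1, 2 * k)  # positions of Df2's columns

# The diagonal of the Df1-by-Df2 block pairs X1 with X1, X2 with X2, and so on.
res <- data.frame(
  var     = colnames(Df1),
  cor     = diag(res2$r[rows_df1, cols_df2]),
  p.value = diag(res2$P[rows_df1, cols_df2])
)
res
```

Selecting by position rather than by name matters here, because cbind() leaves the column names X1 to X5 duplicated.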
Answer:
Use cor.test() inside mapply() and keep only the desired statistics; estimate is the correlation coefficient:
mapply(\(x, y) cor.test(x, y)[c('estimate', 'p.value')], Df1, Df2)
# X1 X2 X3 X4 X5
# estimate 0.2486405 -0.408098 0.03718413 -0.09967868 0.4662738
# p.value 0.4884952 0.2416943 0.9187721 0.7841065 0.1743502
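Because each cor.test() result is subset as a list, the mapply() call above returns a matrix whose cells are list elements rather than a numeric matrix. If a plain numeric matrix is more convenient (for rounding or further computation), a sketch with vapply() and an explicit numeric template does that:

```r
set.seed(42)
Df1 <- data.frame(matrix(runif(50), 10, 5))
Df2 <- data.frame(matrix(runif(50), 10, 5))

# vapply() checks that every call returns exactly two doubles and
# simplifies the results into a 2 x 5 numeric matrix whose row names
# come from the names in FUN.VALUE.
res <- vapply(
  seq_along(Df1),
  function(i) {
    ct <- cor.test(Df1[[i]], Df2[[i]])
    c(estimate = unname(ct$estimate), p.value = ct$p.value)
  },
  FUN.VALUE = c(estimate = 0, p.value = 0)
)
colnames(res) <- names(Df1)
round(res, 4)
```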