I have a large matrix in R with more than 2,000 columns and 10,000 rows, and many missing values. This line of code calculates the correlation matrix:

```r
cor(data, use = "complete.obs")
```
My question is: how can I find the number of observations that have been used to calculate each correlation in the output matrix?
The output should be something like this:
| | v1 | v2 | v3 | v4 |
|---|---|---|---|---|
| v1 | 20 | 12 | 15 | 18 |
| v2 | 12 | 15 | 10 | 11 |
| v3 | 15 | 10 | 25 | 20 |
| v4 | 18 | 11 | 20 | 20 |
Thanks for any suggestions.
Answer:
Let's use a sample matrix `data` filled with random `NA`s:
```r
library(dplyr)  # provides the %>% pipe
set.seed(1234)
data <- rnorm(100) %>%
  matrix(nrow = 10) %>%
  {
    m <- .
    # blank out the cells where a second random draw exceeds 0.5
    m[rnorm(100) > .5] <- NA
    m
  }
```
```
            [,1]       [,2]       [,3]        [,4]       [,5]       [,6]
 [1,] 0.48522682         NA  0.8951720 -0.32439330 0.05913517  0.4369306
 [2,] 0.69676878 -0.4002352  0.6602126          NA 0.41339889         NA
 [3,] 0.18551392  1.4934931  2.2734835 -0.93350334         NA  0.4521904
 [4,]         NA -1.6070809  1.1734976          NA         NA  0.6631986
 [5,] 0.31168103 -0.4157518  0.2877097  0.31916024 0.71888873 -1.1363736
 [6,] 0.76046236         NA -0.6597701 -1.07754212         NA         NA
 [7,] 1.84246363 -0.1517365         NA -3.23315213 1.35727444         NA
 [8,]         NA         NA  0.6774155          NA 0.40446847 -1.2239038
 [9,] 0.03266396 -0.3047211         NA  0.02951783 0.26436427  0.2580684
[10,]         NA  0.6295361  0.1864921  0.59427377 0.26804390         NA
            [,7]       [,8]       [,9]      [,10]
 [1,]         NA -0.3046139 -1.0118219         NA
 [2,]         NA  1.8250111  0.4701675  0.1832475
 [3,]  0.1586254  0.6705594 -0.7009703 -1.7662292
 [4,] -1.7632551  0.9486326         NA         NA
 [5,]  0.3385960  2.0494030         NA         NA
 [6,]         NA -0.6511136         NA         NA
 [7,] -0.2386466  0.8086193         NA -1.1750368
 [8,] -1.1877653  0.9865806 -0.2457632         NA
 [9,]  0.3849353         NA -1.5528590  0.3536254
[10,]         NA  0.3190524  0.1284340  0.3191562
```
You can transform it into a logical matrix `dna`, where `dna[i, j] == TRUE` means that `data[i, j]` is not `NA`:

```r
dna <- !is.na(data)
```
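As a small illustration of what this mask contains, here is a sketch on a made-up 3 x 2 matrix (`x` and `mask` are hypothetical names used only here). R treats `TRUE` as 1 in arithmetic, so summing the mask counts the non-missing values:

```r
# Hypothetical toy matrix, not part of the example above.
# matrix() fills column by column: column 1 is (1, NA, 3), column 2 is (NA, 5, 6).
x <- matrix(c(1, NA, 3,
              NA, 5, 6), nrow = 3)

mask <- !is.na(x)   # TRUE where a value is present
mask

colSums(mask)       # number of non-missing values in each column
```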
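Then take the matrix product of `t(dna)` with `dna`. Because `cor()` correlates columns, entry `[i, j]` of the result is the number of rows in which both column `i` and column `j` are non-missing, which is the number of observations available for that pair. (`dna %*% t(dna)` would instead count the shared non-missing values for each pair of rows.)

```r
crossprod(dna)  # equivalent to t(dna) %*% dna
```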
```
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    8    7    4    6    4    3    4    8    5     7
 [2,]    7    9    6    6    5    4    6    8    6     8
 [3,]    4    6    6    4    4    3    4    5    4     5
 [4,]    6    6    4    7    3    3    3    6    5     6
 [5,]    4    5    4    3    5    2    4    5    3     4
 [6,]    3    4    3    3    2    5    4    4    3     5
 [7,]    4    6    4    3    4    4    6    5    4     6
 [8,]    8    8    5    6    5    4    5    9    5     8
 [9,]    5    6    4    5    3    3    4    5    6     5
[10,]    7    8    5    6    4    5    6    8    5     9
```
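One caveat about the `use` argument: pairwise counts like these describe `cor(..., use = "pairwise.complete.obs")`. With `use = "complete.obs"`, as in the question, `cor()` first drops every row that contains any `NA`, so all correlations are based on the same number of rows. The sketch below shows both counts on a small made-up matrix; `x`, `n_pair`, and the column names `v1` to `v3` are hypothetical and chosen only for illustration.

```r
# Hypothetical toy data: 6 rows, 3 named columns
x <- cbind(v1 = c(1, 2, NA, 4, 5, 6),
           v2 = c(2, NA, 3, 5, 4, 7),
           v3 = c(NA, 1, 2, 3, 5, 4))

mask <- !is.na(x)

# Pairwise counts: rows where both columns of a pair are observed.
# The column names of x carry over to the rows and columns of the result.
n_pair <- crossprod(mask)
n_pair

# Brute-force check for a single pair of columns
sum(!is.na(x[, "v1"]) & !is.na(x[, "v2"])) == n_pair["v1", "v2"]

# These counts go with pairwise deletion
cor(x, use = "pairwise.complete.obs")

# Listwise deletion ("complete.obs"): one common count for every correlation
sum(complete.cases(x))
```

On a matrix with 10,000 rows and 2,000 columns, `crossprod()` returns a 2,000 x 2,000 matrix, with the same layout as the correlation matrix.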