Count all values in a correlation matrix that are above 0.8 and below -0.8

Time:08-31

I have a matrix of 2134 by 2134 of correlation values and I would like to count the total number of values that are above 0.8 and below -0.8. I have tried

length(TFcoTF[TFcoTF>.8])

but this does not seem to be correct as I am getting about 50 percent of values above .8 which does not correspond to the histogram I have for the data. Also when I do

length(TFcoTF[TFcoTF<-.8])

I got 0 as the output. Any help is appreciated.

CodePudding user response:

The data.table package has a function called between. For each value in your matrix, it returns TRUE or FALSE depending on whether the value lies between two bounds (inclusive by default).

In the example below, I create a 10x10 matrix of random values drawn uniformly from [-1, 1], then use length on the subset of values that fall inside [-0.8, 0.8]. Note that this counts the values inside the range; the values above 0.8 or below -0.8 are the remaining ones, here 100 - 84 = 16.

library(data.table)

data <- matrix(runif(100,-1,1), nrow = 10, ncol=10)

data
             [,1]       [,2]       [,3]         [,4]       [,5]         [,6]       [,7]        [,8]       [,9]      [,10]
 [1,]  0.05585901 -0.7497720 -0.8371569 -0.401079424 -0.4130752 -0.788961736  0.2909987  0.48965177  0.4076504 -0.0682856
 [2,] -0.42442920  0.7476111  0.8238973 -0.912507391 -0.4450897 -0.001308901  0.5151425 -0.16838841 -0.1648151  0.8370660
 [3,] -0.73295874  0.5271986  0.5822628 -0.008554908 -0.2785803 -0.499058508 -0.5661172  0.35957967  0.5807055  0.2350893
 [4,]  0.18949338  0.3827603 -0.6112584  0.209209240 -0.5883962 -0.087900052  0.1272227  0.58165922 -0.9950324 -0.9118599
 [5,]  0.40862973  0.9496163  0.4996253  0.079538601  0.9839763 -0.119883751  0.3667418 -0.02751815 -0.6724141  0.3217434
 [6,]  0.77338548 -0.7698167 -0.5632436  0.223301216 -0.9936610  0.650110638 -0.9400395 -0.47808065 -0.1579283 -0.6896787
 [7,]  0.93210326  0.5360980  0.7677325  0.815231731 -0.4320206  0.647954028  0.5180600 -0.09574138 -0.3848389  0.9726445
 [8,] -0.66411834  0.1125759 -0.4021577 -0.711363103  0.7161801 -0.071971464  0.7953436  0.40326575  0.6895480  0.7496597
 [9,]  0.14118154  0.4775983  0.8966069  0.852880293  0.4715885 -0.542526148  0.5200246 -0.62649677 -0.3677738  0.1961003
[10,] -0.59353193 -0.2358892  0.5769562 -0.287113142 -0.7100862 -0.107092848 -0.8101459 -0.46754146 -0.4082147 -0.4475972

length(data[between(data,-0.8,0.8)])
[1] 84
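
Since the question asks for the values outside the range, the same idea can be flipped around. The following is a minimal base R sketch, not part of the original answer; the seed is arbitrary and the comparisons are written out by hand to mirror the inclusive bounds that between uses by default.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
data <- matrix(runif(100, -1, 1), nrow = 10, ncol = 10)

# Values inside [-0.8, 0.8], bounds included
inside <- sum(data >= -0.8 & data <= 0.8)

# Values strictly above 0.8 or strictly below -0.8
outside <- sum(data > 0.8 | data < -0.8)

# Every value falls in exactly one of the two groups
inside + outside == length(data)
```

With data.table loaded, sum(!between(data, -0.8, 0.8)) should give the same count as outside, provided the matrix contains no NA values.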

CodePudding user response:

It's difficult to answer without having your dataset; please provide a minimal reproducible example next time.

For the first line of code, this looks correct.

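For the second, the problem is how R parses the expression rather than your logic. In R you can assign a value with = or <-, so TFcoTF<-.8 is read as an assignment: it overwrites TFcoTF with 0.8 instead of comparing it with -0.8. In general, x<-1 assigns the value 1 to x, whereas x < -1 (with a space) returns a logical value.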
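
To see the difference in isolation, here is a small sketch on a made-up vector (x and y are hypothetical names, not the asker's data). It shows that the version without a space both returns 0 and destroys the original values.

```r
# Hypothetical toy vector standing in for the correlation values
x <- c(-0.9, 0.5, 0.9)

# With a space, `<` and `-` are separate tokens: this is a comparison
n_below <- sum(x < -0.8)

# Without the space, `<-` is the assignment operator:
# y is overwritten with 0.8 and the subset comes back empty
y <- x
n_wrong <- length(y[y<-.8])

n_below   # 1
n_wrong   # 0
y         # 0.8, the original values are gone
```

Writing the comparison as y < -0.8 or y < (-0.8) avoids the ambiguity. After running the faulty line, the matrix has to be recreated before counting again.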

You can then combine the logical conditions, as in the code below:

set.seed(42)
m <- matrix(runif(25, min = -1, max = 1), nrow = 5, ncol = 5)
m

length(m[m > .8]) + length(m[m < -.8]) # long version of what you did.
length(m[m < -.8 | m > .8]) # | means "or": TRUE | FALSE returns TRUE.

sum(m > .8 | m < -.8) 
# The sum of a logical vector is the number of TRUE values, since sum(c(TRUE, FALSE)) is sum(c(1, 0))

sum(abs(m) > .8) # is the shortest version
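
Two further points may matter for a real correlation matrix, although this is an assumption since the original data is not available. Logical subsetting keeps NA entries, so length() can overcount when the matrix contains NA, and a correlation matrix is symmetric with a diagonal of 1s, so each pair appears twice. The sketch below uses made-up data (d and cm are hypothetical names) to illustrate both.

```r
set.seed(42)
# Hypothetical data: 20 observations of 6 variables, one of them constant
d <- matrix(rnorm(120), nrow = 20, ncol = 6)
d[, 6] <- 1                      # zero variance, so its correlations are NA
cm <- suppressWarnings(cor(d))   # cor() warns about the zero standard deviation

# Logical subsetting keeps the NA entries, so length() overcounts
n_length <- length(cm[abs(cm) > .8])

# sum() with na.rm = TRUE counts only the TRUE entries
n_all <- sum(abs(cm) > .8, na.rm = TRUE)

# Count each pair once and skip the diagonal of 1s
n_pairs <- sum(abs(cm[upper.tri(cm)]) > .8, na.rm = TRUE)
```

Here n_length is larger than n_all because of the NA entries, and n_all includes the diagonal and counts every pair twice, which n_pairs avoids.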