How to avoid comparing diagonal lines in correlation matrix

Time:12-04

I have a correlation matrix, and I want to compare the correlation values of my variables so that I can group the variables whose correlation is higher than a specific value. I'm doing this with a for loop, and I want to know how to avoid comparing the diagonal values (where the value equals 1).

Here's an example of my correlation matrix:

Data >
     A     B    C      D     E     F      G
A    1    0.2   0.7   0.41  0.89  0.63  0.22
B    0.2   1    0.12  0.5   0.7   0.74  0.3
C    0.7  0.12  1     0.65  0.23  0.88  0.19
D    0.41 0.5   0.65   1    0.33  0.57  0.9
E    0.89 0.7   0.23  0.33  1     0.20  0.94
F    0.63 0.74  0.88  0.57  0.20   1    0.86
G    0.22 0.3   0.19  0.9   0.94  0.86   1

Here's a simplified version of the code I used:

for (ii in 1:(ncol(Data)-1)) {
  for(jj in 1:(ncol(Data))){
    if (abs(Data[1,ii] - Data[1,jj]) <= 0.8) {
      print("True")
      print(paste("Le nom de variable est ",colnames(Data)[jj]))
      
    }
    else{
      print("false")
      print(paste("Le nom de variable est ",colnames(Data)[ii]))
      
    }
    
  }
}

But it also compares each variable with itself (when ii = jj) and returns the result for the diagonal value, which is equal to 1.

So my question is: how can I modify my code so that it does not compare the diagonal values?

Thank you

CodePudding user response:

If you want to keep your own code, skip the iterations where ii equals jj:

for (ii in 1:(ncol(Data)-1)) {
  for(jj in 1:(ncol(Data))){
    if (ii != jj){
      if (abs(Data[1,ii] - Data[1,jj]) <= 0.8) {
        print("True")
        print(paste("Le nom de variable est ",colnames(Data)[jj]))
      
      }
      else{
        print("false")
        print(paste("Le nom de variable est ",colnames(Data)[ii]))
      
      }
    }    
  }
}
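
As a hedged variant of the loop above: if the goal is to list the pairs whose correlation exceeds a cutoff, one option is to start the inner loop at ii + 1, so the diagonal is never visited and each pair is checked only once. This is a minimal sketch, not the asker's original logic: it compares `Data[ii, jj]` directly against a cutoff, and both `threshold` (set to 0.8 here) and `high_pairs` are names assumed for illustration.

```r
# Rebuild the example matrix from the question (values copied from the post).
vars <- c("A", "B", "C", "D", "E", "F", "G")
Data <- matrix(c(
  1,    0.2,  0.7,  0.41, 0.89, 0.63, 0.22,
  0.2,  1,    0.12, 0.5,  0.7,  0.74, 0.3,
  0.7,  0.12, 1,    0.65, 0.23, 0.88, 0.19,
  0.41, 0.5,  0.65, 1,    0.33, 0.57, 0.9,
  0.89, 0.7,  0.23, 0.33, 1,    0.20, 0.94,
  0.63, 0.74, 0.88, 0.57, 0.20, 1,    0.86,
  0.22, 0.3,  0.19, 0.9,  0.94, 0.86, 1
), ncol = 7, byrow = TRUE, dimnames = list(vars, vars))

threshold <- 0.8              # assumed cutoff; adjust for your data
high_pairs <- character(0)    # hypothetical name for the collected pairs

for (ii in seq_len(ncol(Data) - 1)) {
  for (jj in (ii + 1):ncol(Data)) {   # jj > ii, so ii == jj never occurs
    if (abs(Data[ii, jj]) >= threshold) {
      high_pairs <- c(high_pairs,
                      paste(colnames(Data)[ii], colnames(Data)[jj], sep = "-"))
    }
  }
}

print(high_pairs)
```

Because a correlation matrix is symmetric, visiting only the cells above the diagonal loses no information and halves the number of comparisons.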

CodePudding user response:

A for loop works, but it is probably not the most efficient solution here. Consider an alternative that uses upper.tri to blank out the diagonal and the upper triangle, and which to return the positions in the matrix that satisfy a condition:

x <- c(1, .2, .7, .41, .89, .63, .22, .2, 1, .12,.5,.7,.74,.3,.7,.12,1,.65,.23,.88,.19,.41,.5,.65,1,.33,.57,.9,.89,.7,.23,.33,1,.2,.94,.63,.74,.88,.57,.2,1,.86,.22,.3,.19,.9,.94,.86,1)
m <- matrix(x, ncol = 7)

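# mask the diagonal and the upper triangle so each pair is considered once
m[upper.tri(m, diag = TRUE)] <- NA
# row/column positions of the remaining cells that are at least 0.8
locs <- which(m >= .8, arr.ind = TRUE)
locs
#      row col
# [1,]   5   1
# [2,]   6   3
# [3,]   7   4
# [4,]   7   5
# [5,]   7   6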

# or if you want the names of matches
f <- Vectorize(function(x){
  switch(as.character(x),
         "1" = "A", "2" = "B", "3" = "C", "4" = "D",
         "5" = "E", "6" = "F", "7" = "G", NA)
})
apply(locs, 2, f)
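
As a possible alternative to the switch-based lookup above, the names can be attached to the matrix itself and then read back through the indices that which returns. This is a sketch under the assumption that the variables are named A to G in matrix order; `vars` and `named` are names introduced here for illustration.

```r
vars <- c("A", "B", "C", "D", "E", "F", "G")
x <- c(1, .2, .7, .41, .89, .63, .22,
       .2, 1, .12, .5, .7, .74, .3,
       .7, .12, 1, .65, .23, .88, .19,
       .41, .5, .65, 1, .33, .57, .9,
       .89, .7, .23, .33, 1, .2, .94,
       .63, .74, .88, .57, .2, 1, .86,
       .22, .3, .19, .9, .94, .86, 1)

# Same matrix as above, but with row and column names attached.
m <- matrix(x, ncol = 7, dimnames = list(vars, vars))

# Mask the diagonal and upper triangle, then locate the high correlations.
m[upper.tri(m, diag = TRUE)] <- NA
locs <- which(m >= 0.8, arr.ind = TRUE)

# Translate the positions into names and pull out the matching values.
named <- data.frame(
  var1 = rownames(m)[locs[, "row"]],
  var2 = colnames(m)[locs[, "col"]],
  cor  = m[locs],               # a two-column index matrix selects single cells
  row.names = NULL,
  stringsAsFactors = FALSE
)
named
```

Looking names up through `rownames` and `colnames` avoids hard-coding one switch branch per variable, so the same code keeps working if the matrix gains or loses columns.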