I have a correlation matrix and I want to compare the correlations between my variables, in order to group the variables whose correlation is higher than a specific value. I'm doing that with a for
loop, and I want to know how to avoid comparing the diagonal values (where the value is equal to 1).
Here's an example of my correlation matrix:
Data >
  A    B    C    D    E    F    G
A 1    0.2  0.7  0.41 0.89 0.63 0.22
B 0.2  1    0.12 0.5  0.7  0.74 0.3
C 0.7  0.12 1    0.65 0.23 0.88 0.19
D 0.41 0.5  0.65 1    0.33 0.57 0.9
E 0.89 0.7  0.23 0.33 1    0.20 0.94
F 0.63 0.74 0.88 0.57 0.20 1    0.86
G 0.22 0.3  0.19 0.9  0.94 0.86 1
Here's a simplified version of the code I used:
for (ii in 1:(ncol(Data)-1)) {
  for (jj in 1:(ncol(Data))) {
    if (abs(Data[1,ii] - Data[1,jj]) <= 0.8) {
      print("True")
      print(paste("Le nom de variable est ",colnames(Data)[jj]))
    } else {
      print("false")
      print(paste("Le nom de variable est ",colnames(Data)[ii]))
    }
  }
}
But it will compare each variable with itself (when ii = jj) and return the result for the diagonal value, which is equal to 1.
So my question is: how can I modify my code so that it does not compare the diagonal?
Thank you
CodePudding user response:
If you want to keep your own code, skip the iterations where ii equals jj:
for (ii in 1:(ncol(Data)-1)) {
  for (jj in 1:(ncol(Data))) {
    # ii == jj is the diagonal, so only compare when the indices differ
    if (ii != jj) {
      if (abs(Data[1,ii] - Data[1,jj]) <= 0.8) {
        print("True")
        print(paste("Le nom de variable est ",colnames(Data)[jj]))
      } else {
        print("false")
        print(paste("Le nom de variable est ",colnames(Data)[ii]))
      }
    }
  }
}
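Note that the loop above still compares differences between entries of the first row. If the goal is to test each pair's correlation against a cutoff, one option is to start the inner loop at ii + 1, which never touches the diagonal and visits each pair only once. A minimal sketch, assuming Data is the 7 x 7 matrix from the question; the cutoff of 0.8 and the names threshold and pairs are assumptions made for illustration:

```r
# Rebuild the correlation matrix from the question (values copied from the post).
vars <- c("A", "B", "C", "D", "E", "F", "G")
Data <- matrix(c(
  1,    0.2,  0.7,  0.41, 0.89, 0.63, 0.22,
  0.2,  1,    0.12, 0.5,  0.7,  0.74, 0.3,
  0.7,  0.12, 1,    0.65, 0.23, 0.88, 0.19,
  0.41, 0.5,  0.65, 1,    0.33, 0.57, 0.9,
  0.89, 0.7,  0.23, 0.33, 1,    0.20, 0.94,
  0.63, 0.74, 0.88, 0.57, 0.20, 1,    0.86,
  0.22, 0.3,  0.19, 0.9,  0.94, 0.86, 1
), ncol = 7, byrow = TRUE, dimnames = list(vars, vars))

threshold <- 0.8        # assumed cutoff, adjust as needed
pairs <- character(0)   # collects the names of strongly correlated pairs

for (ii in seq_len(ncol(Data) - 1)) {
  # jj always starts above ii, so the diagonal (ii == jj) is never visited
  for (jj in (ii + 1):ncol(Data)) {
    if (abs(Data[ii, jj]) >= threshold) {
      pairs <- c(pairs, paste(colnames(Data)[ii], colnames(Data)[jj], sep = "-"))
    }
  }
}
print(pairs)
```

With the example matrix this collects the pairs A-E, C-F, D-G, E-G and F-G.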
CodePudding user response:
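While possible, a for-loop is probably not the most efficient solution here. Consider an alternative using upper.tri and the which function, which returns the positions in the matrix that satisfy a condition. Setting the upper triangle and the diagonal to NA means each pair is checked only once and the diagonal is never matched: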
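# correlation values from the question, filled column by column
x <- c(1, .2, .7, .41, .89, .63, .22, .2, 1, .12,.5,.7,.74,.3,.7,.12,1,.65,.23,.88,.19,.41,.5,.65,1,.33,.57,.9,.89,.7,.23,.33,1,.2,.94,.63,.74,.88,.57,.2,1,.86,.22,.3,.19,.9,.94,.86,1)
m <- matrix(x, ncol = 7)
# mask the diagonal and the upper triangle so only the lower triangle is tested
m[upper.tri(m, diag = TRUE)] <- NA
# row/column positions of the remaining entries that are at least 0.8
locs <- which(m >= .8, arr.ind = TRUE)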
locs
     row col
[1,]   5   1
[2,]   6   3
[3,]   7   4
[4,]   7   5
[5,]   7   6
# or if you want the names of matches
f <- Vectorize(function(x) {
  switch(as.character(x),
         "1" = "A", "2" = "B", "3" = "C", "4" = "D",
         "5" = "E", "6" = "F", "7" = "G", NA)
})
apply(locs, 2, f)
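If the matrix carries row and column names, the lookup function may not be needed, since the positions returned by which can index the names directly. A sketch under that assumption; the dimnames and the data frame res are additions for illustration and not part of the answer above:

```r
vars <- c("A", "B", "C", "D", "E", "F", "G")
x <- c(1, .2, .7, .41, .89, .63, .22,
       .2, 1, .12, .5, .7, .74, .3,
       .7, .12, 1, .65, .23, .88, .19,
       .41, .5, .65, 1, .33, .57, .9,
       .89, .7, .23, .33, 1, .2, .94,
       .63, .74, .88, .57, .2, 1, .86,
       .22, .3, .19, .9, .94, .86, 1)
# assumed: the matrix is given dimnames so names can be looked up by position
m <- matrix(x, ncol = 7, dimnames = list(vars, vars))

# keep only the lower triangle, then find entries at or above the cutoff
m[upper.tri(m, diag = TRUE)] <- NA
locs <- which(m >= 0.8, arr.ind = TRUE)

# translate row/column positions into variable names and attach the correlation
res <- data.frame(
  var1 = rownames(m)[locs[, "row"]],
  var2 = colnames(m)[locs[, "col"]],
  cor  = m[locs],
  stringsAsFactors = FALSE
)
print(res)
```

Each row of res is one pair of variables together with its correlation, which may be a more convenient starting point for grouping than the bare index matrix.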