Extract certain values out of a correlation matrix

Time:11-03

Is there a way to extract the correlation coefficients from a correlation matrix?

Let's say I have a dataset with 3 variables (a, b, c) and I want to calculate the correlations among themselves.

with


df <- data.frame(a = c(2, 3, 3, 5, 6, 9, 14, 15, 19, 21, 22, 23),
                 b = c(23, 24, 24, 23, 17, 28, 38, 34, 35, 39, 41, 43),
                 c = c(13, 14, 14, 14, 15, 17, 18, 19, 22, 20, 24, 26),
                 d = c(6, 6, 7, 8, 8, 8, 7, 6, 5, 3, 3, 2))

and

cor(df[, c('a', 'b', 'c')])

I'll get a correlation matrix:

          a         b         c
 a 1.0000000 0.9279869 0.9604329
 b 0.9279869 1.0000000 0.8942139
 c 0.9604329 0.8942139 1.0000000

Is there a way to show the results in a manner like this:

  1. Correlation between a and b is: 0.9279869.
  2. Correlation between a and c is: 0.9604329.
  3. Correlation between b and c is: 0.8942139.

?

My real correlation matrix is obviously bigger (~300 entries), and I need a way to extract only the values that are important to me.

Thanks.

CodePudding user response:

Using reshape2 and melt

df <- data.frame("a" = c(2, 3, 3, 5, 6, 9, 14, 15, 19, 21, 22, 23),
                 "b" = c(23, 24, 24, 23, 17, 28, 38, 34, 35, 39, 41, 43),
                 "c" = c(13, 14, 14, 14, 15, 17, 18, 19, 22, 20, 24, 26),
                 "d" = c(6, 6, 7, 8, 8, 8, 7, 6, 5, 3, 3, 2))

tmp <- cor(df[, c('a', 'b', 'c')])
# Blank out the lower triangle and the diagonal so each pair appears only once
tmp[lower.tri(tmp)] <- NA
diag(tmp) <- NA

library(reshape2)
na.omit(melt(tmp))

resulting in

  Var1 Var2     value
4    a    b 0.9279869
7    a    c 0.9604329
8    b    c 0.8942139
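
If the goal is the sentence format from the question rather than a data frame, the same upper-triangle idea can be sketched in base R without reshape2. This is only one possible approach; the names cm, idx and msg are made up for the illustration, and the 7 decimal places are an assumption chosen to match the printed matrix.

```r
# Same data as in the question (column d left out for brevity).
df <- data.frame(a = c(2, 3, 3, 5, 6, 9, 14, 15, 19, 21, 22, 23),
                 b = c(23, 24, 24, 23, 17, 28, 38, 34, 35, 39, 41, 43),
                 c = c(13, 14, 14, 14, 15, 17, 18, 19, 22, 20, 24, 26))

cm <- cor(df)

# Row/column positions of the cells above the diagonal (each pair once).
idx <- which(upper.tri(cm), arr.ind = TRUE)

# Build one sentence per pair; sprintf is vectorised over its arguments,
# and cm[idx] picks the matching coefficients via matrix indexing.
msg <- sprintf("Correlation between %s and %s is: %.7f.",
               rownames(cm)[idx[, "row"]],
               colnames(cm)[idx[, "col"]],
               cm[idx])
writeLines(msg)
```

Because idx is computed from the matrix itself, the same lines should work unchanged for a much larger correlation matrix.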

CodePudding user response:

You can convert the matrix to a long data frame and paste the sentences together:

df1 = cor(df[, c('a', 'b', 'c')])
df1 = as.data.frame(as.table(df1))
df1$Freq = round(df1$Freq,2)
df2 = subset(df1, (as.character(df1$Var1) != as.character(df1$Var2)))
df2$res = paste('Correlation between', df2$Var1, 'and', df2$Var2, 'is', df2$Freq)

df2 then contains:

  Var1 Var2 Freq                                 res
2    b    a 0.93 Correlation between b and a is 0.93
3    c    a 0.96 Correlation between c and a is 0.96
4    a    b 0.93 Correlation between a and b is 0.93
6    c    b 0.89 Correlation between c and b is 0.89
7    a    c 0.96 Correlation between a and c is 0.96
8    b    c 0.89 Correlation between b and c is 0.89
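
The output above lists every pair twice (b/a and a/b). A possible tweak, sketched here under the assumption that the order of the variables in the matrix defines which member of a pair comes first, is to compare the factor codes of Var1 and Var2 instead of testing for inequality:

```r
df <- data.frame(a = c(2, 3, 3, 5, 6, 9, 14, 15, 19, 21, 22, 23),
                 b = c(23, 24, 24, 23, 17, 28, 38, 34, 35, 39, 41, 43),
                 c = c(13, 14, 14, 14, 15, 17, 18, 19, 22, 20, 24, 26))

df1 <- as.data.frame(as.table(cor(df[, c("a", "b", "c")])))

# Var1 and Var2 are factors with the same level order (a, b, c), so
# "code of Var1 < code of Var2" keeps only the cells above the diagonal.
df2 <- subset(df1, as.integer(Var1) < as.integer(Var2))

df2$res <- paste("Correlation between", df2$Var1, "and", df2$Var2,
                 "is", round(df2$Freq, 2))
print(df2$res)
```

This leaves one row per unordered pair and drops the diagonal in the same step.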

CodePudding user response:

Here is another idea: reshape the matrix to long format, i.e.

tidyr::pivot_longer(tibble::rownames_to_column(as.data.frame(cor(df[, c('a', 'b', 'c')])), var = 'rn'), -1)

# A tibble: 9 x 3
  rn    name  value
  <chr> <chr> <dbl>
1 a     a     1    
2 a     b     0.928
3 a     c     0.960
4 b     a     0.928
5 b     b     1    
6 b     c     0.894
7 c     a     0.960
8 c     b     0.894
9 c     c     1    

CodePudding user response:

Maybe you can try as.table followed by as.data.frame:

> as.data.frame(as.table(cor(df[, c("a", "b", "c")])))
  Var1 Var2      Freq
1    a    a 1.0000000
2    b    a 0.9279869
3    c    a 0.9604329
4    a    b 0.9279869
5    b    b 1.0000000
6    c    b 0.8942139
7    a    c 0.9604329
8    b    c 0.8942139
9    c    c 1.0000000
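
For a bigger matrix, the long data frame can then be filtered down to the pairs that matter. The sketch below is only an illustration: the column names var1, var2 and r, the choice of variable "a", and the 0.9 cutoff are all assumptions, not something the question specifies.

```r
df <- data.frame(a = c(2, 3, 3, 5, 6, 9, 14, 15, 19, 21, 22, 23),
                 b = c(23, 24, 24, 23, 17, 28, 38, 34, 35, 39, 41, 43),
                 c = c(13, 14, 14, 14, 15, 17, 18, 19, 22, 20, 24, 26),
                 d = c(6, 6, 7, 8, 8, 8, 7, 6, 5, 3, 3, 2))

long <- as.data.frame(as.table(cor(df)))
names(long) <- c("var1", "var2", "r")

# Keep each pair once and drop the diagonal (factor codes follow column order).
long <- long[as.integer(long$var1) < as.integer(long$var2), ]

# Hypothetical rule 1: every pair that involves variable "a".
with_a <- long[long$var1 == "a" | long$var2 == "a", ]

# Hypothetical rule 2: pairs with |r| >= 0.9, strongest first.
strong <- long[abs(long$r) >= 0.9, ]
strong <- strong[order(-abs(strong$r)), ]

print(with_a)
print(strong)
```

Taking the absolute value in the second rule means strong negative correlations are kept as well.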