dataframe to correlation matrix

Time:08-01

I have a data frame in R (df) which looks like this:

colA colB
A,B   0.5
A,C   8
B,A   0.5
B,C   9
C,A   8
C,B   9

It holds pairwise correlation values produced by another program. I would like to convert this data frame into a correlation matrix so that it can be plotted with the Corr() function:

DESIRED OUTPUT:

   A    B    C
A  1    0.5  8
B  0.5  1    9
C  8    9    1

Any suggestions for code I could use?

CodePudding user response:

Data:

input <- structure(list(colA = c("A,B", "A,C", "B,A", "B,C", "C,A", "C,B"
), colB = c(0.5, 8, 0.5, 9, 8, 9)), class = "data.frame", row.names = c(NA, -6L))

Solution:

## separate that column "colA" into 2
rc <- read.csv(text = input$colA, header = FALSE)
#  V1 V2
#1  A  B
#2  A  C
#3  B  A
#4  B  C
#5  C  A
#6  C  B

tapply(input$colB, unname(rc), FUN = identity, default = 1)
#    A   B C
#A 1.0 0.5 8
#B 0.5 1.0 9
#C 8.0 9.0 1

Note 1: The OP's example data is made up. A real correlation never exceeds 1 in absolute value.

Note 2: Thanks to thelatemail for suggesting simply using read.csv instead of the scan, matrix and asplit combination used in my initial answer.


Remark 1: If using xtabs, we have to modify diagonal elements to 1 later.
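
A minimal sketch of the xtabs route mentioned in Remark 1, assuming the same example data as above. The intermediate names d, xt and m are only illustrative. Because xtabs sums the values in each cell, pairs that never occur (here, the diagonal) come out as 0 and have to be set to 1 afterwards.

```r
# Same example data as in the answer above
input <- data.frame(
  colA = c("A,B", "A,C", "B,A", "B,C", "C,A", "C,B"),
  colB = c(0.5, 8, 0.5, 9, 8, 9)
)

# Split "A,B" into two columns V1 and V2, then attach the values
rc <- read.csv(text = input$colA, header = FALSE)
d  <- cbind(rc, colB = input$colB)

# Cross-tabulate: each (V1, V2) cell holds the sum of colB for that pair
xt <- xtabs(colB ~ V1 + V2, data = d)

# Drop the xtabs/table class to get a plain numeric matrix
m <- matrix(xt, nrow = nrow(xt), dimnames = unname(dimnames(xt)))

# The diagonal was absent from the input, so xtabs left it at 0
diag(m) <- 1
m
```

Note that this only behaves as intended when each pair appears once per direction, since duplicated pairs would be summed.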

Remark 2: Matrix indexing is also a good approach, but takes more lines of code.
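
One possible way to do the matrix indexing from Remark 2, again assuming the example data from above. The names ids and m are illustrative. It starts from an identity matrix, so the diagonal is already 1, and then fills the off-diagonal cells by indexing with a two-column character matrix of (row name, column name) pairs.

```r
# Same example data as in the answer above
input <- data.frame(
  colA = c("A,B", "A,C", "B,A", "B,C", "C,A", "C,B"),
  colB = c(0.5, 8, 0.5, 9, 8, 9)
)

# Split "A,B" into two columns V1 and V2
rc <- read.csv(text = input$colA, header = FALSE)

# All distinct labels, used for both rows and columns
ids <- sort(unique(c(rc$V1, rc$V2)))

# Identity matrix: the diagonal is 1 from the start
m <- diag(length(ids))
dimnames(m) <- list(ids, ids)

# A two-column character matrix selects cells by (row name, column name)
m[as.matrix(rc)] <- input$colB
m
```

Unlike the tapply version, this overwrites rather than aggregates, so if a pair is duplicated in the input the last value wins.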

Remark 3: A "reshaping" solution also works well.

rc$value <- input$colB
## name the value column explicitly instead of letting acast guess it
reshape2::acast(rc, V1 ~ V2, value.var = "value", fill = 1)
#    A   B C
#A 1.0 0.5 8
#B 0.5 1.0 9
#C 8.0 9.0 1

CodePudding user response:

Something like this?

# create your input df:
df <- data.frame(colA = c("A,B", "A,C", "B,A", "B,C", "C,A", "C,B"),
                 value = c(0.5, 8, 0.5, 9, 8, 9))

# split the ID column into two columns
df[, c("col.A", "col.B")] <- matrix(unlist(strsplit(df$colA, ",")), ncol = 2, byrow = TRUE)

# reshape to wide format; the missing (diagonal) cells are filled with 1
library(reshape2)
dcast(df, col.A ~ col.B, value.var = "value", fill = 1)