I have a data frame in R (df) which looks like this:
colA colB
A,B 0.5
A,C 8
B,A 0.5
B,C 9
C,A 8
C,B 9
It contains correlation values produced by a software tool. I would like to convert this data frame into a correlation matrix so that it can be plotted with the Corr() function:
DESIRED OUTPUT:
  A   B   C
A 1   0.5 8
B 0.5 1   9
C 8   9   1
Any suggestions for code I could use?
CodePudding user response:
Data:
input <- structure(list(colA = c("A,B", "A,C", "B,A", "B,C", "C,A", "C,B"
), colB = c(0.5, 8, 0.5, 9, 8, 9)), class = "data.frame", row.names = c(NA, -6L))
Solution:
## split column "colA" into two label columns
rc <- read.csv(text = input$colA, header = FALSE)
#  V1 V2
#1  A  B
#2  A  C
#3  B  A
#4  B  C
#5  C  A
#6  C  B
## build the matrix; cells with no data (the diagonal) default to 1
tapply(input$colB, unname(rc), FUN = identity, default = 1)
#    A   B C
#A 1.0 0.5 8
#B 0.5 1.0 9
#C 8.0 9.0 1
Note 1: The example data are made up. A correlation coefficient always lies between -1 and 1, so values such as 8 and 9 cannot be real correlations.
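As a rough illustration of that point, a small sanity check can catch such values before plotting. The helper name is_valid_corr and its tolerance are assumptions made for this sketch, not part of any package, and it does not test positive semi-definiteness.

```r
# Hypothetical helper: basic sanity checks for a correlation matrix.
# It checks shape, missing values, range, unit diagonal and symmetry only.
is_valid_corr <- function(m, tol = 1e-8) {
  is.matrix(m) && is.numeric(m) &&
    nrow(m) == ncol(m) &&
    !anyNA(m) &&
    all(abs(m) <= 1 + tol) &&          # every entry within [-1, 1]
    all(abs(diag(m) - 1) <= tol) &&    # diagonal equal to 1
    isSymmetric(unname(m), tol = tol)  # m[i, j] equals m[j, i]
}

good <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)
bad  <- matrix(c(1, 8, 8, 1), nrow = 2)   # 8 is outside [-1, 1]

is_valid_corr(good)
is_valid_corr(bad)
```

With the made-up values from the question, the check fails because 8 and 9 fall outside the valid range.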
Note 2: Thanks to thelatemail for suggesting simply using read.csv instead of the combination of scan, matrix and asplit that was in my initial answer.
Remark 1: If using xtabs, we have to set the diagonal elements to 1 afterwards.
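A minimal sketch of the xtabs route, assuming the same input data as above. The names xt and m are chosen for this example only. xtabs sums the values in each cell and leaves cells with no data at 0, which is why the diagonal has to be overwritten.

```r
# Example data from the question (the values are not real correlations).
input <- data.frame(
  colA = c("A,B", "A,C", "B,A", "B,C", "C,A", "C,B"),
  colB = c(0.5, 8, 0.5, 9, 8, 9),
  stringsAsFactors = FALSE
)

# Split "A,B" into two label columns, V1 and V2.
rc <- read.csv(text = input$colA, header = FALSE)

# Cross-tabulate: each (V1, V2) cell holds the sum of colB for that pair;
# pairs that never occur (here, the diagonal) are 0.
xt <- xtabs(colB ~ V1 + V2, data = cbind(rc, colB = input$colB))

# Convert the table to a plain numeric matrix and fix the diagonal.
m <- matrix(xt, nrow = nrow(xt), dimnames = unname(dimnames(xt)))
diag(m) <- 1
m
```

Because xtabs sums, a pair that appears more than once in the input would have its values added together, so this sketch assumes each ordered pair occurs at most once.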
Remark 2: Matrix indexing is also a good approach, but takes more lines of code.
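One way the matrix-indexing approach might look, as a sketch under the same input data. The names parts, labs and m are illustrative. It relies on base R allowing a two-column character matrix to index a matrix by its row and column names.

```r
# Example data from the question (the values are not real correlations).
input <- data.frame(
  colA = c("A,B", "A,C", "B,A", "B,C", "C,A", "C,B"),
  colB = c(0.5, 8, 0.5, 9, 8, 9),
  stringsAsFactors = FALSE
)

# Two-column character matrix: column 1 = row label, column 2 = column label.
parts <- do.call(rbind, strsplit(input$colA, ",", fixed = TRUE))

# All distinct labels, sorted, define the matrix dimensions.
labs <- sort(unique(as.vector(parts)))

# Start from an identity matrix so the diagonal is already 1.
m <- diag(length(labs))
dimnames(m) <- list(labs, labs)

# Each row of `parts` names one cell; assign the matching value to it.
m[parts] <- input$colB
m
```

Any off-diagonal pair missing from the input stays at 0 in this version, so it is worth checking that every pair is present in both orders.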
Remark 3: A "reshaping" solution is also a good option:
rc$value <- input$colB
reshape2::acast(rc, V1 ~ V2, value.var = "value", fill = 1)
#    A   B C
#A 1.0 0.5 8
#B 0.5 1.0 9
#C 8.0 9.0 1
CodePudding user response:
Something like this?
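# create your input df:
df <- data.frame(colA = c("A,B", "A,C", "B,A", "B,C", "C,A", "C,B"),
                 value = c(0.5, 8, 0.5, 9, 8, 9),
                 stringsAsFactors = FALSE)
# split the ID column into two label columns
df[, c("col.A", "col.B")] <- matrix(unlist(strsplit(df$colA, ",")), ncol = 2, byrow = TRUE)
# reshape to wide format; missing (diagonal) cells are filled with 1
library(reshape2)
dcast(df, col.A ~ col.B, value.var = "value", fill = 1)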