I have data that looks like this
Cluster.x Cluster.y n
1 4 8
1 7 1
2 1 2
2 4 4
2 7 1
And i want to transform it to a crosstable that looks like this
Cluster.y
Cluster.x 1 2 3 4 5 7
1 0 2 0 8 0 1
2 2 0 0 4 0 1
3 0 0 0 0 0 0
4 8 4 0 0 0 0
5 0 0 0 0 0 0
7 1 1 0 0 0 0
How can i do this transformation? It is important for me, that the table is symmetric and is readable as rows or columns with the same results.
CodePudding user response:
Firstly, create a matrix with zeros,
out <- matrix(0, 7,7)
Then assign your values into them by flipping columns,
out[as.matrix(df[,1:2])] <- df[,3]
out[as.matrix(df[,2:1])] <- df[,3]
gives,
# [,1] [,2] [,3] [,4] [,5] [,6] [,7]
# [1,] 0 2 0 8 0 0 1
# [2,] 2 0 0 4 0 0 1
# [3,] 0 0 0 0 0 0 0
# [4,] 8 4 0 0 0 0 0
# [5,] 0 0 0 0 0 0 0
# [6,] 0 0 0 0 0 0 0
# [7,] 1 1 0 0 0 0 0
Data:
df <- read.table(text="
Cluster.x Cluster.y n
1 4 8
1 7 1
2 1 2
2 4 4
2 7 1",header=T)
CodePudding user response:
library(Matrix)
dims <- max(dt$Cluster.x, dt$Cluster.y)
m <- sparseMatrix(i = dt$Cluster.x, j = dt$Cluster.y, x = dt$n, dims = c(dims, dims))
matrix(m, nrow = dims, ncol = dims, dimnames = list(1:dims, 1:dims))
# 1 2 3 4 5 6 7
# 1 0 0 0 8 0 0 1
# 2 2 0 0 4 0 0 1
# 3 0 0 0 0 0 0 0
# 4 0 0 0 0 0 0 0
# 5 0 0 0 0 0 0 0
# 6 0 0 0 0 0 0 0
# 7 0 0 0 0 0 0 0
Edit
You can get the symetric one rowbinding the reverse values and remove duplicate rows. I use data.table here.
dt <- unique(data.table::rbindlist(list(
dt, dt[, .(Cluster.x = Cluster.y, Cluster.y = Cluster.x, n = n)]
)))
This will result in:
1 2 3 4 5 6 7
1 0 2 0 8 0 0 1
2 2 0 0 4 0 0 1
3 0 0 0 0 0 0 0
4 8 4 0 0 0 0 0
5 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0
7 1 1 0 0 0 0 0
CodePudding user response:
Here is a way. Get the unique elements in both cluster columns, coerce those columns to factor with the complete sequence as levels and tabulate.
To force a symmetric output, the function symMatrix
inspired in this comment(*) works if the entries to be changed are zeros.
(*) See also the benchmark, answers are more permanent than comments.
df1 <- 'Cluster.x Cluster.y n
1 4 8
1 7 1
2 1 2
2 4 4
2 7 1'
df1 <- read.table(textConnection(df1), header = TRUE)
cols <- grep("Cluster", names(df1))
u <- unique(unlist(df1[cols]))
df1[cols] <- lapply(df1[cols], \(x) factor(x, levels = Reduce(`:`, range(u))))
symMatrix <- function(x) x t(x)
xt <- xtabs(n ~ ., df1)
xt
#> Cluster.y
#> Cluster.x 1 2 3 4 5 6 7
#> 1 0 0 0 8 0 0 1
#> 2 2 0 0 4 0 0 1
#> 3 0 0 0 0 0 0 0
#> 4 0 0 0 0 0 0 0
#> 5 0 0 0 0 0 0 0
#> 6 0 0 0 0 0 0 0
#> 7 0 0 0 0 0 0 0
symMatrix(xt)
#> Cluster.y
#> Cluster.x 1 2 3 4 5 6 7
#> 1 0 2 0 8 0 0 1
#> 2 2 0 0 4 0 0 1
#> 3 0 0 0 0 0 0 0
#> 4 8 4 0 0 0 0 0
#> 5 0 0 0 0 0 0 0
#> 6 0 0 0 0 0 0 0
#> 7 1 1 0 0 0 0 0
Created on 2022-10-21 with reprex v2.0.2