Home > database >  Transform observation count into crosstab
Transform observation count into crosstab

Time:10-22

I have data that looks like this

  Cluster.x Cluster.y  n
         1         4   8
         1         7   1
         2         1   2
         2         4   4
         2         7   1

And i want to transform it to a crosstable that looks like this

        Cluster.y
Cluster.x  1  2  3  4  5  7
        1  0  2  0  8  0  1
        2  2  0  0  4  0  1
        3  0  0  0  0  0  0
        4  8  4  0  0  0  0
        5  0  0  0  0  0  0
        7  1  1  0  0  0  0

How can i do this transformation? It is important for me, that the table is symmetric and is readable as rows or columns with the same results.

CodePudding user response:

Firstly, create a matrix with zeros,

out <- matrix(0, 7,7)

Then assign your values into them by flipping columns,

out[as.matrix(df[,1:2])]  <- df[,3]
out[as.matrix(df[,2:1])]  <- df[,3]

gives,

#         [,1] [,2] [,3] [,4] [,5] [,6] [,7]
#    [1,]    0    2    0    8    0    0    1
#    [2,]    2    0    0    4    0    0    1
#    [3,]    0    0    0    0    0    0    0
#    [4,]    8    4    0    0    0    0    0
#    [5,]    0    0    0    0    0    0    0
#    [6,]    0    0    0    0    0    0    0
#    [7,]    1    1    0    0    0    0    0

Data:

df <- read.table(text="
 Cluster.x Cluster.y  n
         1         4   8
         1         7   1
         2         1   2
         2         4   4
         2         7   1",header=T)

CodePudding user response:

library(Matrix)

dims <- max(dt$Cluster.x, dt$Cluster.y)

m <- sparseMatrix(i = dt$Cluster.x, j = dt$Cluster.y, x = dt$n, dims = c(dims, dims))

matrix(m, nrow = dims, ncol = dims, dimnames = list(1:dims, 1:dims))

#   1 2 3 4 5 6 7
# 1 0 0 0 8 0 0 1
# 2 2 0 0 4 0 0 1
# 3 0 0 0 0 0 0 0
# 4 0 0 0 0 0 0 0
# 5 0 0 0 0 0 0 0
# 6 0 0 0 0 0 0 0
# 7 0 0 0 0 0 0 0

Edit

You can get the symetric one rowbinding the reverse values and remove duplicate rows. I use data.table here.

dt <- unique(data.table::rbindlist(list(
  dt, dt[, .(Cluster.x = Cluster.y, Cluster.y = Cluster.x, n = n)]
)))

This will result in:

  1 2 3 4 5 6 7
1 0 2 0 8 0 0 1
2 2 0 0 4 0 0 1
3 0 0 0 0 0 0 0
4 8 4 0 0 0 0 0
5 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0
7 1 1 0 0 0 0 0

CodePudding user response:

Here is a way. Get the unique elements in both cluster columns, coerce those columns to factor with the complete sequence as levels and tabulate.

To force a symmetric output, the function symMatrix inspired in this comment(*) works if the entries to be changed are zeros.

(*) See also the benchmark, answers are more permanent than comments.

df1 <- 'Cluster.x Cluster.y  n
         1         4   8
         1         7   1
         2         1   2
         2         4   4
         2         7   1'
df1 <- read.table(textConnection(df1), header = TRUE)

cols <- grep("Cluster", names(df1))
u <- unique(unlist(df1[cols]))
df1[cols] <- lapply(df1[cols], \(x) factor(x, levels = Reduce(`:`, range(u))))

symMatrix <- function(x) x   t(x)

xt <- xtabs(n ~ ., df1)
xt
#>          Cluster.y
#> Cluster.x 1 2 3 4 5 6 7
#>         1 0 0 0 8 0 0 1
#>         2 2 0 0 4 0 0 1
#>         3 0 0 0 0 0 0 0
#>         4 0 0 0 0 0 0 0
#>         5 0 0 0 0 0 0 0
#>         6 0 0 0 0 0 0 0
#>         7 0 0 0 0 0 0 0

symMatrix(xt)
#>          Cluster.y
#> Cluster.x 1 2 3 4 5 6 7
#>         1 0 2 0 8 0 0 1
#>         2 2 0 0 4 0 0 1
#>         3 0 0 0 0 0 0 0
#>         4 8 4 0 0 0 0 0
#>         5 0 0 0 0 0 0 0
#>         6 0 0 0 0 0 0 0
#>         7 1 1 0 0 0 0 0

Created on 2022-10-21 with reprex v2.0.2

  • Related