Matrix that counts how many times the combination of a row and column of that matrix is present in a dataframe

I am still quite new to R, so please bear with me :)

I need to create a matrix that counts how many times the combination of a row name and a column name of that matrix is present in a dataframe.

As my description is probably quite vague, I have given an example set below. In reality, my dataset will contain many more fruits in the matrix and many more juices in the dataframe, so I'm looking for an efficient way to tackle this problem.

#Stackoverflow example
#Create empty matrix ----
newMatrix <- matrix(0, nrow = 5, ncol = 5)
colnames(newMatrix) <- c("Apple", "Pear", "Orange", "Mango", "Banana")
rownames(newMatrix) <- c("Apple", "Pear", "Orange", "Mango", "Banana")

#Create dataframe ----
newDf <- data.frame(c("Juice 1", "Juice 2", "Juice 3", "Juice 4","Juice 5"),
                    c("Banana", "Banana", "Orange", "Pear", "Apple"),
                    c("Pear", "Orange", "Pear", "Apple", "Pear"),
                    c("Orange", "Mango", NA, NA, NA))
colnames(newDf) <- c("Juice", "Fruit 1", "Fruit 2", "Fruit 3")

I want to create a for loop that goes over every element of newMatrix and adds 1 each time the combination of that element's row and column names is present in a row of newDf.
So in essence: how many juices contain both Apple and Pear, how many contain both Apple and Mango, and so forth.

The output should look like this:

       Apple Pear Orange Mango Banana
Apple      0    2      0     0      0
Pear       2    0      2     0      1
Orange     0    2      0     1      2
Mango      0    0      1     0      1
Banana     0    1      2     1      0

I started by trying to create a for loop but I got stuck at the if part:

for (i in 1:nrow(newMatrix)) {
  for (j in 1:ncol(newMatrix)) {
    if (???)
      newMatrix[i,j] <- newMatrix[i,j] + 1
  }
}

Can somebody help me with this? Would be highly appreciated!

CodePudding user response：

You can build all pairwise combinations of the fruits in each row with base R, and then use igraph to turn those pairs into an adjacency matrix:

library(igraph)

m <- do.call(cbind, apply(newDf[-1], 1, \(x) if (sum(complete.cases(x)) >= 2) combn(x, m = 2) else x, simplify = FALSE))
g <- graph_from_data_frame(na.omit(t(m)), directed = FALSE)
get.adjacency(g, sparse = FALSE)  # newer igraph versions name this as_adjacency_matrix()

       Banana Pear Orange Apple Mango
Banana      0    1      2     0     1
Pear        1    0      2     2     0
Orange      2    2      0     0     1
Apple       0    2      0     0     0
Mango       1    0      1     0     0
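To make the combn() step less opaque, here is a minimal sketch of what it produces for a single juice. The vector `fruits` is typed in by hand from the "Juice 1" row of the example data and is only for illustration:

```r
# Fruits of "Juice 1", copied by hand from the question's example data
fruits <- c("Banana", "Pear", "Orange")

# combn() returns a matrix in which each column is one unordered pair
pairs <- combn(fruits, m = 2)

# Transposed, each row is one pair, i.e. one edge of the graph
t(pairs)
```

Three fruits give three pairs, so a juice with three fruits contributes three edges to the graph.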

It might be a bit convoluted, but you can also use igraph with tidyverse packages:

library(igraph)
library(tidyverse)

newDf %>% 
  pivot_longer(-Juice) %>% 
  group_by(Juice) %>% 
  summarise(new = ifelse(n() > 1, paste(combn(na.omit(value), 2), collapse = "-"), value)) %>% 
  separate_rows(new, sep = "(?:[^-]*(?:-[^-]*){1})\\K-") %>% 
  separate(new, into = c("X1", "X2")) %>% 
  select(-Juice) %>% 
  graph_from_data_frame(directed = FALSE) %>% 
  get.adjacency(sparse = FALSE)

       Banana Pear Orange Apple Mango
Banana      0    1      2     0     1
Pear        1    0      2     2     0
Orange      2    2      0     0     1
Apple       0    2      0     0     0
Mango       1    0      1     0     0
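The igraph result lists the fruits in a different order than the matrix requested in the question. A small sketch of reordering by name, using a hand-typed copy of the matrix printed above (the names `adj`, `igraph_order` and `v` are only for illustration):

```r
# Order requested in the question
v <- c("Apple", "Pear", "Orange", "Mango", "Banana")

# Hand-typed copy of the adjacency matrix printed above
igraph_order <- c("Banana", "Pear", "Orange", "Apple", "Mango")
adj <- matrix(c(0, 1, 2, 0, 1,
                1, 0, 2, 2, 0,
                2, 2, 0, 0, 1,
                0, 2, 0, 0, 0,
                1, 0, 1, 0, 0),
              nrow = 5, byrow = TRUE,
              dimnames = list(igraph_order, igraph_order))

# Indexing by name reorders rows and columns at the same time
reordered <- adj[v, v]
reordered
```

With a real igraph result, the same indexing should work on the matrix returned by get.adjacency(), provided every name in `v` is a vertex of the graph.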

CodePudding user response：

The for loop can be written like this.

cb <- combn(2:4, 2)  ## cols combinations newDf 

## initialize adj_matrix
v <- c("Apple", "Pear", "Orange", "Mango", "Banana")
adj_matrix <- matrix(0, length(v), length(v), dimnames=list(v, v))

for (k in seq_len(nrow(newDf))) {
    for (l in seq_len(ncol(cb))) {
        x <- unlist(newDf[k, cb[, l]])
        if (length(x[!is.na(x)]) == 2) {
            adj_matrix[x[1], x[2]] <- adj_matrix[x[1], x[2]] + 1
            adj_matrix[x[2], x[1]] <- adj_matrix[x[2], x[1]] + 1
        }
    }
}

adj_matrix
#        Apple Pear Orange Mango Banana
# Apple      0    2      0     0      0
# Pear       2    0      2     0      1
# Orange     0    2      0     1      2
# Mango      0    0      1     0      1
# Banana     0    1      2     1      0
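Since the question mentions many more fruits and juices, a loop-free variant may be worth trying. The following is only a sketch, and a different technique from the loop above: it builds a juice-by-fruit incidence table and multiplies it with its own transpose via crossprod(). It assumes that each fruit appears at most once per juice; the names `long`, `inc` and `co_matrix` are made up for this example.

```r
# Example data, copied from the question
newDf <- data.frame(
  Juice = c("Juice 1", "Juice 2", "Juice 3", "Juice 4", "Juice 5"),
  `Fruit 1` = c("Banana", "Banana", "Orange", "Pear", "Apple"),
  `Fruit 2` = c("Pear", "Orange", "Pear", "Apple", "Pear"),
  `Fruit 3` = c("Orange", "Mango", NA, NA, NA),
  check.names = FALSE, stringsAsFactors = FALSE
)

v <- c("Apple", "Pear", "Orange", "Mango", "Banana")

# Long format: one (juice, fruit) row per non-missing cell
fruit_cols <- newDf[-1]
long <- data.frame(
  Juice = rep(newDf$Juice, times = ncol(fruit_cols)),
  Fruit = unlist(fruit_cols, use.names = FALSE),
  stringsAsFactors = FALSE
)
long <- long[!is.na(long$Fruit), ]

# Incidence table: rows are juices, columns are fruits in the order of v
inc <- table(long$Juice, factor(long$Fruit, levels = v))

# t(inc) %*% inc counts, for every fruit pair, the juices containing both
co_matrix <- crossprod(inc)
diag(co_matrix) <- 0  # a fruit paired with itself is not a combination
co_matrix
```

On the example data this should give the same counts as the loop. The diagonal is zeroed because, before that step, it holds the number of juices containing each fruit rather than a pair count.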

Data:

newDf <- structure(list(Juice = c("Juice 1", "Juice 2", "Juice 3", "Juice 4", 
"Juice 5"), `Fruit 1` = c("Banana", "Banana", "Orange", "Pear", 
"Apple"), `Fruit 2` = c("Pear", "Orange", "Pear", "Apple", "Pear"
), `Fruit 3` = c("Orange", "Mango", NA, NA, NA)), class = "data.frame", row.names = c(NA, 
-5L))