How to summarize a list of combinations in R

I have a list of two-element combinations, like the one below.

cbnl <- list(
  c("A", "B"), c("B", "A"), c("C", "D"), c("E", "D"), c("F", "G"), c("H", "I"),
  c("J", "K"), c("I", "H"), c("K", "J"), c("G", "F"), c("D", "C"), c("E", "C"),
  c("D", "E"), c("C", "E")
)

I'd like to summarize the list above. The expected result is the list below. The order of elements within a vector doesn't matter here.

[[1]]
[1] "A" "B"

[[2]]
[1] "C" "D" "E"

[[3]]
[1] "F" "G"

[[4]]
[1] "H" "I"

[[5]]
[1] "J" "K"

(Rule 1) {A, B} is equivalent to {B, A}. To handle this, I think I can do the following.

cbnl <- unique(lapply(cbnl, function(i) { sort(i) }))

(Rule 2) If two sets such as {A, B} and {B, C} have an element in common, take their union, which gives {A, B, C}. I don't have a clear idea of how to do this.

Is there an efficient way to do this?

Thank you very much in advance.

CodePudding user response：

I know this answer is more like traditional programming than idiomatic R, but it solves the issue.

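cbnl <- unique(lapply(cbnl, sort))

i <- 1
count <- 1
out <- list()

# Note: only neighbouring elements cbnl[[i]] and cbnl[[i + 1]] are compared.
while (i <= length(cbnl) - 1) {
  if (sum(cbnl[[i]] %in% cbnl[[i + 1]]) == 0) {
    # no common element: keep the current pair as it is
    out[[count]] <- cbnl[[i]]
  } else {
    # common element: store the union and skip the merged neighbour
    out[[count]] <- sort(unique(c(cbnl[[i]], cbnl[[i + 1]])))
    i <- i + 1
  }
  count <- count + 1
  i <- i + 1
}
out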

gives,

[[1]]
[1] "A" "B"

[[2]]
[1] "C" "D" "E"

[[3]]
[1] "F" "G"

[[4]]
[1] "H" "I"

[[5]]
[1] "J" "K"
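
The loop above only compares cbnl[[i]] with its immediate neighbour, so it relies on overlapping pairs ending up next to each other. As a minimal sketch of Rule 2 that does not depend on the order of the input, the following base R function folds each vector into every group it overlaps with. The name merge_overlapping is hypothetical (it is not from any answer or package), and the sketch assumes every element of the list is a non-empty character vector.

```r
cbnl <- list(
  c("A", "B"), c("B", "A"), c("C", "D"), c("E", "D"), c("F", "G"), c("H", "I"),
  c("J", "K"), c("I", "H"), c("K", "J"), c("G", "F"), c("D", "C"), c("E", "C"),
  c("D", "E"), c("C", "E")
)

# Hypothetical helper: merge all vectors that share at least one element.
# Assumes `sets` is a list of non-empty character vectors.
merge_overlapping <- function(sets) {
  merged <- list()  # groups collected so far; always pairwise disjoint
  for (s in sets) {
    # which of the collected groups share an element with s?
    hit <- vapply(merged, function(g) any(s %in% g), logical(1))
    # union s with every group it touches (Rule 2); sorting covers Rule 1
    s <- sort(unique(c(s, unlist(merged[hit]))))
    # replace the touched groups with the single merged group
    merged <- c(merged[!hit], list(s))
  }
  # order the groups by their first element for a stable result
  merged[order(vapply(merged, `[`, character(1), 1))]
}

merge_overlapping(cbnl)
```

For the cbnl list from the question this should return the five groups of the expected result. Because a new vector is merged with all groups it touches at once, a chain such as {A, B}, {C, D}, {B, C} also collapses into a single group even though the linking pair comes last.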

CodePudding user response：

I borrowed one line of code from @ThomasIsCoding and would like to show that this can also be done with my package dedupewider.

library(dedupewider)
library(purrr)
library(magrittr)

cbnl <- list(
  c("A", "B"), c("B", "A"), c("C", "D"), c("E", "D"), c("F", "G"), c("H", "I"),
  c("J", "K"), c("I", "H"), c("K", "J"), c("G", "F"), c("D", "C"), c("E", "C"),
  c("D", "E"), c("C", "E")
)

cbnl_df <- data.frame(do.call(rbind, cbnl))

result <- dedupe_wide(cbnl_df, names(cbnl_df)) # deduplicates by connecting elements that are linked by a transitive relation

result_list <- as.list(as.data.frame(t(result)))

result_list <- map(result_list, ~ .x[!is.na(.x)]) # remove NA
result_list
#> $V1
#> [1] "A" "B"
#> 
#> $V2
#> [1] "C" "E" "D"
#> 
#> $V3
#> [1] "F" "G"
#> 
#> $V4
#> [1] "H" "I"
#> 
#> $V5
#> [1] "J" "K"

Many of these steps are needed only because both the input and the output are lists; with a data.frame as input and output, less code would be needed.

CodePudding user response：

You can try the following igraph option

library(igraph)

graph_from_data_frame(do.call(rbind, cbnl)) %>%
  components() %>%
  membership() %>%
  stack() %>%
  with(., split(as.character(ind), values))

which gives

$`1`
[1] "A" "B"

$`2`
[1] "C" "E" "D"

$`3`
[1] "F" "G"

$`4`
[1] "H" "I"

$`5`
[1] "J" "K"

A shorter one

graph_from_data_frame(do.call(rbind, cbnl)) %>%
  decompose() %>%
  Map(function(x) names(V(x)), .)

which gives

[[1]]
[1] "A" "B"

[[2]]
[1] "C" "E" "D"

[[3]]
[1] "F" "G"

[[4]]
[1] "H" "I"

[[5]]
[1] "J" "K"

CodePudding user response：

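Using a sorted union as FUN= in combn. Note that this merges the vectors only pairwise, so the output below contains a sixth group, "D" "E", that is not in the expected result.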

combn(cbnl, 2, \(x) {
  if (!length(intersect(x[[1]], x[[2]])) == 0) {
    `length<-`(sort(union(x[[1]], x[[2]])), 3)
  } else {
    rep(NA, 3)
  }
}) |>
  (\(x) x[, !colSums(is.na(x)) == 3])() |>
  (\(x) as.list(as.data.frame(x[, !duplicated(x[1, ])])))() |>
  (\(x) lapply(x, \(x) x[!is.na(x)]))()
# $V1
# [1] "A" "B"
# 
# $V2
# [1] "C" "D" "E"
# 
# $V3
# [1] "D" "E"
# 
# $V4
# [1] "F" "G"
# 
# $V5
# [1] "H" "I"
# 
# $V6
# [1] "J" "K"
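
Since the order of elements does not matter, the results in this thread can be compared after normalizing them. The following is a small sketch of such a check; same_groups is a hypothetical helper, and the three lists are typed in by hand from the outputs shown in this thread rather than computed.

```r
# Hypothetical helper: TRUE when two lists hold the same groups, ignoring
# list names, the order of the groups, and the order within each group.
same_groups <- function(x, y) {
  key <- function(l) {
    sort(vapply(l, function(v) paste(sort(v), collapse = "|"),
                character(1), USE.NAMES = FALSE))
  }
  identical(key(x), key(y))
}

# Expected result from the question
expected <- list(c("A", "B"), c("C", "D", "E"), c("F", "G"),
                 c("H", "I"), c("J", "K"))

# Output of the igraph answer, copied by hand
igraph_out <- list(`1` = c("A", "B"), `2` = c("C", "E", "D"), `3` = c("F", "G"),
                   `4` = c("H", "I"), `5` = c("J", "K"))

# Output of the combn answer, copied by hand
combn_out <- list(V1 = c("A", "B"), V2 = c("C", "D", "E"), V3 = c("D", "E"),
                  V4 = c("F", "G"), V5 = c("H", "I"), V6 = c("J", "K"))

same_groups(expected, igraph_out)  # TRUE
same_groups(expected, combn_out)   # FALSE, because of the extra "D" "E" group
```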

Note:

> R.version.string
[1] "R version 4.1.2 (2021-11-01)"