How to summarize a list of combinations in R

Time:12-15

I have a list of two-element combinations, like the one below.

cbnl <- list(
  c("A", "B"), c("B", "A"), c("C", "D"), c("E", "D"), c("F", "G"), c("H", "I"),
  c("J", "K"), c("I", "H"), c("K", "J"), c("G", "F"), c("D", "C"), c("E", "C"),
  c("D", "E"), c("C", "E")
)

I'd like to summarize the list above. The expected result is the list below. The order of the elements within a vector doesn't matter here.

[[1]]
[1] "A" "B"

[[2]]
[1] "C" "D" "E"

[[3]]
[1] "F" "G"

[[4]]
[1] "H" "I"

[[5]]
[1] "J" "K"

(Rule 1) {A, B} is equivalent to {B, A}. To handle this, I think I can sort each pair and drop the duplicates:

cbnl <- unique(lapply(cbnl, function(i) { sort(i) }))
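As a quick check (a minimal sketch in base R; `pairs` and `pair_labels` are names introduced here only for illustration), Rule 1 alone leaves seven pairs, and the three pairs involving C, D and E still have to be merged by Rule 2:

```r
cbnl <- list(
  c("A", "B"), c("B", "A"), c("C", "D"), c("E", "D"), c("F", "G"), c("H", "I"),
  c("J", "K"), c("I", "H"), c("K", "J"), c("G", "F"), c("D", "C"), c("E", "C"),
  c("D", "E"), c("C", "E")
)

# Sort each pair so that c("B", "A") and c("A", "B") become identical,
# then drop the duplicates.
pairs <- unique(lapply(cbnl, sort))

# Collapse each pair into a single string for a compact view.
pair_labels <- vapply(pairs, paste, character(1), collapse = "-")
print(pair_labels)
```

The pairs C-D, D-E and C-E all survive this step, which is why Rule 2 is needed.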

(Rule 2) If two sets share an element, e.g. {A, B} and {B, C}, take their union, which gives {A, B, C}. I don't have a clear idea of how to do this.

Any efficient way to do this?

Thank you very much in advance.

CodePudding user response:

I know this answer is closer to traditional programming than to idiomatic R, but it solves the issue.

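cbnl <- unique(lapply(cbnl, sort))

i <- 1
count <- 1
out <- list()

# Note: only neighbouring elements (i and i + 1) are compared.
while (i <= length(cbnl) - 1) {
  if (sum(cbnl[[i]] %in% cbnl[[i + 1]]) == 0) {
    # No common element: keep the current pair as it is.
    out[[count]] <- cbnl[[i]]
  } else {
    # Common element: store the union and skip the merged neighbour.
    out[[count]] <- sort(unique(c(cbnl[[i]], cbnl[[i + 1]])))
    i <- i + 1
  }
  count <- count + 1
  i <- i + 1
}
out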

gives,

[[1]]
[1] "A" "B"

[[2]]
[1] "C" "D" "E"

[[3]]
[1] "F" "G"

[[4]]
[1] "H" "I"

[[5]]
[1] "J" "K"
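The loop above compares only neighbouring elements, so its result depends on the order of the list. As a hedged alternative, here is a minimal base R sketch that merges any sets sharing an element, wherever they sit in the list. The helper name `merge_overlapping` is hypothetical, and the final ordering step assumes character vectors.

```r
# Hypothetical helper (not part of the answer above): merge every set that
# shares at least one element with another set, regardless of list order.
merge_overlapping <- function(sets) {
  groups <- list()
  for (s in sets) {
    # Which existing groups share an element with the incoming set?
    hit <- vapply(groups, function(g) any(s %in% g), logical(1))
    # Fuse the incoming set with every group it touches ...
    merged <- sort(unique(c(s, unlist(groups[hit]))))
    # ... and keep the untouched groups unchanged.
    groups <- c(groups[!hit], list(merged))
  }
  # Order the groups by their first element (assumes character vectors).
  groups[order(vapply(groups, `[`, character(1), 1))]
}

cbnl <- list(
  c("A", "B"), c("B", "A"), c("C", "D"), c("E", "D"), c("F", "G"), c("H", "I"),
  c("J", "K"), c("I", "H"), c("K", "J"), c("G", "F"), c("D", "C"), c("E", "C"),
  c("D", "E"), c("C", "E")
)

result <- merge_overlapping(cbnl)
print(result)

# Overlapping sets that are not neighbours are merged as well.
chained <- merge_overlapping(list(c("A", "B"), c("X", "Y"), c("B", "C")))
print(chained)
```

Because each incoming set is fused with all groups it touches, the groups stay pairwise disjoint after every step, and Rule 1 is covered by the `sort(unique(...))` call.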

CodePudding user response:

I took one line of code from @ThomasIsCoding and would like to show that this can also be achieved with my package dedupewider.

library(dedupewider)
library(purrr)
library(magrittr)

cbnl <- list(
  c("A", "B"), c("B", "A"), c("C", "D"), c("E", "D"), c("F", "G"), c("H", "I"),
  c("J", "K"), c("I", "H"), c("K", "J"), c("G", "F"), c("D", "C"), c("E", "C"),
  c("D", "E"), c("C", "E")
)

cbnl_df <- data.frame(do.call(rbind, cbnl))

result <- dedupe_wide(cbnl_df, names(cbnl_df)) # it performs deduplication by connecting elements which are linked by transitive relation

result_list <- as.list(as.data.frame(t(result)))

result_list <- map(result_list, ~ .x[!is.na(.x)]) # remove NA
result_list
#> $V1
#> [1] "A" "B"
#> 
#> $V2
#> [1] "C" "E" "D"
#> 
#> $V3
#> [1] "F" "G"
#> 
#> $V4
#> [1] "H" "I"
#> 
#> $V5
#> [1] "J" "K"

Many of these steps are needed only because both the input and the output are lists; with a data.frame as input and output, the code would be shorter than above.

CodePudding user response:

You can try the following igraph option

library(igraph)

graph_from_data_frame(do.call(rbind, cbnl)) %>%
  components() %>%
  membership() %>%
  stack() %>%
  with(., split(as.character(ind), values))

which gives

$`1`
[1] "A" "B"

$`2`
[1] "C" "E" "D"

$`3`
[1] "F" "G"

$`4`
[1] "H" "I"

$`5`
[1] "J" "K"

A shorter one

graph_from_data_frame(do.call(rbind, cbnl)) %>%
  decompose() %>%
  Map(function(x) names(V(x)), .)

which gives

[[1]]
[1] "A" "B"

[[2]]
[1] "C" "E" "D"

[[3]]
[1] "F" "G"

[[4]]
[1] "H" "I"

[[5]]
[1] "J" "K"
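Both snippets treat each pair as an edge and each group as a connected component. For readers without igraph, the same idea can be sketched in base R with a small union-find. This is only an illustration of the concept, not a description of how igraph works internally, and `components_base` and `find_root` are hypothetical names.

```r
# Hypothetical helper: connected components of a two-column character
# edge matrix, computed with a simple union-find.
components_base <- function(edges) {
  # All distinct nodes, in order of first appearance (row by row).
  nodes <- unique(as.vector(t(edges)))
  # Every node starts out as its own parent (its own component).
  parent <- setNames(nodes, nodes)

  # Follow parent links until reaching a node that is its own parent.
  find_root <- function(v) {
    while (parent[[v]] != v) v <- parent[[v]]
    v
  }

  # Each edge joins the components of its two endpoints.
  for (k in seq_len(nrow(edges))) {
    ra <- find_root(edges[k, 1])
    rb <- find_root(edges[k, 2])
    if (ra != rb) parent[[rb]] <- ra
  }

  # Group the nodes by the root of their component.
  roots <- vapply(nodes, find_root, character(1), USE.NAMES = FALSE)
  unname(split(nodes, factor(roots, levels = unique(roots))))
}

cbnl <- list(
  c("A", "B"), c("B", "A"), c("C", "D"), c("E", "D"), c("F", "G"), c("H", "I"),
  c("J", "K"), c("I", "H"), c("K", "J"), c("G", "F"), c("D", "C"), c("E", "C"),
  c("D", "E"), c("C", "E")
)

groups <- components_base(do.call(rbind, cbnl))
print(groups)
```

The sketch skips path compression and union by rank, which is fine for a handful of pairs but would matter for large inputs.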

CodePudding user response:

Using a sorted union as FUN= in combn:

combn(cbnl, 2, \(x) {
  if (!length(intersect(x[[1]], x[[2]])) == 0) {
    `length<-`(sort(union(x[[1]], x[[2]])), 3)
  } else {
    rep(NA, 3)
  }
}) |>
  (\(x) x[, !colSums(is.na(x)) == 3])() |>
  (\(x) as.list(as.data.frame(x[, !duplicated(x[1, ])])))() |>
  (\(x) lapply(x, \(x) x[!is.na(x)]))()
# $V1
# [1] "A" "B"
# 
# $V2
# [1] "C" "D" "E"
# 
# $V3
# [1] "D" "E"
# 
# $V4
# [1] "F" "G"
# 
# $V5
# [1] "H" "I"
# 
# $V6
# [1] "J" "K"

Note:

> R.version.string
[1] "R version 4.1.2 (2021-11-01)"