I have a list of two-element combinations like the one below.
cbnl <- list(
c("A", "B"), c("B", "A"), c("C", "D"), c("E", "D"), c("F", "G"), c("H", "I"),
c("J", "K"), c("I", "H"), c("K", "J"), c("G", "F"), c("D", "C"), c("E", "C"),
c("D", "E"), c("C", "E")
)
I'd like to summarize the list above. The expected result is the list below. The order of elements within a vector doesn't matter here.
[[1]]
[1] "A" "B"
[[2]]
[1] "C" "D" "E"
[[3]]
[1] "F" "G"
[[4]]
[1] "H" "I"
[[5]]
[1] "J" "K"
(Rule 1) {A, B} is equivalent to {B, A}. To handle this, I think I can do the following.
cbnl <- unique(lapply(cbnl, function(i) { sort(i) }))
(Rule 2) If two sets such as {A, B} and {B, C} have an element in common, take the union of the two sets, which gives {A, B, C}. I don't have a clear idea of how to do this.
Is there an efficient way to do this?
Thank you very much in advance.
CodePudding user response:
I know this answer is more like traditional programming than "R-like", but it solves the issue.
cbnl <- unique(lapply(cbnl, sort))
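i <- 1
count <- 1
out <- list()
# Walk through the list, comparing each set with the next one only
while (i <= length(cbnl) - 1) {
  if (sum(cbnl[[i]] %in% cbnl[[i + 1]]) == 0) {
    # No shared element: keep the current set as it is
    out[[count]] <- cbnl[[i]]
  } else {
    # Shared element: store the union and skip the next set
    out[[count]] <- sort(unique(c(cbnl[[i]], cbnl[[i + 1]])))
    i <- i + 1
  }
  count <- count + 1
  i <- i + 1
}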
out
gives,
[[1]]
[1] "A" "B"
[[2]]
[1] "C" "D" "E"
[[3]]
[1] "F" "G"
[[4]]
[1] "H" "I"
[[5]]
[1] "J" "K"
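The loop above only compares each set with its immediate neighbour, so it relies on overlapping pairs ending up next to each other after unique(). As a rough sketch for the general case, one could fold each pair into whichever groups it touches. Note that merge_overlapping is a hypothetical helper name used here for illustration, not a function from any package.

```r
# merge_overlapping() is a hypothetical helper, not part of any package.
# It keeps a list of disjoint groups and folds each input set into
# every group it shares an element with.
merge_overlapping <- function(sets) {
  out <- list()
  for (s in sets) {
    # Which already-collected groups share at least one element with s?
    hit <- vapply(out, function(g) any(s %in% g), logical(1))
    # Merge s with all overlapping groups into a single sorted group
    merged <- sort(unique(c(s, unlist(out[hit]))))
    out <- c(out[!hit], list(merged))
  }
  # Order the groups by their first element for a stable result
  out[order(vapply(out, `[`, character(1), 1))]
}

cbnl <- list(
  c("A", "B"), c("B", "A"), c("C", "D"), c("E", "D"), c("F", "G"), c("H", "I"),
  c("J", "K"), c("I", "H"), c("K", "J"), c("G", "F"), c("D", "C"), c("E", "C"),
  c("D", "E"), c("C", "E")
)
res <- merge_overlapping(cbnl)
res
```

Because every new pair is checked against all existing groups, the order of the input should not matter, at the cost of one pass over the groups per pair.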
CodePudding user response:
I took one line of code from @ThomasIsCoding and would like to show that we can achieve this using my package dedupewider.
library(dedupewider)
library(purrr)
library(magrittr)
cbnl <- list(
c("A", "B"), c("B", "A"), c("C", "D"), c("E", "D"), c("F", "G"), c("H", "I"),
c("J", "K"), c("I", "H"), c("K", "J"), c("G", "F"), c("D", "C"), c("E", "C"),
c("D", "E"), c("C", "E")
)
cbnl_df <- data.frame(do.call(rbind, cbnl))
result <- dedupe_wide(cbnl_df, names(cbnl_df)) # it performs deduplication by connecting elements which are linked by transitive relation
result_list <- as.list(as.data.frame(t(result)))
result_list <- map(result_list, ~ .x[!is.na(.x)]) # remove NA
result_list
#> $V1
#> [1] "A" "B"
#>
#> $V2
#> [1] "C" "E" "D"
#>
#> $V3
#> [1] "F" "G"
#>
#> $V4
#> [1] "H" "I"
#>
#> $V5
#> [1] "J" "K"
Many of these steps are needed only because both the input and the output are lists; with a data.frame the code would be shorter than the above.
CodePudding user response:
You can try the following igraph option
library(igraph)
graph_from_data_frame(do.call(rbind, cbnl)) %>%
components() %>%
membership() %>%
stack() %>%
with(., split(as.character(ind), values))
which gives
$`1`
[1] "A" "B"
$`2`
[1] "C" "E" "D"
$`3`
[1] "F" "G"
$`4`
[1] "H" "I"
$`5`
[1] "J" "K"
A shorter one
graph_from_data_frame(do.call(rbind, cbnl)) %>%
decompose() %>%
Map(function(x) names(V(x)), .)
which gives
[[1]]
[1] "A" "B"
[[2]]
[1] "C" "E" "D"
[[3]]
[1] "F" "G"
[[4]]
[1] "H" "I"
[[5]]
[1] "J" "K"
CodePudding user response:
Sorting the union as FUN= in combn.
combn(cbnl, 2, \(x) {
if (!length(intersect(x[[1]], x[[2]])) == 0) {
`length<-`(sort(union(x[[1]], x[[2]])), 3)
} else {
rep(NA, 3)
}
}) |>
(\(x) x[, !colSums(is.na(x)) == 3])() |>
(\(x) as.list(as.data.frame(x[, !duplicated(x[1, ])])))() |>
(\(x) lapply(x, \(x) x[!is.na(x)]))()
# $V1
# [1] "A" "B"
#
# $V2
# [1] "C" "D" "E"
#
# $V3
# [1] "D" "E"
#
# $V4
# [1] "F" "G"
#
# $V5
# [1] "H" "I"
#
# $V6
# [1] "J" "K"
Note: the \(x) lambda shorthand and the native pipe |> require R >= 4.1.
> R.version.string
[1] "R version 4.1.2 (2021-11-01)"
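In the output above, V3 ("D" "E") is a subset of V2 ("C" "D" "E"), so it differs from the expected result. One possible clean-up step, sketched below, is to drop every group that is a proper subset of another group. The name drop_subsets is a hypothetical helper chosen for illustration, and the input list is copied by hand from the output above. This only tidies the pairwise unions; it does not merge chains that need more than two pairs to connect.

```r
# drop_subsets() is a hypothetical helper, not part of any package.
# It removes every set that is a proper subset of another set in the list.
drop_subsets <- function(sets) {
  is_proper_subset <- function(a, b) length(a) < length(b) && all(a %in% b)
  keep <- vapply(sets, function(a) {
    !any(vapply(sets, function(b) is_proper_subset(a, b), logical(1)))
  }, logical(1))
  sets[keep]
}

# Groups copied from the combn output shown above
pairwise <- list(
  V1 = c("A", "B"), V2 = c("C", "D", "E"), V3 = c("D", "E"),
  V4 = c("F", "G"), V5 = c("H", "I"), V6 = c("J", "K")
)
cleaned <- drop_subsets(pairwise)
cleaned
```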