I have a list of lists (containing character strings), and I would like to remove the elements that these sublists have in common.
For example
mylist = list(
list("tata","titi","toto","tete"),
list("fifi","fafa","toto","fefe"),
list("fifi","toto","rere","rara")
)
becomes
mylist = list(
list("tata","titi","tete"),
list("fafa","fefe"),
list("rere","rara")
)
I first created a list of the common elements and tried to subtract it from each sublist, but it does not work:
common_elements = list(Reduce(intersect, mylist))
mylist = mylist[!(mylist %in% common_elements)]
Could you help me? Thank you!
CodePudding user response:
We can use a similar solution as in the previous post, by unnesting twice after enframe-ing the nested list to a two-column tibble. After grouping by 'value', filter the rows where the number of distinct elements in 'name' is 1, then split the list-converted (as.list) 'value' column by 'name'.
library(dplyr)
library(tibble)
library(tidyr)
mylist2 <- enframe(mylist) %>%
unnest(value) %>%
unnest(value) %>%
group_by(value) %>%
filter(n_distinct(name) == 1) %>%
with(., split(as.list(value), name)) %>%
unname
-output
> str(mylist2)
List of 3
$ :List of 3
..$ : chr "tata"
..$ : chr "titi"
..$ : chr "tete"
$ :List of 2
..$ : chr "fafa"
..$ : chr "fefe"
$ :List of 2
..$ : chr "rere"
..$ : chr "rara"
CodePudding user response:
For this kind of situation, it is probably better for the elements of the list to be atomic vectors, instead of lists themselves. For example, like this:
mylist <- list(
c("tata","titi","toto","tete"),
c("fifi","fafa","toto","fefe"),
c("fifi","toto","rere","rara")
)
You can convert your original format to this format with something like mylist_atomic <- lapply(mylist, unlist).
If I understand your question correctly, you want to filter each element of the list to just the strings that don't appear in >1 list element.
If you don't care about efficiency, here is one straightforward way to achieve this:
appears_in <- function(mylist, string){
  # count how many elements of `mylist` the arg `string` appears in
  return(sum(sapply(mylist, function(v) string %in% v)))
}
filter_list <- function(mylist){
  # one result slot per input element (not hard-coded to 3)
  result <- vector(mode='list', length=length(mylist))
  for (idx in seq_along(mylist)){
    elem <- mylist[[idx]]
    for (string in elem){
      # keep only strings that occur in exactly one element
      if (appears_in(mylist, string) == 1){
        result[[idx]] <- c(result[[idx]], string)
      }
    }
  }
  return(result)
}
Then you can call filter_list() like this:
mylist_original <- list(
c("tata","titi","toto","tete"),
c("fifi","fafa","toto","fefe"),
c("fifi","toto","rere","rara")
)
mylist_filtered <- filter_list(mylist_original)
print(mylist_filtered)
# [[1]]
# [1] "tata" "titi" "tete"
#
# [[2]]
# [1] "fafa" "fefe"
#
# [[3]]
# [1] "rere" "rara"
Many ways to skin a cat, and this is one of them.
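If efficiency does matter, one possible alternative is to count each string once per list element and keep those with a count of 1. This is only a sketch in base R: the name filter_list_vectorised is made up for illustration, and it assumes the list elements are character vectors as above.

```r
# Hypothetical helper (not part of the answer above): keep only the
# strings that occur in exactly one element of the list.
filter_list_vectorised <- function(mylist) {
  # De-duplicate within each element so that a string repeated inside
  # one vector is still counted once for that element.
  per_element <- lapply(mylist, unique)
  # Count in how many elements each string occurs.
  counts <- table(unlist(per_element))
  keep <- names(counts)[as.vector(counts) == 1]
  # Subset each element, preserving its original order.
  lapply(mylist, function(v) v[v %in% keep])
}

mylist_original <- list(
  c("tata", "titi", "toto", "tete"),
  c("fifi", "fafa", "toto", "fefe"),
  c("fifi", "toto", "rere", "rara")
)
result <- filter_list_vectorised(mylist_original)
print(result)
```

This makes a single pass to build the counts instead of rescanning the whole list for every string.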
CodePudding user response:
Just continue from where you are:
mylist2 <- lapply(mylist, \(x) x[!x %in% unlist(common_elements)])
dput(mylist2)
list(
list("tata", "titi", "tete"),
list("fifi", "fafa", "fefe"),
list("fifi", "rere", "rara")
)
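This removes the elements common to all sublists ("toto"). Note that "fifi", which appears in only two of the three sublists, is kept, so the result differs from the expected output in the question.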
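If strings shared by only some of the sublists (such as "fifi" above) should be dropped too, as in the question's expected output, here is a minimal base R sketch. It assumes the original list-of-lists format; the names flat and shared are introduced here for illustration only.

```r
mylist <- list(
  list("tata", "titi", "toto", "tete"),
  list("fifi", "fafa", "toto", "fefe"),
  list("fifi", "toto", "rere", "rara")
)

# Flatten to one character vector, de-duplicating within each sublist
# first so that only repeats across sublists are detected.
flat <- unlist(lapply(mylist, unique))

# Strings that occur in at least two sublists.
shared <- unique(flat[duplicated(flat)])

# Drop the shared strings from every sublist.
mylist2 <- lapply(mylist, function(x) x[!x %in% shared])
str(mylist2)
```

The only change from the approach above is how the set of strings to remove is built: duplicated() flags anything seen in more than one sublist, whereas Reduce(intersect, ...) keeps only what is present in all of them.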