Remove common characters in lists of lists

Time:11-24

I have a list of lists containing character strings, and I would like to remove the strings that are common between these sublists.

For example

mylist = list(
list("tata","titi","toto","tete"),
list("fifi","fafa","toto","fefe"),
list("fifi","toto","rere","rara")
)

becomes

mylist = list(
list("tata","titi","tete"),
list("fafa","fefe"),
list("rere","rara")
)

I first created a list of the common elements and tried to subtract this list from the sublists, but it does not work:

common_elements = list(Reduce(intersect, mylist))
mylist = mylist[!(mylist %in% common_elements)]

Could you help me? Thank you!

CodePudding user response:

We can use a solution similar to the one in the previous post: enframe the nested list into a two-column tibble, then unnest twice. After grouping by 'value', filter to keep the rows where the number of distinct elements in 'name' is 1, then split the 'value' column (converted with as.list) by 'name'.

library(dplyr)
library(tibble)
library(tidyr)
mylist2 <- enframe(mylist) %>%
    unnest(value)  %>%  
    unnest(value) %>%
    group_by(value) %>%
    filter(n_distinct(name) == 1) %>% 
    with(., split(as.list(value), name)) %>%
    unname

-output

> str(mylist2)
List of 3
 $ :List of 3
  ..$ : chr "tata"
  ..$ : chr "titi"
  ..$ : chr "tete"
 $ :List of 2
  ..$ : chr "fafa"
  ..$ : chr "fefe"
 $ :List of 2
  ..$ : chr "rere"
  ..$ : chr "rara"

CodePudding user response:

For this kind of situation, it is probably better for the elements of the list to be atomic vectors instead of lists themselves. For example:

mylist <- list(
    c("tata","titi","toto","tete"),
    c("fifi","fafa","toto","fefe"),
    c("fifi","toto","rere","rara")
)

You can convert your original format to this format by saying something like mylist_atomic <- lapply(mylist, unlist).
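As a quick illustration of that conversion, here is a minimal sketch on a shortened version of the data (the name `mylist_nested` is only used for this example):

```r
# A nested list: each element is itself a list of length-one strings
mylist_nested <- list(
    list("tata", "titi"),
    list("fifi", "fafa")
)

# unlist() flattens each inner list into a plain character vector
mylist_atomic <- lapply(mylist_nested, unlist)

str(mylist_atomic)
```

Each element of `mylist_atomic` is now a character vector, so vectorised operators such as `%in%` work on it directly.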

If I understand your question correctly, you want to filter each element of the list to just the strings that don't appear in >1 list element.

If you don't care about efficiency, here is one straightforward way to achieve this:

appears_in <- function(mylist, string){
    # check how many elements of `mylist` the arg `string` appears in 
    return(sum(sapply(mylist, function(v) string %in% v)))
}

filter_list <- function(mylist){
    result <- vector(mode='list', length=length(mylist))
    for (idx in seq_along(mylist)){
        elem <- mylist[[idx]]
        for (string in elem){
            if (appears_in(mylist, string) == 1){
                result[[idx]] <- c(result[[idx]], string)
            }
        }
    }
    return(result)
}

Then you can call filter_list() like this:

mylist_original <- list(
    c("tata","titi","toto","tete"),
    c("fifi","fafa","toto","fefe"),
    c("fifi","toto","rere","rara")
)

mylist_filtered <- filter_list(mylist_original)

print(mylist_filtered)
# [[1]]
# [1] "tata" "titi" "tete"
# 
# [[2]]
# [1] "fafa" "fefe"
# 
# [[3]]
# [1] "rere" "rara"

Many ways to skin a cat, and this is one of them.
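If efficiency does matter, the same filtering can be sketched without the nested loops by counting, once, how many list elements each string appears in. This is only one possible variant, and `filter_list_vec` is a name made up for this example. It assumes the list elements are character vectors, as above.

```r
filter_list_vec <- function(mylist){
    # unique() within each element so a string counts once per element,
    # then tabulate across all elements
    counts <- table(unlist(lapply(mylist, unique)))
    # strings that appear in exactly one list element
    keep <- names(counts)[counts == 1]
    lapply(mylist, function(v) v[v %in% keep])
}

mylist_original <- list(
    c("tata","titi","toto","tete"),
    c("fifi","fafa","toto","fefe"),
    c("fifi","toto","rere","rara")
)

mylist_filtered <- filter_list_vec(mylist_original)
print(mylist_filtered)
```

One difference from the loop version: an element with no surviving strings comes back as `character(0)` rather than `NULL`.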

CodePudding user response:

Just continue from where you are:

 mylist2 <- lapply(mylist,\(x)x[!x%in% unlist(common_elements)])

 dput(mylist2)

 list(
      list("tata", "titi", "tete"), 
      list("fifi", "fafa", "fefe"), 
      list("fifi", "rere", "rara")
 )

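Note that this removes only "toto", the one element common to all three sublists. "fifi", which appears in two of them, is kept, so the result does not fully match the desired output in the question.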
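If the goal is to also drop strings shared by only some of the sublists (as "fifi" is in the question's expected output), the same idea can be extended by replacing the all-sublist intersection with the set of values seen in more than one sublist. The following is a sketch under that reading of the question; `all_values` and `shared` are names introduced here, not part of the original code.

```r
mylist <- list(
    list("tata", "titi", "toto", "tete"),
    list("fifi", "fafa", "toto", "fefe"),
    list("fifi", "toto", "rere", "rara")
)

# Flatten each sublist and drop repeats within it,
# so a value is counted at most once per sublist
all_values <- unlist(lapply(mylist, function(x) unique(unlist(x))))

# Values that occur in more than one sublist
shared <- unique(all_values[duplicated(all_values)])

# Keep only the elements of each sublist that are not shared
mylist2 <- lapply(mylist, function(x) x[!(x %in% shared)])

dput(mylist2)
```

This keeps the nested-list structure of the input, so each element of `mylist2` is still a list of length-one strings.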
