I have a list of character vectors like this:

my_list <- list(c('a','b','c','d','e'),c('e','f','g'),c('h','i','j'))
names(my_list) <- c("group1","group2","group3")

I want a simple way to test my_list for letters that are duplicated across any of the 3 groups/vectors. For instance, "e" appears in both group1 and group2, so that counts as a duplicate. Ideally it would just return a logical: TRUE if at least one letter occurs in two or more groups, and FALSE if the letters in each group are unique to that group (which obviously isn't the case in my example).

CodePudding user response:

A single logical result can be generated with

any(duplicated(unlist(my_list)))
[1] TRUE

As @sindri_baldur correctly pointed out in the comments, duplicates within a single group also trigger this check. If only duplicates across groups should count, apply unique to each group first:

any(duplicated(unlist(lapply(my_list, unique))))
[1] TRUE

or, as another base R alternative (anyDuplicated returns 0 when there are no duplicates):

anyDuplicated(unlist(lapply(my_list, unique))) > 0
[1] TRUE
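To see why the unique step matters, here is a small sketch using a hypothetical list, dup_within (not from the question), in which a letter repeats inside one group but no letter is shared between groups:

```r
# Hypothetical input: "a" repeats inside group1, but nothing is shared across groups
dup_within <- list(group1 = c("a", "a", "b"), group2 = c("c", "d"))

# Without unique(), the within-group repeat is reported as a duplicate
naive <- any(duplicated(unlist(dup_within)))

# With unique() applied per group, only cross-group duplicates count
across <- any(duplicated(unlist(lapply(dup_within, unique))))

print(c(naive = naive, across = across))
```

Here naive is TRUE while across is FALSE, so which variant is right depends on whether within-group repeats should count.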

CodePudding user response:

You could do:

subset(stack(my_list), duplicated(values))$values
[1] "e"

If you need to know, for each group, whether it shares any value with another group, you could do:

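# start with FALSE for every group, then flag groups that share a value
result <- setNames(logical(length(my_list)), names(my_list))
result[unique(unlist(Filter(function(x) length(x) > 1,
                            unstack(rev(stack(my_list))))))] <- TRUE
result
group1 group2 group3 
  TRUE   TRUE  FALSE 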

or even:

library(dplyr)

stack(my_list) %>%
  mutate(dups = duplicated(values) | duplicated(values, fromLast = TRUE)) %>%
  group_by(ind) %>%
  summarise(logic = any(dups))

# A tibble: 3 x 2
  ind    logic
  <fct>  <lgl>
1 group1 TRUE 
2 group2 TRUE 
3 group3 FALSE

CodePudding user response:

We can stack the named list into a two-column data.frame, get the frequency counts with table, use colSums on the resulting logical matrix to count how many groups each value occurs in, and return the names of the values that occur in more than one group:

names(which(colSums(table(stack(my_list)[2:1]) > 0) > 1))
[1] "e"
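The one-liner above can be unpacked step by step. This is only an illustrative sketch; the intermediate names tab, groups_per_letter and shared are made up here and are not part of the answer:

```r
my_list <- list(group1 = c("a", "b", "c", "d", "e"),
                group2 = c("e", "f", "g"),
                group3 = c("h", "i", "j"))

# Contingency table: one row per group (ind), one column per letter (values)
tab <- table(stack(my_list)[2:1])

# For each letter, count the number of groups in which it occurs at least once
groups_per_letter <- colSums(tab > 0)

# Letters present in more than one group
shared <- names(which(groups_per_letter > 1))
print(shared)
```

Because the comparison tab > 0 is applied before summing, a letter repeated within a single group still counts only once for that group.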

Or slightly more compact (note that this version also counts repeats within a single group):

names(which(table(unlist(my_list)) > 1))
[1] "e"

If we want a logical flag per group:

library(dplyr)
library(tidyr)
library(tibble)

enframe(my_list) %>%
  unnest(value) %>%
  group_by(value) %>%
  mutate(flag = n_distinct(name) > 1) %>%
  group_by(name) %>%
  summarise(flag = any(flag))


# A tibble: 3 × 2
  name   flag 
  <chr>  <lgl>
1 group1 TRUE 
2 group2 TRUE 
3 group3 FALSE

CodePudding user response:

Another possible solution, based on tidyr::expand_grid and purrr::pmap_lgl:


library(dplyr)
library(tidyr)
library(purrr)

my_list <- list(c('a','b','c','d','e'),c('e','f','g'),c('h','i','j'))
names(my_list) <- c("group1","group2","group3")

expandg <- expand_grid(names(my_list), names(my_list))

pmap_lgl(expandg, ~ any(my_list[[.x]] %in% my_list[[.y]])) %>% 
  bind_cols(id1 = expandg[[1]], id2 = expandg[[2]], value = .) %>% 
  group_by(Group = id1) %>% summarise(value = any(value[id1 != id2]))

#> # A tibble: 3 × 2
#>   Group  value
#>   <chr>  <lgl>
#> 1 group1 TRUE 
#> 2 group2 TRUE 
#> 3 group3 FALSE
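The same pairwise idea can also be sketched in base R with combn and intersect. This is an alternative illustration rather than the answerer's code, and the names pairs, shared and any_shared are made up for the example:

```r
my_list <- list(group1 = c("a", "b", "c", "d", "e"),
                group2 = c("e", "f", "g"),
                group3 = c("h", "i", "j"))

# All unordered pairs of group names
pairs <- combn(names(my_list), 2, simplify = FALSE)

# Letters shared by each pair of groups
shared <- lapply(pairs, function(p) intersect(my_list[[p[1]]], my_list[[p[2]]]))
names(shared) <- vapply(pairs, paste, character(1), collapse = "-")

# TRUE if any pair of groups has at least one letter in common
any_shared <- any(lengths(shared) > 0)
print(any_shared)
```

Keeping shared as a named list also shows which pair of groups overlaps and on which letters, if more than a single TRUE/FALSE is needed.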
