I have a list of character vectors like this:
my_list <- list(c('a','b','c','d','e'),c('e','f','g'),c('h','i','j'))
names(my_list) <- c("group1","group2","group3")
I want a simple way to test my_list for letters that are duplicated across any of the three groups (vectors) in the list. For instance, "e" appears in both group1 and group2, so that counts as a duplicate. Ideally the test would just return a logical: TRUE if at least one letter appears in two or more groups, and FALSE if the letters in each group are unique to that group (which obviously isn't the case in my example).
Thanks so much!
CodePudding user response:
A single TRUE/FALSE result can be obtained with
any(duplicated(unlist(my_list)))
[1] TRUE
As @sindri_baldur correctly pointed out in the comments, this also flags values that are repeated within a single group. If only duplicates across groups should count, deduplicate each group with unique first:
any(duplicated(unlist(lapply(my_list, unique))))
[1] TRUE
Another base R alternative uses anyDuplicated, which returns the index of the first duplicate, or 0 if there is none:
anyDuplicated(unlist(lapply(my_list, unique))) > 0
[1] TRUE
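To see why the unique step matters, here is a minimal sketch. The helper name has_cross_group_dups and the list within_only are made up for this illustration; within_only is a hypothetical input whose only repeated letter sits inside a single group.

```r
# Hypothetical helper: TRUE if any value occurs in two or more groups.
has_cross_group_dups <- function(x) {
  # Drop repeats inside each group first, so only cross-group repeats remain.
  anyDuplicated(unlist(lapply(x, unique))) > 0
}

my_list <- list(group1 = c("a", "b", "c", "d", "e"),
                group2 = c("e", "f", "g"),
                group3 = c("h", "i", "j"))

# Hypothetical input: "a" repeats only inside group1.
within_only <- list(group1 = c("a", "a", "b"), group2 = c("c", "d"))

has_cross_group_dups(my_list)         # TRUE: "e" is in group1 and group2
has_cross_group_dups(within_only)     # FALSE: no letter is shared between groups
any(duplicated(unlist(within_only)))  # TRUE: the check without unique counts "a" twice
```

Whether a within-group repeat should count as a duplicate depends on the use case, so pick the variant that matches the question being asked.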
CodePudding user response:
To list the duplicated values themselves, you could do:
subset(stack(my_list), duplicated(values))$values
[1] "e"
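If you need to know, for each group, whether it contains a value that also appears elsewhere (TRUE means the group shares at least one value), you could do:
# Start with FALSE for every group
result <- setNames(logical(length(my_list)), names(my_list))
# Map each value to the groups containing it, keep values found more than once,
# and flag those groups (the \(x) lambda needs R >= 4.1)
result[unique(unlist(Filter(\(x) length(x) > 1,
                            unstack(rev(stack(my_list))))))] <- TRUE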
result
group1 group2 group3
TRUE TRUE FALSE
or, with dplyr:
library(dplyr)
stack(my_list) %>%
mutate(dups = duplicated(values) | duplicated(values, fromLast = TRUE)) %>%
group_by(ind) %>%
summarise(logic = any(dups))
# A tibble: 3 x 2
ind logic
<fct> <lgl>
1 group1 TRUE
2 group2 TRUE
3 group3 FALSE
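For comparison, a per-group flag can also be sketched in base R without stacking. This is only an illustrative alternative, not the code above: the name shares_value is an assumption, and unlike the duplicated-based versions it ignores repeats inside a single group because each group is compared only against the other groups.

```r
my_list <- list(group1 = c("a", "b", "c", "d", "e"),
                group2 = c("e", "f", "g"),
                group3 = c("h", "i", "j"))

# For each group, check whether any of its values also occurs in another group.
shares_value <- vapply(seq_along(my_list), function(i) {
  others <- unlist(my_list[-i], use.names = FALSE)  # values of all other groups
  any(my_list[[i]] %in% others)
}, logical(1))
names(shares_value) <- names(my_list)

shares_value
# group1 group2 group3
#   TRUE   TRUE  FALSE
```

any(shares_value) then gives the single logical asked for in the question.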
CodePudding user response:
We can stack the named list into a two-column data.frame, get the frequency counts with table, count for each value (column) how many groups contain it with colSums on the logical matrix, and return the names of the values that occur in more than one group:
names(which(colSums(table(stack(my_list)[2:1]) > 0) > 1))
[1] "e"
Or slightly more compact (note that this version also counts repeats within a single group):
names(which(table(unlist(my_list)) > 1))
[1] "e"
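If it also helps to see which groups each shared value belongs to, the stacked data can be split by value. This is a small sketch along the same lines; the names long and groups_by_value are assumptions introduced here.

```r
my_list <- list(group1 = c("a", "b", "c", "d", "e"),
                group2 = c("e", "f", "g"),
                group3 = c("h", "i", "j"))

long <- stack(my_list)  # two columns: values, ind (the group name)
long <- unique(long)    # ignore repeats inside one group

# For every value, collect the names of the groups that contain it.
groups_by_value <- split(as.character(long$ind), long$values)

# Keep only the values found in more than one group.
groups_by_value[lengths(groups_by_value) > 1]
# $e
# [1] "group1" "group2"
```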
If we want a logical flag per group:
library(dplyr)
library(tidyr)
library(tibble)
enframe(my_list) %>%
unnest(value) %>%
group_by(value) %>%
mutate(flag = n_distinct(name) > 1) %>%
group_by(name) %>%
summarise(flag = any(flag))
Output:
# A tibble: 3 × 2
name flag
<chr> <lgl>
1 group1 TRUE
2 group2 TRUE
3 group3 FALSE
CodePudding user response:
Another possible solution, based on tidyr::expand_grid and purrr::pmap_lgl:
library(tidyverse)
my_list <- list(c('a','b','c','d','e'),c('e','f','g'),c('h','i','j'))
names(my_list) <- c("group1","group2","group3")
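# All ordered pairs of group names; naming the columns avoids duplicate auto-generated names
expandg <- expand_grid(id1 = names(my_list), id2 = names(my_list))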
pmap_lgl(expandg, ~ any(my_list[[.x]] %in% my_list[[.y]])) %>%
bind_cols(id1 = expandg[[1]], id2 = expandg[[2]], value = .) %>%
group_by(Group = id1) %>% summarise(value = any(value[id1 != id2]))
#> # A tibble: 3 × 2
#> Group value
#> <chr> <lgl>
#> 1 group1 TRUE
#> 2 group2 TRUE
#> 3 group3 FALSE
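The same pairwise idea can be sketched in base R with combn and intersect, which also reports what each pair of groups has in common. This is an illustrative variant rather than the code above; the names pairs and overlap are assumptions, and combn needs at least two groups in the list.

```r
my_list <- list(group1 = c("a", "b", "c", "d", "e"),
                group2 = c("e", "f", "g"),
                group3 = c("h", "i", "j"))

# Every unordered pair of group names (requires at least two groups).
pairs <- combn(names(my_list), 2, simplify = FALSE)

# Values shared by the two groups in each pair.
overlap <- lapply(pairs, function(p) intersect(my_list[[p[1]]], my_list[[p[2]]]))
names(overlap) <- vapply(pairs, paste, character(1), collapse = " & ")

# Pairs with at least one shared value.
overlap[lengths(overlap) > 0]
# $`group1 & group2`
# [1] "e"

# Single logical: is any value shared between any two groups?
any(lengths(overlap) > 0)
# [1] TRUE
```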