Create Dataframe of Number of Overlapping Elements Compared

I have a single list that looks like this:

main.list <- c("dog", "cat", "bird", "snake")

I have a bunch of equal-sized comparator lists that share some, but not all, of the elements in main.list:

comparator.list1 <- c("dog", "cat", "bird", "crescent")
comparator.list2 <- c("dog", "lizard", "cup", "plate")
comparator.list3 <- c("lizard", "bird", "squirrel", "snake")

I want to make a data frame that gives, for each comparator list, the proportion of elements it shares with the main list. So in this case:

List               Number.of.shared.elements
comparator.list1   0.75
comparator.list2   0.25
comparator.list3   0.5

How can I do that?

CodePudding user response：

Collect the 'comparator' objects into a named list with mget(), use %in% to get a logical vector marking which elements of 'main.list' occur in each comparator, convert that to a proportion with mean(), and stack() the resulting name/value pairs into a data.frame with two columns:

out <- stack(lapply(mget(ls(pattern = 'comparator')),
      function(x) mean(main.list %in% x)))[2:1]
names(out) <- c("List", "Number.of.shared.elements")

-output

> out
              List Number.of.shared.elements
1 comparator.list1                      0.75
2 comparator.list2                      0.25
3 comparator.list3                      0.50
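
To make the individual steps concrete, here is a minimal sketch for a single comparator. It uses only the vectors from the question; the intermediate name `hits` is a hypothetical helper introduced for illustration.

```r
main.list <- c("dog", "cat", "bird", "snake")
comparator.list1 <- c("dog", "cat", "bird", "crescent")

# Logical vector: is each element of main.list present in the comparator?
hits <- main.list %in% comparator.list1
hits
#> [1]  TRUE  TRUE  TRUE FALSE

# The mean of a logical vector is the proportion of TRUE values
mean(hits)
#> [1] 0.75
```

The lapply() call above simply repeats this computation for every comparator returned by mget().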

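We may also use intersect() to get the shared elements, count them with length(), and divide by the length of the comparator vector. This gives the same result here because all vectors have the same length and contain no duplicates: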

out <- stack(lapply(mget(ls(pattern = 'comparator')),
    function(x) length(intersect(main.list, x))/length(x)))[2:1]
names(out) <- c("List", "Number.of.shared.elements")
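
The %in%/mean version divides by the length of main.list, while the intersect version divides by the length of the comparator, so the two only agree when those lengths match. A small sketch of the difference, assuming a hypothetical two-element comparator `short.comparator` that is not part of the question:

```r
main.list <- c("dog", "cat", "bird", "snake")

# Hypothetical comparator with only two elements (not from the question)
short.comparator <- c("dog", "lizard")

# Share of main.list elements found in the comparator: 1 of 4
prop.of.main <- mean(main.list %in% short.comparator)
prop.of.main
#> [1] 0.25

# Share of comparator elements found in main.list: 1 of 2
prop.of.comparator <- length(intersect(main.list, short.comparator)) /
  length(short.comparator)
prop.of.comparator
#> [1] 0.5
```

If the comparators could ever differ in length, it is worth deciding which denominator is actually wanted.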

Or using the tidyverse:

library(dplyr)
library(tibble)
library(tidyr)
mget(ls(pattern = 'comparator')) %>% 
  enframe(name = 'List') %>% 
  unnest(value) %>%
  group_by(List) %>%
  summarise(Number.of.shared.elements = length(intersect(value, 
        main.list))/n(), .groups = 'drop')
# A tibble: 3 × 2
  List             Number.of.shared.elements
  <chr>                                <dbl>
1 comparator.list1                      0.75
2 comparator.list2                      0.25
3 comparator.list3                      0.5
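
The variants in this answer look objects up by name with ls(pattern = 'comparator'), which also picks up any other object in the workspace whose name contains 'comparator'. A possible sketch that avoids this by keeping the comparators in an explicit named list; the names `comparators` and `props` are hypothetical and not from the original answer:

```r
main.list <- c("dog", "cat", "bird", "snake")

# Explicit named list instead of mget(ls(pattern = 'comparator'))
comparators <- list(
  comparator.list1 = c("dog", "cat", "bird", "crescent"),
  comparator.list2 = c("dog", "lizard", "cup", "plate"),
  comparator.list3 = c("lizard", "bird", "squirrel", "snake")
)

# vapply() enforces exactly one numeric value per comparator
props <- vapply(comparators, function(x) mean(x %in% main.list), numeric(1))

out <- data.frame(
  List = names(props),
  Number.of.shared.elements = unname(props)
)
out
#>               List Number.of.shared.elements
#> 1 comparator.list1                      0.75
#> 2 comparator.list2                      0.25
#> 3 comparator.list3                      0.50
```

Here the proportion is taken over each comparator's own elements, so it stays meaningful if the comparators have different lengths.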

CodePudding user response：

A possible alternative:

library(tidyverse)
# Data --------------------------------------------------------------------
main.list <- c("dog", "cat", "bird", "snake")

comparator.list1 <- c("dog", "cat", "bird", "crescent")
comparator.list2 <- c("dog", "lizard", "cup", "plate")
comparator.list3 <- c("lizard", "bird", "squirrel", "snake")

# Code --------------------------------------------------------------------

nms <- str_subset(ls(), '^comparator\\.')

# Turn each name into a symbol, evaluate it, and compute the shared proportion
nms %>%
  syms() %>% 
  map2_dfr(nms, ~ tibble(List = .y,
                         Number.of.shared.elements = mean(eval(.x) %in% main.list)))
#> # A tibble: 3 × 2
#>   List             Number.of.shared.elements
#>   <chr>                                <dbl>
#> 1 comparator.list1                      0.75
#> 2 comparator.list2                      0.25
#> 3 comparator.list3                      0.5

Created on 2021-11-21 by the reprex package (v2.0.1)