Compute Pairwise Intersections Between Multiple Vectors

Time:12-01

I have multiple dataframes that look like this:

>df1

NAME    
Josh
Sarah
Sammy
Jake

>df2

NAME    
Josh
Sarah
Sammy
Mark

>df3

NAME    
Josh
Michael
Mike
Adam 

>df4
NAME
Josh
Michael
Mike
Adam

I want to create a new dataframe that contains the number of names each pair of dfs has in common, like this:

>df.final
    df1 df2 df3 df4
df1   4   3   1   1
df2   3   4   1   1
df3   1   1   4   4
df4   1   1   4   4

How can I achieve this? Essentially I'm looking to apply intersect() and length() to every pair of dataframes without typing each call out by hand.

#create the data
df1 <- data.frame(NAME=c("Josh", "Sarah", "Sammy", "Jake"))
df2 <- data.frame(NAME=c("Josh", "Sarah", "Sammy", "Mark"))
df3 <- data.frame(NAME=c("Josh", "Michael", "Mike", "Adam"))
df4 <- data.frame(NAME=c("Josh", "Michael", "Mike", "Adam"))

CodePudding user response:

#create the data
df1 <- data.frame(NAME=c("Josh", "Sarah", "Sammy", "Jake"))
df2 <- data.frame(NAME=c("Josh", "Sarah", "Sammy", "Mark"))
df3 <- data.frame(NAME=c("Josh", "Michael", "Mike", "Adam"))
df4 <- data.frame(NAME=c("Josh", "Michael", "Mike", "Adam"))

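# mget() looks the data frames up by name and returns them as a named list
l <- c("df1","df2","df3","df4")
names(l) <- l
# outer() calls FUN once on the expanded lists, so mapply() walks them in pairs
result <- outer(mget(l), mget(l), function(x, y)
  mapply(function(x, y) length(intersect(x$NAME, y$NAME)), x, y))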

result
#>     df1 df2 df3 df4
#> df1   4   3   1   1
#> df2   3   4   1   1
#> df3   1   1   4   4
#> df4   1   1   4   4
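If the data frames are already held in a list, the same counts can be built without mget() or outer(). The following is only a sketch under that assumption; the names dfs and pairwise_counts are made up for illustration and are not part of the answer above.

```r
# Same data as in the question
df1 <- data.frame(NAME = c("Josh", "Sarah", "Sammy", "Jake"))
df2 <- data.frame(NAME = c("Josh", "Sarah", "Sammy", "Mark"))
df3 <- data.frame(NAME = c("Josh", "Michael", "Mike", "Adam"))
df4 <- data.frame(NAME = c("Josh", "Michael", "Mike", "Adam"))

# Hypothetical container: a named list of the data frames
dfs <- list(df1 = df1, df2 = df2, df3 = df3, df4 = df4)

# The inner sapply() counts the names each data frame shares with y;
# the outer sapply() repeats that for every y and binds the results
# into a matrix whose row and column names come from the list names
pairwise_counts <- sapply(dfs, function(y)
  sapply(dfs, function(x) length(intersect(x$NAME, y$NAME))))

pairwise_counts
```

Nested sapply() compares every pair twice (once in each order), which is fine for a handful of data frames.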

EDIT

Vectorize() also works, since it wraps the function in mapply() for you:

result <- outer(mget(l),mget(l), Vectorize(
  function(x,y) length(intersect(x$NAME , y$NAME))))
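A different technique, not the one used above, is to build a name-by-data-frame incidence table and take its cross product, which gives all pairwise counts at once. This sketch assumes the data frames sit in a list and that duplicates within one data frame should count once (hence unique()); dfs, name_sets, incidence and shared are illustrative names only.

```r
# Same data as in the question
df1 <- data.frame(NAME = c("Josh", "Sarah", "Sammy", "Jake"))
df2 <- data.frame(NAME = c("Josh", "Sarah", "Sammy", "Mark"))
df3 <- data.frame(NAME = c("Josh", "Michael", "Mike", "Adam"))
df4 <- data.frame(NAME = c("Josh", "Michael", "Mike", "Adam"))

# Hypothetical container: a named list of the data frames
dfs <- list(df1 = df1, df2 = df2, df3 = df3, df4 = df4)

# One character vector of distinct names per data frame
name_sets <- lapply(dfs, function(d) unique(d$NAME))

# stack() turns the list into a two-column data frame (values, ind);
# table() then gives a 0/1 matrix with one row per name and one column
# per data frame
incidence <- table(stack(name_sets))

# t(incidence) %*% incidence: entry [i, j] is the number of names that
# appear in both data frame i and data frame j
shared <- crossprod(incidence)

shared
```

This avoids calling intersect() for each pair, which may matter when there are many data frames.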