I am trying to automate the process of calculating Jaccard's index of similarity for every possible pair of sites surveyed in a recent vegetation study.
Below is a dummy list in the format of my data, where x, y, and z are discrete survey sites, and a function jaccard() that computes the index for two sites:
x <- c("sp1","sp2","sp3")
y <- c("sp2","sp3","sp4")
z <- c("sp3","sp4","sp5")
dummy_list <- list(x,y,z)
jaccard <- function(a, b) {
  intersection = length(intersect(a, b))
  # inclusion-exclusion: |A union B| = |A| + |B| - |A intersect B|
  union = length(a) + length(b) - intersection
  return (intersection/union)
}
I want to pass each pairwise comparison (x-y, x-z, y-z) to jaccard() and output a matrix of the calculated Jaccard indices. How can I achieve this?
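As a quick check of the pairwise function on its own: x and y share sp2 and sp3 out of four distinct species, so the index should be 2/4 = 0.5. The sketch below also compares against base R's union(), which gives the same result only under the assumption that no species is listed twice within a site. The name n_union is just for illustration.

```r
x <- c("sp1","sp2","sp3")
y <- c("sp2","sp3","sp4")

jaccard <- function(a, b) {
  intersection <- length(intersect(a, b))              # shared species
  n_union <- length(a) + length(b) - intersection      # inclusion-exclusion
  intersection / n_union
}

jaccard(x, y)
#> [1] 0.5

# Same value via base R set operations (assumes no duplicate species per site)
length(intersect(x, y)) / length(union(x, y))
#> [1] 0.5
```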
Answer:
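We could first Vectorize() your jaccard() function and then use outer(). outer() calls FUN once with the two argument vectors expanded to every combination, so FUN must be vectorized; the plain jaccard() would instead treat the whole lists as just two sets: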
x <- c("sp1","sp2","sp3")
y <- c("sp2","sp3","sp4")
z <- c("sp3","sp4","sp5")
dummy_list <- setNames(list(x, y, z), c("x","y","z"))
jaccard <- function(a, b) {
  intersection = length(intersect(a, b))
  union = length(a) + length(b) - intersection  # size of the union
  return (intersection/union)
}
vjaccard <- Vectorize(jaccard)
outer(dummy_list, dummy_list, FUN = "vjaccard")
#> x y z
#> x 1.0 0.5 0.2
#> y 0.5 1.0 0.5
#> z 0.2 0.5 1.0
Created on 2022-03-02 by the reprex package (v2.0.1)
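For comparison, here is a sketch that avoids Vectorize() by nesting two sapply() calls, so jaccard() always receives two whole site vectors. This is an alternative to the outer() approach rather than part of it; the name jmat is only illustrative.

```r
x <- c("sp1","sp2","sp3")
y <- c("sp2","sp3","sp4")
z <- c("sp3","sp4","sp5")
dummy_list <- list(x = x, y = y, z = z)

jaccard <- function(a, b) {
  intersection <- length(intersect(a, b))
  intersection / (length(a) + length(b) - intersection)
}

# Outer sapply() walks the columns, inner sapply() the rows; names on
# dummy_list become the row and column names of the result.
jmat <- sapply(dummy_list, function(b) sapply(dummy_list, function(a) jaccard(a, b)))
jmat
#>     x   y   z
#> x 1.0 0.5 0.2
#> y 0.5 1.0 0.5
#> z 0.2 0.5 1.0
```

This computes every cell, including the diagonal and both triangles, which is wasteful for many sites but keeps the code easy to read.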
Answer:
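jaccard <- function(List) {
  # one similarity per pair of sites, in the order combn() generates them
  ln <- combn(List, 2, function(x) {
    n <- length(intersect(x[[1]], x[[2]]))  # shared species
    m <- length(unlist(x))                  # |A| + |B|
    n/(m - n)
  })
  # Size is the number of sites, not the number of pairs
  structure(ln, Size = length(List), Diag = FALSE, class = 'dist')
}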
jaccard(dummy_list)
1 2
2 0.5
3 0.2 0.5
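If a full square matrix is wanted instead of the lower triangle, a "dist" object can be expanded with as.matrix(). The sketch below uses a hypothetical name, jaccard_dist(), so it does not mask the two-argument jaccard(), and assumes the site list is named so the labels carry through. as.matrix() fills the diagonal with 0, which is then set to 1 because a site is identical to itself.

```r
x <- c("sp1","sp2","sp3")
y <- c("sp2","sp3","sp4")
z <- c("sp3","sp4","sp5")
dummy_list <- list(x = x, y = y, z = z)

# Hypothetical helper: pairwise Jaccard similarities stored as a "dist" object
jaccard_dist <- function(List) {
  ln <- combn(List, 2, function(p) {
    n <- length(intersect(p[[1]], p[[2]]))  # shared species
    m <- length(unlist(p))                  # |A| + |B|
    n / (m - n)
  })
  structure(as.vector(ln), Size = length(List), Labels = names(List),
            Diag = FALSE, Upper = FALSE, class = "dist")
}

d <- jaccard_dist(dummy_list)
sim <- as.matrix(d)  # both triangles filled, diagonal is 0
diag(sim) <- 1       # each site is fully similar to itself
sim
```

Note that these values are similarities; functions that expect a dissimilarity, such as hclust(), would need the Jaccard distance, which is 1 minus the similarity.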
Answer:
We can use the following base R approach (it does not call the jaccard() function but follows the same definition):
> dummy_list <- list(x = x, y = y, z = z)
> 1 / (outer(lengths(dummy_list), lengths(dummy_list), `+`) / crossprod(table(stack(dummy_list))) - 1)
x y z
x 1.0 0.5 0.2
y 0.5 1.0 0.5
z 0.2 0.5 1.0
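The one-liner is easier to follow split into steps. Algebraically, 1/((|A| + |B|)/|A ∩ B| - 1) equals |A ∩ B|/(|A| + |B| - |A ∩ B|), the same definition used by jaccard(). The sketch below assumes each site lists a species at most once; otherwise table() would count duplicates and inflate the shared counts. The intermediate names (long, incidence, shared, union_size, jac) are illustrative only.

```r
x <- c("sp1","sp2","sp3")
y <- c("sp2","sp3","sp4")
z <- c("sp3","sp4","sp5")
dummy_list <- list(x = x, y = y, z = z)

long <- stack(dummy_list)       # data frame: values (species), ind (site)
incidence <- table(long)        # species x site table of 0/1 counts
shared <- crossprod(incidence)  # t(incidence) %*% incidence: shared species per pair
sizes <- lengths(dummy_list)    # number of species at each site

union_size <- outer(sizes, sizes, `+`) - shared  # |A| + |B| - |A intersect B|
jac <- shared / union_size
jac
```

The crossprod() step is what makes this fast for many sites: all pairwise intersection sizes come from a single matrix product instead of repeated calls to intersect().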