I have a list of 4 character vectors of terms. I'd like a table of pairwise comparisons between them: for each pair of vectors, how many terms do they have in common?
Here is an example:
set.seed(20190708)
genes <- paste("gene", 1:1000, sep = "")
x <- list(
  A = sample(genes, 300),
  B = sample(genes, 525),
  C = sample(genes, 440),
  D = sample(genes, 350)
)
And here is what I'm looking for: a table whose entries are the number of terms present in both groups of each pair.
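To make the target quantity concrete, here is a minimal sketch on a small hypothetical list (the name toy and its contents are made up for illustration): the count for one pair of groups is simply the length of the intersection of the two vectors.

```r
# Hypothetical toy data, for illustration only
toy <- list(
  A = c("gene1", "gene2", "gene3"),
  B = c("gene2", "gene3", "gene4"),
  C = c("gene5")
)

# Number of terms present in both A and B (gene2 and gene3)
shared_AB <- length(intersect(toy$A, toy$B))
print(shared_AB)  # prints [1] 2
```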
Answer:
We can use outer() if we want a symmetric matrix as output, and as.dist() to display only the lower triangle of that matrix.
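# outer() calls FUN once on all combinations, so the pairwise function must be wrapped in Vectorize()
out <- outer(x, x, FUN = Vectorize(function(u, v) length(intersect(u, v))))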
as.dist(out)
#>     A   B   C
#> B 151
#> C 128 228
#> D 133 187 150
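outer() does not loop over pairs itself: it replicates X and Y into all combinations and calls FUN once on the two expanded lists, which is why the anonymous function has to be wrapped in Vectorize(). A minimal sketch of an equivalent nested sapply() that loops explicitly, on a hypothetical toy list, should give the same symmetric matrix:

```r
# Hypothetical toy data, for illustration only
toy <- list(
  A = c("gene1", "gene2", "gene3"),
  B = c("gene2", "gene3", "gene4"),
  C = c("gene5")
)

# Outer sapply walks the columns (u), inner sapply walks the rows (v)
overlap <- sapply(toy, function(u) sapply(toy, function(v) length(intersect(u, v))))
print(overlap)

# The outer() + Vectorize() version from the answer, for comparison
via_outer <- outer(toy, toy, FUN = Vectorize(function(u, v) length(intersect(u, v))))
print(all(overlap == via_outer))
```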
Or, if you only want each pair once (without the mirrored duplicates), use combn():
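# p is a sub-list holding one pair of groups; count the terms they share
out <- combn(x, 2, FUN = function(p) length(intersect(p[[1]], p[[2]])))
# Label each count with its pair of group names, e.g. "A_B"
names(out) <- combn(names(x), 2, FUN = paste, collapse = "_")
stack(out)[2:1]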
#>   ind values
#> 1 A_B    151
#> 2 A_C    128
#> 3 A_D    133
#> 4 B_C    228
#> 5 B_D    187
#> 6 C_D    150
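If separate columns for the two groups are more convenient than a combined A_B label, one possible variant (a sketch; the column names group1, group2 and shared are arbitrary choices, and toy is hypothetical data) indexes the list by the pairs of names that combn() returns:

```r
# Hypothetical toy data, for illustration only
toy <- list(
  A = c("gene1", "gene2", "gene3"),
  B = c("gene2", "gene3", "gene4"),
  C = c("gene5")
)

# Each column of pairs is one unordered pair of group names
pairs <- combn(names(toy), 2)

res <- data.frame(
  group1 = pairs[1, ],
  group2 = pairs[2, ],
  # Look up both groups by name and count their common terms
  shared = apply(pairs, 2, function(p) length(intersect(toy[[p[1]]], toy[[p[2]]])))
)
print(res)
```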
Answer:
Here is another base R option, using crossprod():
> crossprod(table(stack(x)))
   ind
ind   A   B   C   D
  A 300 151 128 133
  B 151 525 228 187
  C 128 228 440 150
  D 133 187 150 350
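Why this works: table(stack(x)) is a term-by-group incidence matrix M with a 1 wherever a term occurs in a group, and crossprod(M) computes t(M) %*% M. Entry [i, j] therefore sums M[, i] * M[, j] over all terms, which counts the terms present in both groups, and the diagonal is each group's size. This assumes no term is repeated within a vector (true for the question's data, since sample() draws without replacement); otherwise counts above 1 would inflate the result. A minimal sketch building the incidence matrix explicitly with %in%, on a hypothetical toy list:

```r
# Hypothetical toy data, for illustration only
toy <- list(
  A = c("gene1", "gene2", "gene3"),
  B = c("gene2", "gene3", "gene4"),
  C = c("gene5")
)

# Every distinct term across all groups
all_terms <- unique(unlist(toy))

# Logical incidence matrix: one row per term, one column per group
M <- sapply(toy, function(v) all_terms %in% v)

# Convert to 0/1 and compute t(M) %*% M: pairwise shared-term counts
shared <- crossprod(M * 1L)
print(shared)
```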
or, to show only the lower triangle:
> as.dist(crossprod(table(stack(x))))
    A   B   C
B 151
C 128 228
D 133 187 150
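As a quick cross-check, assuming x is built exactly as in the question, the three approaches should agree. combn() enumerates pairs in the same order in which a dist object stores its lower triangle (A_B, A_C, A_D, B_C, B_D, C_D), so the two can be compared element by element. The exact counts can differ between R versions because sample() changed its default algorithm in R 3.6.0, so this sketch only checks agreement, not specific values:

```r
set.seed(20190708)
genes <- paste("gene", 1:1000, sep = "")
x <- list(
  A = sample(genes, 300),
  B = sample(genes, 525),
  C = sample(genes, 440),
  D = sample(genes, 350)
)

# Full symmetric matrices from the first and last answers
m_outer <- outer(x, x, FUN = Vectorize(function(u, v) length(intersect(u, v))))
m_cross <- crossprod(table(stack(x)))

# One count per unordered pair, from the combn() answer
pair_counts <- combn(x, 2, FUN = function(p) length(intersect(p[[1]], p[[2]])))

# Compare values only (the dimnames differ: crossprod keeps the name "ind")
stopifnot(all(m_outer == m_cross))
stopifnot(all(pair_counts == as.vector(as.dist(m_outer))))
```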