I have an R function that calculates the Hamming distance of two vectors:
Hamming = function(x, y) {
  get_dist = sum(x != y, na.rm = TRUE)
  return(get_dist)
}
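A quick check of the function on two short vectors (hypothetical example data) shows what `na.rm = TRUE` does: any position where either value is `NA` is dropped, so a missing value never counts as a mismatch.

```r
# Same definition as in the question
Hamming = function(x, y) {
  get_dist = sum(x != y, na.rm = TRUE)
  return(get_dist)
}

a <- c(1, 2, 3, 4)
b <- c(1, 5, 3, NA)

# x != y gives FALSE TRUE FALSE NA; the NA is dropped before summing
Hamming(a, b)
# [1] 1
```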
which I would like to apply to each pair of corresponding rows of two matrices, M1 and M2, without using a for loop. What I currently have (where L is the number of rows in M1 and M2) is this very time-consuming loop:
xdiff = c()
for (i in 1:L) {
  xdiff = c(xdiff, Hamming(M1[i, ], M2[i, ]))
}
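Part of the cost of this loop is probably the `c(xdiff, ...)` pattern itself: `c()` builds a new vector on every iteration, so the total copying grows quadratically with L. A minimal sketch of the same loop with a preallocated result, using small hypothetical matrices `M1` and `M2` for illustration (the vectorised answers are still the better fix):

```r
Hamming = function(x, y) sum(x != y, na.rm = TRUE)

# Hypothetical example data: two 5 x 4 matrices of 0/1 values
set.seed(1)
M1 <- matrix(sample(0:1, 20, replace = TRUE), nrow = 5)
M2 <- matrix(sample(0:1, 20, replace = TRUE), nrow = 5)
L <- nrow(M1)

# Allocate the result once instead of growing it with c()
xdiff <- integer(L)
# seq_len(L) is empty when L == 0, whereas 1:L would give c(1, 0)
for (i in seq_len(L)) {
  xdiff[i] <- Hamming(M1[i, ], M2[i, ])
}
xdiff
```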
I thought that this could be done by executing
mapply(Hamming, t(M1), t(M2))
(with the transpose because mapply works across columns), but this doesn't return a length-L vector with one Hamming distance per row, so perhaps I'm misunderstanding what mapply is doing.
Is there a straightforward application of mapply or something else in the R apply family that would work?
Answer:
If dim(M1) and dim(M2) are identical, then you can simply do:
rowSums(M1 != M2, na.rm = TRUE)
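To see why this one-liner matches `Hamming` row by row: `M1 != M2` compares the matrices element by element and returns a logical matrix of the same shape, and `rowSums(..., na.rm = TRUE)` counts the `TRUE` values in each row while skipping `NA`s, exactly as `sum(..., na.rm = TRUE)` does inside `Hamming`. A small sketch with hypothetical 2 x 3 matrices:

```r
M1 <- matrix(c(1, 2, 3,
               4, 5, 6), nrow = 2, byrow = TRUE)
M2 <- matrix(c(1, 0, 3,
               0, NA, 0), nrow = 2, byrow = TRUE)

# Guard for the "identical dimensions" assumption
stopifnot(identical(dim(M1), dim(M2)))

# Logical matrix: row 1 is FALSE TRUE FALSE, row 2 is TRUE NA TRUE
M1 != M2

# Count mismatches per row; the NA in row 2 is ignored
rowSums(M1 != M2, na.rm = TRUE)
# [1] 1 2
```

Note that `rowSums` returns a double vector, while `Hamming` returns integers.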
Your attempt with mapply didn't work because an m-by-n matrix is stored as a vector of length m*n, and mapply handles it as such: it calls Hamming once for each pair of individual elements, so you get m*n zeros and ones instead of m row distances. To accomplish this with mapply, you would need to split each matrix into a list of row vectors:
mapply(Hamming, asplit(M1, 1L), asplit(M2, 1L))
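A side-by-side sketch of the two mapply calls on small hypothetical matrices makes the difference concrete. The transposed call walks over the 6 individual elements; `asplit(M, 1L)` (available since R 3.6.0) instead yields one element per row, so `Hamming` runs once per row.

```r
Hamming = function(x, y) sum(x != y, na.rm = TRUE)

M1 <- matrix(c(1, 2, 3,
               4, 5, 6), nrow = 2, byrow = TRUE)
M2 <- matrix(c(1, 0, 3,
               0, 5, 0), nrow = 2, byrow = TRUE)

# t(M1) is still just 6 numbers to mapply: one Hamming call per element
flat <- mapply(Hamming, t(M1), t(M2))
flat
# [1] 0 1 0 1 0 1

# asplit(M, 1L) gives a list with one entry per row of M
rows <- mapply(Hamming, asplit(M1, 1L), asplit(M2, 1L))
rows
# [1] 1 2
```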
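vapply would be better, though, because it iterates over row indices without building lists of rows and checks that every call returns a single integer (the 0L template):
vapply(seq_len(nrow(M1)), function(i) Hamming(M1[i, ], M2[i, ]), 0L)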
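The third argument of `vapply` (`FUN.VALUE`) is a template for each result. A short sketch on hypothetical data shows both the normal case and what happens when the template does not match: `vapply` stops with an error instead of silently returning something of an unexpected type, which is the main advantage over `sapply`.

```r
Hamming = function(x, y) sum(x != y, na.rm = TRUE)

M1 <- matrix(c(1, 2, 3,
               4, 5, 6), nrow = 2, byrow = TRUE)
M2 <- matrix(c(1, 0, 3,
               0, 5, 0), nrow = 2, byrow = TRUE)

# FUN.VALUE = 0L: each call must return exactly one integer
d <- vapply(seq_len(nrow(M1)), function(i) Hamming(M1[i, ], M2[i, ]), 0L)
d
# [1] 1 2

# A mismatched template (character instead of integer) raises an error
err <- tryCatch(
  vapply(seq_len(nrow(M1)), function(i) Hamming(M1[i, ], M2[i, ]), character(1)),
  error = function(e) conditionMessage(e)
)
err
```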
In any case, just use rowSums.
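A sketch for checking that the three approaches agree on larger matrices with missing values; the sizes, value range, and `NA` rate below are arbitrary hypothetical choices. Wrapping the calls in `system.time()` on your own data is a simple way to compare their speed; no timings are claimed here.

```r
Hamming = function(x, y) sum(x != y, na.rm = TRUE)

# Hypothetical test data: 1000 x 50 integer matrices with some NAs
set.seed(42)
n <- 1000; p <- 50
M1 <- matrix(sample(c(0:3, NA), n * p, replace = TRUE), nrow = n)
M2 <- matrix(sample(c(0:3, NA), n * p, replace = TRUE), nrow = n)

via_rowSums <- rowSums(M1 != M2, na.rm = TRUE)
via_mapply  <- mapply(Hamming, asplit(M1, 1L), asplit(M2, 1L))
via_vapply  <- vapply(seq_len(nrow(M1)), function(i) Hamming(M1[i, ], M2[i, ]), 0L)

# rowSums returns doubles, the others integers, so compare values
all.equal(via_rowSums, as.numeric(via_mapply))
all.equal(via_rowSums, as.numeric(via_vapply))

# Rough timing comparison
system.time(rowSums(M1 != M2, na.rm = TRUE))
system.time(vapply(seq_len(nrow(M1)), function(i) Hamming(M1[i, ], M2[i, ]), 0L))
```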