R: using mapply for a function of two vectors

Time:02-18

I have an R function that calculates the Hamming distance of two vectors:

Hamming = function(x, y) {
    get_dist = sum(x != y, na.rm = TRUE)
    return(get_dist)
}

I would like to apply it to each pair of corresponding rows of two matrices, M1 and M2, without using a for loop. What I currently have (where L is the number of rows in M1 and M2) is this very time-consuming loop:

xdiff = c()
for(i in 1:L){
    xdiff = c(xdiff, Hamming(M1[i,],M2[i,]))
}

I thought that this could be done by executing

mapply(Hamming, t(M1), t(M2))

(with the transpose because I assumed mapply works across columns), but this doesn't return a length-L vector with one Hamming distance per row, so perhaps I'm misunderstanding what mapply is doing.

Is there a straightforward application of mapply or something else in the R apply family that would work?

Answer:

If dim(M1) and dim(M2) are identical, then you can simply do:

rowSums(M1 != M2, na.rm = TRUE)

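Your attempt with mapply didn't work because an m-by-n matrix is stored as a vector of length m*n, and mapply treats it as such: it calls Hamming on each pair of corresponding elements and returns a result of length m*n rather than one value per row. To accomplish this with mapply, you would need to split each matrix into a list of row vectors: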

mapply(Hamming, asplit(M1, 1L), asplit(M2, 1L))
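
To make the difference concrete, here is a minimal sketch using two small made-up matrices (the values in M1 and M2 below are hypothetical, chosen only for illustration, and asplit requires R 3.6.0 or later):

```r
# Hypothetical 2-by-3 example matrices (not taken from the question)
M1 <- matrix(c(1, 2, 3,
               4, 5, 6), nrow = 2, byrow = TRUE)
M2 <- matrix(c(1, 0, 3,
               0, 5, 0), nrow = 2, byrow = TRUE)

Hamming <- function(x, y) sum(x != y, na.rm = TRUE)

# mapply pairs up the 6 individual elements, so the result has length 6
elementwise <- mapply(Hamming, t(M1), t(M2))
length(elementwise)  # 6

# asplit(, 1L) yields a list of rows, so mapply pairs up whole rows instead
by_row <- mapply(Hamming, asplit(M1, 1L), asplit(M2, 1L))
by_row               # 1 2
```

Each entry of elementwise is just 0 or 1 for a single pair of elements, which is why the original call never produced one distance per row.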

vapply would be better, though, since it avoids building the two lists of rows and checks that every result is a single integer:

vapply(seq_len(nrow(M1)), function(i) Hamming(M1[i, ], M2[i, ]), 0L)

In any case, just use rowSums: it is vectorized and avoids calling Hamming once per row.
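
As a quick sanity check, the sketch below builds two random 0/1 matrices (the seed, the dimensions L and n, and the variable names via_* are arbitrary assumptions for illustration) and confirms that the loop, mapply, vapply and rowSums approaches all give the same distances:

```r
set.seed(1)          # arbitrary seed, for reproducibility only
L <- 1000; n <- 20   # hypothetical dimensions
M1 <- matrix(sample(0:1, L * n, replace = TRUE), nrow = L)
M2 <- matrix(sample(0:1, L * n, replace = TRUE), nrow = L)

Hamming <- function(x, y) sum(x != y, na.rm = TRUE)

# Original loop from the question, growing the result one row at a time
via_loop <- c()
for (i in 1:L) {
    via_loop <- c(via_loop, Hamming(M1[i, ], M2[i, ]))
}

via_mapply  <- mapply(Hamming, asplit(M1, 1L), asplit(M2, 1L))
via_vapply  <- vapply(seq_len(nrow(M1)), function(i) Hamming(M1[i, ], M2[i, ]), 0L)
via_rowSums <- rowSums(M1 != M2, na.rm = TRUE)

# rowSums returns doubles while the others return integers,
# so compare values with == rather than identical()
all(via_loop == via_mapply)
all(via_mapply == via_vapply)
all(via_vapply == via_rowSums)
```

All three comparisons should come back TRUE; the approaches differ only in how much work is done per row and in the storage type of the result.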
