I'm trying to calculate the Euclidean distance between pairs of points in a dataframe in R, and there's an ID for each pair:
ID <- sample(1:10, 10, replace=FALSE)
P <- runif(10, min=1, max=3)
S <- runif(10, min=1, max=3)
testdf <- data.frame(ID, P, S)
I found several ways to calculate the Euclidean distance in R, but I'm either getting an error, returning only 1 value (so it's computing the distance between the entire vector), or I end up with a matrix when all I need is a 4th column with the distance between each pair (columns 'P' and 'S.') I'm a bit confused by matrices so I'm not sure how to work with that result.
Tried making a function and applying it to the 2 columns but I get an error:
testdf$V <- apply(testdf[ , c('P', 'S')], 1, function(P, S) sqrt(sum((P^2, S^2)))
# Error in FUN(newX[, i], ...) : argument "S" is missing, with no default
Then tried using the dist() function in the stats package but it only returns 1 value: (Same problem if I follow the method here: https://www.statology.org/euclidean-distance-in-r/)
P <- testdf$P
S <- testdf$S
testProbMatrix <- rbind(P, S)
stats::dist(testProbMatrix, method = "euclidean")
# returns only 1 distance
Returns a matrix (Here's a nice explanation why: Calculate the distances between pairs of points in r)
stats::dist(cbind(P, S), method = "euclidean")
But I'm confused how to pull the distances out of the matrix and attach them to the correct ID for each pair of points. I don't understand why I have to make a matrix instead of just applying the function to the dataframe - matrices have always confused me. I think this is the same question as here (Finding euclidean distance between all pair of points) but for R instead of Python
Thanks for the help!
CodePudding user response:
Try this out if you would just like to add another column to your dataframe
testdf$distance <- sqrt((P^2 S^2))