Calculate distance between multiple latitude and longitude points

Time:03-30

I have a dataset that has latitude and longitude information for participants' home and work, and I'd like to create a new column in the dataset containing the Euclidean distance between home and work for each participant. I think this should be relatively simple, but all the other Q&As I've seen seem to be dealing with slightly different issues.

To start, I tried running this code (using the geosphere package):

distm(c(homelong, homelat), c(worklong, worklat), fun=distHaversine)

But got an error saying "Error in .pointsToMatrix(x) : Wrong length for a vector, should be 2" because (if I understand correctly) I'm trying to calculate the distance between multiple sets of two points.

Can I adjust this code to get what I'm looking for, or is there something else I should be trying instead? Thanks!

CodePudding user response:

You can keep the latitudes and longitudes in a data frame and then use rowwise() so that distm() is called once per row, with a single home point and a single work point each time. That gives the distance corresponding to each row.

library(tidyverse)
library(geosphere)

locations <- tibble(
  homelong = c(0, 2),
  homelat = c(2, 5),
  worklong = c(70, 60),
  worklat = c(45, 60)
)

locations %>%
  rowwise() %>%
  mutate(d = as.numeric(distm(c(homelong, homelat), c(worklong, worklat), fun = distHaversine)))

results in

# A tibble: 2 x 5
# Rowwise: 
  homelong homelat worklong worklat        d
     <dbl>   <dbl>    <dbl>   <dbl>    <dbl>
1        0       2       70      45 8299015.
2        2       5       60      60 7809933.

CodePudding user response:

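distm() returns a distance matrix (every point in the first set against every point in the second), which is not what you want; you want the row-by-row distances, home i to work i. So use the distance function (distHaversine(), distGeo(), or whatever) directly, since these are vectorised over rows: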

library(tidyverse)

locations <- tibble(
    homelong = c(0, 2),
    homelat = c(2, 5),
    worklong = c(70, 60),
    worklat = c(45, 60)
)

locations %>%
    mutate(
        dist = geosphere::distHaversine(cbind(homelong, homelat), cbind(worklong, worklat))
    )
#> # A tibble: 2 × 5
#>   homelong homelat worklong worklat     dist
#>      <dbl>   <dbl>    <dbl>   <dbl>    <dbl>
#> 1        0       2       70      45 8299015.
#> 2        2       5       60      60 7809933.
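
To make it clearer what the vectorised call is computing, here is a minimal base-R sketch of the haversine formula. The helper name haversine_m and its argument names are made up for illustration and are not part of geosphere; the default radius of 6378137 m is assumed to match the documented default of distHaversine(). In practice the geosphere function is the better choice.

```r
# Hypothetical helper (not a geosphere function): vectorised haversine
# distance in metres between points given in decimal degrees.
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  a <- pmin(a, 1)  # guard against rounding slightly above 1
  2 * r * asin(sqrt(a))
}

# Same example data as above, as a plain data frame
locations <- data.frame(
  homelong = c(0, 2), homelat = c(2, 5),
  worklong = c(70, 60), worklat = c(45, 60)
)

# One distance per row: element i of each column is paired with element i of the others
locations$dist <- with(locations, haversine_m(homelong, homelat, worklong, worklat))
print(locations)
```

Because every operation inside the helper is element-wise, no rowwise() or loop is needed; the same holds for passing two-column matrices to distHaversine().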

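Note that geosphere functions want matrices as inputs (one row per point, longitude in the first column and latitude in the second), so you can cbind() your columns together. Don't c() them; that creates a single flat vector and loses the distinction between lon and lat. This is the cause of the error, I suspect: geosphere only accepts a plain vector when it has exactly two elements (a single lon/lat point), and c(homelong, homelat) on whole columns is twice as long as the number of rows, hence "Wrong length for a vector, should be 2".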
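
The difference between c() and cbind() can be checked in base R without geosphere. This is only a small illustration using the example values from above; the variable names flat and pts are arbitrary.

```r
homelong <- c(0, 2)
homelat  <- c(2, 5)

# c() concatenates into one flat vector: all longitudes, then all latitudes
flat <- c(homelong, homelat)
print(length(flat))  # 4 values, not 2
print(dim(flat))     # NULL: a plain vector has no dimensions

# cbind() builds a matrix with one row per point and columns (lon, lat)
pts <- cbind(homelong, homelat)
print(dim(pts))      # 2 rows, 2 columns
print(pts)
```

With only one participant, c(homelong, homelat) would have length 2 and be read as a single point, which is why the rowwise() approach works and the whole-column call does not.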
