Calculating the distance between two latitude and longitude geocoordinates in an R dataframe

Time:08-30

I would like to calculate the distance between two latitude and longitude locations in a dataframe.

df <- tibble("lat1"=c(0,1,2),"lon1"=c(0,1,2),"lat2"=c(90,91,92),"lon2"=c(90,91,92))

df %>% 
dplyr::mutate(distance_trip=mapply(FUN = distHaversine,c(lat1, lon1),c(lat2, lon2)))

The error that I am getting is:

Error: Problem with `mutate()` column `distance_trip`.
ℹ `distance_trip = mapply(FUN = distHaversine, c(lat1, lon1), c(lat2, lon2))`.
x Wrong length for a vector, should be 2
Run `rlang::last_error()` to see where the error occurred.

Not sure why I am unable to apply the function to the data frame.

CodePudding user response:

A few things going on here:

  1. geosphere::distHaversine is vectorized: given matrix input (one point per row), it computes the distance for every row at once, so we don't need mapply to go one row at a time. This will also be much faster.

  2. c() just sticks things together: c(lat1, lon1) is 0, 1, 2, 0, 1, 2, one value after the other. mapply then walks over that long vector one element at a time, so distHaversine receives a single number instead of a coordinate pair, hence "Wrong length for a vector, should be 2". We need cbind to make a matrix. (I see what you were going for with mapply, but you'd need to pass lat1 and lon1 in separately and c() the individual items inside FUN. The matrix approach is better.)

  3. Despite the colloquial usage of "lat, lon", geosphere (like most spatial functions in R) expects "lon, lat": longitude is like "x" and latitude is like "y", and the (x, y) convention holds.

  4. The maximum latitude is 90, but your lat2 has values 91 and 92, which will cause an error if not addressed.
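
To make point 2 concrete, here is a minimal base-R sketch (no geosphere needed) of what c() and cbind() produce from the question's columns. The variable names flat, pts and lens are only illustrative.

```r
lat1 <- c(0, 1, 2)
lon1 <- c(0, 1, 2)

# c() concatenates the two columns into one flat vector of length 6
flat <- c(lat1, lon1)
length(flat)

# mapply() visits that vector element by element, so FUN sees one number per call
lens <- mapply(function(p1, p2) length(p1), flat, flat)
lens

# cbind() builds a 3 x 2 matrix instead: one (lon, lat) point per row
pts <- cbind(lon1, lat1)
dim(pts)
```

Each call inside mapply gets a vector of length 1, which is why a function expecting a (lon, lat) pair complains about the length.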

Fixing all of these, we can use:

library(dplyr)      # also provides tibble() and %>%
library(geosphere)

df <- tibble(
  "lat1"=c(0,1,2),
  "lon1"=c(0,1,2),
  "lat2"=4:6,  ## valid latitudes
  "lon2"=c(90,91,92)
)

df %>% 
  mutate(distance_trip = geosphere::distHaversine(
    cbind(lon1, lat1), cbind(lon2, lat2)  # (lon, lat) order, one point per row
  ))
# # A tibble: 3 × 5
#    lat1  lon1  lat2  lon2 distance_trip
#   <dbl> <dbl> <int> <dbl>         <dbl>
# 1     0     0     4    90     10018754.
# 2     1     1     5    91     10009053.
# 3     2     2     6    92      9995487.
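
As a sanity check on those numbers, the haversine formula can be written by hand in base R. This is only a sketch, not geosphere's actual implementation: haversine_m is a hypothetical helper, and it assumes the result is in metres on a sphere of radius 6378137 m (the default r documented for distHaversine).

```r
# Hypothetical helper: great-circle distance via the haversine formula.
# Inputs are in degrees; r is the sphere radius in metres (assumed default).
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * r * asin(pmin(1, sqrt(a)))
}

# Same coordinates as the corrected df above; all arguments are vectorized
d <- haversine_m(
  lon1 = c(0, 1, 2), lat1 = c(0, 1, 2),
  lon2 = c(90, 91, 92), lat2 = c(4, 5, 6)
)
round(d)
```

For the first row the two points are exactly a quarter of a great circle apart, so the distance is r * pi / 2, about 10018754 m, which agrees with the distance_trip value shown above.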

If you do want to use mapply, pass each column as its own argument and build the (lon, lat) pairs inside FUN. It's a bit more awkward to write and will be slower too :(

df <- tibble(
  "lat1"=c(0,1,2),
  "lon1"=c(0,1,2),
  "lat2"=4:6,  ## valid latitudes
  "lon2"=c(90,91,92)
)
df %>% 
  mutate(distance_trip = mapply(
    FUN = function(x1, y1, x2, y2) {
      distHaversine(c(x1, y1), c(x2, y2))
    },
    x1 = lon1, y1 = lat1, x2 = lon2, y2 = lat2
  ))