I would like to calculate the distance between two latitude and longitude locations in a dataframe.
df <- tibble("lat1"=c(0,1,2),"lon1"=c(0,1,2),"lat2"=c(90,91,92),"lon2"=c(90,91,92))
df %>%
dplyr::mutate(distance_trip=mapply(FUN = distHaversine,c(lat1, lon1),c(lat2, lon2)))
The error that I am getting is:
*Error: Problem with `mutate()` column `distance_trip`.
ℹ `distance_trip = mapply(FUN = distHaversine, c(lat1, lon1), c(lat2, lon2))`.
x Wrong length for a vector, should be 2
Run `rlang::last_error()` to see where the error occurred.*
Not sure why I am unable to apply the function to the data frame.
CodePudding user response:
A few things going on here:
- `geosphere::distHaversine` is vectorized (matricized?), so if we give it `matrix` input, we don't need to use `mapply` to do it one row at a time. This will be much faster.
- `c()` just sticks things together. `c(lat1, lon1)` will be `0, 1, 2, 0, 1, 2`, one after the other. We need `cbind` to make a matrix. (I see what you were going for with `mapply`, but the long `c(lat1, lon1)` vector is what you're passing in to `mapply`; instead you'd need to pass `lat1` and `lon1` in separately and `c()` the individual items inside the `FUN`... but the `matrix` approach will be better.)
- Despite the colloquial usage of "lat, lon", almost all functions expect "lon, lat", as longitude is more like "x" and latitude is more like "y" and the `(x, y)` paradigm holds.
- The maximum latitude is 90, but your `lat2` has values 91 and 92, which will cause an error if not addressed.
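The difference between `c()` and `cbind()`, and what `mapply` actually hands to `FUN` in the original call, can be checked in base R without `geosphere`. This is only an illustrative sketch: the variable names `flat`, `pts`, and `lens` are made up for the example, and the anonymous function stands in for `distHaversine` just to report the length of each argument it receives.

```r
lat1 <- c(0, 1, 2)
lon1 <- c(0, 1, 2)
lat2 <- 4:6
lon2 <- c(90, 91, 92)

# c() concatenates into one flat vector of length 6
flat <- c(lat1, lon1)
length(flat)

# cbind() binds the vectors as columns: a 3 x 2 matrix, one point per row
pts <- cbind(lon1, lat1)
dim(pts)

# mapply() walks the flat vectors element by element, so FUN only ever
# sees a single number where a (lon, lat) pair of length 2 is expected
lens <- mapply(function(p1, p2) length(p1), c(lat1, lon1), c(lat2, lon2))
lens
```

Each call receives a length-1 argument, which is consistent with the "Wrong length for a vector, should be 2" message in the question.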
Fixing all of these, we can use:
library(dplyr)      # provides tibble(), mutate() and %>%
library(geosphere)  # provides distHaversine()

df <- tibble(
  "lat1" = c(0, 1, 2),
  "lon1" = c(0, 1, 2),
  "lat2" = 4:6, ## valid latitudes
  "lon2" = c(90, 91, 92)
)

df %>%
  mutate(distance_trip = geosphere::distHaversine(
    cbind(lon1, lat1), cbind(lon2, lat2)
  ))
# # A tibble: 3 × 5
#    lat1  lon1  lat2  lon2 distance_trip
#   <dbl> <dbl> <int> <dbl>         <dbl>
# 1     0     0     4    90     10018754.
# 2     1     1     5    91     10009053.
# 3     2     2     6    92      9995487.
This is how to make it work with `mapply`. It's a bit more awkward to write and will be slower too :(
df <- tibble(
  "lat1" = c(0, 1, 2),
  "lon1" = c(0, 1, 2),
  "lat2" = 4:6, ## valid latitudes
  "lon2" = c(90, 91, 92)
)
df %>%
  mutate(distance_trip = mapply(
    FUN = function(x1, y1, x2, y2) {
      # build one (lon, lat) pair per point, one row at a time
      distHaversine(c(x1, y1), c(x2, y2))
    },
    x1 = lon1, y1 = lat1, x2 = lon2, y2 = lat2
  ))