I have a dataset that has latitude and longitude information for participants' home and work, and I'd like to create a new column in the dataset containing the Euclidean distance between home and work for each participant. I think this should be relatively simple, but all the other Q&As I've seen seem to be dealing with slightly different issues.
To start, I tried running this code (using the geosphere package):
distm(c(homelong, homelat), c(worklong, worklat), fun=distHaversine)
But got an error saying "Error in .pointsToMatrix(x) : Wrong length for a vector, should be 2" because (if I understand correctly) I'm trying to calculate the distance between multiple sets of two points.
Can I adjust this code to get what I'm looking for, or is there something else I should be trying instead? Thanks!
CodePudding user response:
You can have the latitudes and longitudes in a dataframe and then do rowwise operations on the dataframe to get the distance corresponding to each row.
library(tidyverse)
library(geosphere)
locations <- tibble(
  homelong = c(0, 2),
  homelat = c(2, 5),
  worklong = c(70, 60),
  worklat = c(45, 60)
)
locations %>%
  rowwise() %>%
  mutate(d = as.numeric(distm(c(homelong, homelat), c(worklong, worklat), fun = distHaversine)))
results in
# A tibble: 2 x 5
# Rowwise:
  homelong homelat worklong worklat        d
     <dbl>   <dbl>    <dbl>   <dbl>    <dbl>
1        0       2       70      45 8299015.
2        2       5       60      60 7809933.
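For readers not using the tidyverse, the same row-by-row idea can be sketched in base R. This is only an illustration: haversine_m is a hypothetical helper written for this example (it is not a geosphere function), and it assumes a spherical Earth of radius 6378137 m, which I believe is the default radius distHaversine uses.

```r
# Hypothetical helper (not part of geosphere): great-circle distance in metres
# between two lon/lat points given in degrees, on a sphere of radius r.
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against a creeping just above 1 through rounding
  2 * r * asin(sqrt(pmin(a, 1)))
}

locations <- data.frame(
  homelong = c(0, 2),
  homelat  = c(2, 5),
  worklong = c(70, 60),
  worklat  = c(45, 60)
)

# mapply() calls the helper once per row, much like rowwise() + mutate()
locations$d <- mapply(
  haversine_m,
  locations$homelong, locations$homelat,
  locations$worklong, locations$worklat
)
locations
```

The d column should agree closely with the geosphere output above. The distances are in metres; divide by 1000 for kilometres.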
CodePudding user response:
distm() returns a distance matrix, which is not what you want; you want the pairwise distances. So use the distance function (distHaversine(), distGeo(), or whatever) directly:
library(tidyverse)
locations <- tibble(
  homelong = c(0, 2),
  homelat = c(2, 5),
  worklong = c(70, 60),
  worklat = c(45, 60)
)
locations %>%
  mutate(
    dist = geosphere::distHaversine(cbind(homelong, homelat), cbind(worklong, worklat))
  )
#> # A tibble: 2 × 5
#>   homelong homelat worklong worklat     dist
#>      <dbl>   <dbl>    <dbl>   <dbl>    <dbl>
#> 1        0       2       70      45 8299015.
#> 2        2       5       60      60 7809933.
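Note that geosphere functions accept matrices as inputs, so you can cbind() your columns together into a two-column (lon, lat) matrix with one row per participant. Don't c() them: that concatenates the columns into a single flat vector and loses the distinction between lon and lat. This is the cause of the error, I suspect: a plain vector is only accepted as a single point of length 2, but c(homelong, homelat) on whole columns is twice as long as the number of rows.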
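The difference between the two is easy to see by checking shapes in base R. A minimal sketch, reusing the two example home coordinates from above (the names flat and pts are just for illustration):

```r
homelong <- c(0, 2)
homelat  <- c(2, 5)

flat <- c(homelong, homelat)      # one flat vector: 0 2 2 5
pts  <- cbind(homelong, homelat)  # matrix with one row per participant

length(flat)  # 4, not the 2 that a single lon/lat point would have
dim(pts)      # 2 2: two participants, two coordinates each
```

With only one participant, c(homelong, homelat) happens to have length 2, which is why it works inside rowwise() but not on whole columns.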