I need to identify the shortest distance between points, across two dataframes.
Dataframe biz contains individual businesses, including their coordinates:
biz <- structure(list(name = c("bizA", "bizB", "bizC", "bizD"),
lon = c(-3.276435,-4.175388,-4.181740,-3.821941),
lat = c(11.96748,12.19885,13.04638,11.84277)),
class = "data.frame",row.names = c(NA, -4L))
biz
  name       lon      lat
1 bizA -3.276435 11.96748
2 bizB -4.175388 12.19885
3 bizC -4.181740 13.04638
4 bizD -3.821941 11.84277
Dataframe city contains market cities, including their geocoordinates:
city <- structure(list(name = c("cityA", "cityB", "cityC", "cityD"),
lon = c(-4.7588042,-3.2432781,-3.0626284,-2.3566861),
lat = c(10.64002,10.95790,13.06950,13.20363)),
class = "data.frame",row.names = c(NA, -4L))
city
   name       lon      lat
1 cityA -4.758804 10.64002
2 cityB -3.243278 10.95790
3 cityC -3.062628 13.06950
4 cityD -2.356686 13.20363
For each business in biz, I need to identify which market city is closest, and list the name of that market city in a new column:
biz
name lon lat city
1 bizA -3.276435 11.96748
2 bizB -4.175388 12.19885
3 bizC -4.181740 13.04638
4 bizD -3.821941 11.84277
I know that I can use packages like geosphere to measure the distance between the bizA and cityA coordinates. What I'm struggling with is how to compare bizA to each city, find the minimum distance, and then list that closest city in dataframe biz.
Any thoughts are much appreciated!
CodePudding user response:
You can use st_nearest_feature from sf:
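library(sf)

# Convert both data frames to sf point objects (WGS84), find the index of the
# nearest city for each business, and bind that city's name to biz
cbind(
  biz,
  nearest_city = city[
    st_nearest_feature(
      st_as_sf(biz, coords = c("lon", "lat"), crs = 4326),
      st_as_sf(city, coords = c("lon", "lat"), crs = 4326)
    ),
  ]$name
)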
although coordinates are longitude/latitude, st_nearest_feature assumes that they are planar
  name       lon      lat nearest_city
1 bizA -3.276435 11.96748        cityB
2 bizB -4.175388 12.19885        cityC
3 bizC -4.181740 13.04638        cityC
4 bizD -3.821941 11.84277        cityB
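Since the question mentions geosphere, here is a rough sketch of the same idea with that package instead of sf. It assumes geosphere is installed and uses distm with distHaversine to build a business-by-city distance matrix, then picks the smallest entry in each row. The data frames are rebuilt with data.frame() only to keep the snippet self-contained.

```r
library(geosphere)  # assumption: the geosphere package is installed

biz <- data.frame(
  name = c("bizA", "bizB", "bizC", "bizD"),
  lon  = c(-3.276435, -4.175388, -4.181740, -3.821941),
  lat  = c(11.96748, 12.19885, 13.04638, 11.84277)
)
city <- data.frame(
  name = c("cityA", "cityB", "cityC", "cityD"),
  lon  = c(-4.7588042, -3.2432781, -3.0626284, -2.3566861),
  lat  = c(10.64002, 10.95790, 13.06950, 13.20363)
)

# distm() returns a matrix of distances in metres:
# one row per business, one column per city (columns must be lon, then lat)
d <- distm(
  as.matrix(biz[, c("lon", "lat")]),
  as.matrix(city[, c("lon", "lat")]),
  fun = distHaversine
)

# For each row (business), take the column (city) with the smallest distance
biz$city <- city$name[apply(d, 1, which.min)]
biz
```

If the distance itself is also wanted, apply(d, 1, min) gives it in metres for the same rows.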
CodePudding user response:
There are multiple ways to do this.
Here is one that starts by creating all combinations of rows from the two data frames, using the dfcombos function from here.
(I think there are some alternatives in packages on CRAN.)
The distance here is just a random number, for demonstration.
The closest city for each business is selected by sorting with order and then dropping repeats with duplicated.
There are alternatives to this approach as well, but it seemed simple.
source('dfcombos.R')
biz <- structure(list(name = c("bizA", "bizB", "bizC", "bizD"),
lon = c(-3.276435,-4.175388,-4.181740,-3.821941),
lat = c(11.96748,12.19885,13.04638,11.84277)),
class = "data.frame",row.names = c(NA, -4L))
city <- structure(list(name = c("cityA", "cityB", "cityC", "cityD"),
lon = c(-4.7588042,-3.2432781,-3.0626284,-2.3566861),
lat = c(10.64002,10.95790,13.06950,13.20363)),
class = "data.frame",row.names = c(NA, -4L))
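# All biz/city row combinations (dfcombos comes from the sourced file)
comb <- dfcombos(biz, city)
# Placeholder distance: random numbers, for demonstration only
comb$dist <- runif(nrow(comb))
# Sort by distance, then keep the first (closest) row per name
comb <- comb[order(comb$dist), ]
closest <- comb[!duplicated(comb$name), ]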
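For a version that runs without dfcombos.R and uses a real distance, something like the following base-R sketch might work. It makes two assumptions: merge(..., by = NULL) stands in for dfcombos (whose source is not shown here), and haversine_km is a hypothetical helper written for this example, using a mean Earth radius of 6371 km.

```r
biz <- data.frame(
  name = c("bizA", "bizB", "bizC", "bizD"),
  lon  = c(-3.276435, -4.175388, -4.181740, -3.821941),
  lat  = c(11.96748, 12.19885, 13.04638, 11.84277)
)
city <- data.frame(
  name = c("cityA", "cityB", "cityC", "cityD"),
  lon  = c(-4.7588042, -3.2432781, -3.0626284, -2.3566861),
  lat  = c(10.64002, 10.95790, 13.06950, 13.20363)
)

# Hypothetical helper: great-circle (haversine) distance in kilometres
haversine_km <- function(lon1, lat1, lon2, lat2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Cross join: every business paired with every city (stand-in for dfcombos);
# the suffixes keep the two sets of name/lon/lat columns apart
comb <- merge(biz, city, by = NULL, suffixes = c("_biz", "_city"))

# Real distance instead of the random placeholder
comb$dist <- haversine_km(comb$lon_biz, comb$lat_biz,
                          comb$lon_city, comb$lat_city)

# Sort by distance, then keep the first (closest) row per business
comb <- comb[order(comb$dist), ]
closest <- comb[!duplicated(comb$name_biz), ]

# Write the closest city back into biz, matching on business name
biz$city <- closest$name_city[match(biz$name, closest$name_biz)]
biz
```

The cross join grows as nrow(biz) * nrow(city), so this is fine for small tables but may get heavy for large ones.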