R - Find shortest distance between points across two dataframes

Time:03-04

I need to identify the shortest distance between points, across two dataframes.

Dataframe biz contains individual businesses, including their coordinates:

biz <- structure(list(name = c("bizA", "bizB", "bizC", "bizD"), 
lon = c(-3.276435,-4.175388,-4.181740,-3.821941), 
lat = c(11.96748,12.19885,13.04638,11.84277)),
class = "data.frame",row.names = c(NA, -4L))

biz
  name       lon      lat
1 bizA -3.276435 11.96748
2 bizB -4.175388 12.19885
3 bizC -4.181740 13.04638
4 bizD -3.821941 11.84277

Dataframe city contains market cities, including their geocoordinates:

city <- structure(list(name = c("cityA", "cityB", "cityC", "cityD"), 
lon = c(-4.7588042,-3.2432781,-3.0626284,-2.3566861), 
lat = c(10.64002,10.95790,13.06950,13.20363)),
class = "data.frame",row.names = c(NA, -4L))

city
   name       lon      lat
1 cityA -4.758804 10.64002
2 cityB -3.243278 10.95790
3 cityC -3.062628 13.06950
4 cityD -2.356686 13.20363

For each business in biz, I need to identify which market city is closest, and list the name of that market city in a new column:

biz
  name       lon      lat     city
1 bizA -3.276435 11.96748
2 bizB -4.175388 12.19885
3 bizC -4.181740 13.04638
4 bizD -3.821941 11.84277

I know that I can use packages like geosphere to measure the distance between bizA and cityA coordinates. I'm struggling with: how to compare bizA to each city, minimize the distance, and then list that closest city in dataframe biz.

Any thoughts are much appreciated!

CodePudding user response:

You can use st_nearest_feature from sf:

library(sf)

cbind(
  biz,
  nearest_city = city[
    st_nearest_feature(
      st_as_sf(biz, coords = c("lon", "lat"), crs = 4326), 
      st_as_sf(city, coords = c("lon", "lat"), crs = 4326)
    ),
  ]$name
)

although coordinates are longitude/latitude, st_nearest_feature assumes that they are planar
  name       lon      lat nearest_city
1 bizA -3.276435 11.96748     cityB
2 bizB -4.175388 12.19885     cityC
3 bizC -4.181740 13.04638     cityC
4 bizD -3.821941 11.84277     cityB
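
As the message in the output says, the coordinates were treated as planar here. If you want a great-circle cross-check without extra packages, something like the following should work. It is only a minimal sketch: haversine_km is a helper name I made up (it is not part of sf or geosphere), and it assumes a spherical Earth with a radius of 6371 km.

```r
# Same sample data as in the question.
biz <- data.frame(
  name = c("bizA", "bizB", "bizC", "bizD"),
  lon  = c(-3.276435, -4.175388, -4.181740, -3.821941),
  lat  = c(11.96748, 12.19885, 13.04638, 11.84277),
  stringsAsFactors = FALSE
)
city <- data.frame(
  name = c("cityA", "cityB", "cityC", "cityD"),
  lon  = c(-4.7588042, -3.2432781, -3.0626284, -2.3566861),
  lat  = c(10.64002, 10.95790, 13.06950, 13.20363),
  stringsAsFactors = FALSE
)

# Hypothetical helper: haversine great-circle distance in km.
# Assumption: spherical Earth, radius r = 6371 km.
haversine_km <- function(lon1, lat1, lon2, lat2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Distance matrix: one row per business, one column per city.
d <- outer(
  seq_len(nrow(biz)), seq_len(nrow(city)),
  function(i, j) haversine_km(biz$lon[i], biz$lat[i], city$lon[j], city$lat[j])
)

# Column index of the smallest distance in each row.
nearest <- apply(d, 1, which.min)

biz$nearest_city <- city$name[nearest]
biz$dist_km <- d[cbind(seq_len(nrow(biz)), nearest)]
biz
```

On this sample data it picks the same cities as the planar result above, and you also get the distance in km as a bonus column.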

CodePudding user response:

There are probably several ways to do this. Here is one that starts by creating all combinations of rows from the two data frames using the dfcombos function from here. (I think there are some alternatives in packages on CRAN.)

Here the distance is just a random number, to demonstrate.

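The closest cities are selected using duplicated after sorting with order: once the rows are sorted by distance, the first row for each business is its closest city, so dropping the duplicates keeps exactly that row. There are alternatives to this approach as well, but it seemed simple.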

source('dfcombos.R')

biz <- structure(list(name = c("bizA", "bizB", "bizC", "bizD"), 
lon = c(-3.276435,-4.175388,-4.181740,-3.821941), 
lat = c(11.96748,12.19885,13.04638,11.84277)),
class = "data.frame",row.names = c(NA, -4L))

city <- structure(list(name = c("cityA", "cityB", "cityC", "cityD"), 
lon = c(-4.7588042,-3.2432781,-3.0626284,-2.3566861), 
lat = c(10.64002,10.95790,13.06950,13.20363)),
class = "data.frame",row.names = c(NA, -4L))

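# Build every biz/city row pair (dfcombos comes from the file sourced above).
comb <- dfcombos(biz, city)

# Placeholder distance: replace with a real distance calculation.
comb$dist <- runif(nrow(comb))

# Sort so that each business's closest city comes first.
comb <- comb[order(comb$dist), ]

# Keep the first row per business. This assumes comb$name holds the business
# name; both inputs have a name column, so check how dfcombos labels them.
closest <- comb[!duplicated(comb$name), ]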
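
In case dfcombos.R is not at hand, the same order/duplicated idea can be sketched in base R with a real distance instead of the random one. Treat this as an illustration rather than a drop-in replacement: expand.grid stands in for dfcombos (whose exact output I am not reproducing), the column names biz, city, biz_lon and so on are my own, and the distance is a haversine formula assuming a spherical Earth of radius 6371 km.

```r
biz <- data.frame(
  name = c("bizA", "bizB", "bizC", "bizD"),
  lon  = c(-3.276435, -4.175388, -4.181740, -3.821941),
  lat  = c(11.96748, 12.19885, 13.04638, 11.84277),
  stringsAsFactors = FALSE
)
city <- data.frame(
  name = c("cityA", "cityB", "cityC", "cityD"),
  lon  = c(-4.7588042, -3.2432781, -3.0626284, -2.3566861),
  lat  = c(10.64002, 10.95790, 13.06950, 13.20363),
  stringsAsFactors = FALSE
)

# All biz/city row pairs (a stand-in for dfcombos).
idx <- expand.grid(b = seq_len(nrow(biz)), c = seq_len(nrow(city)))
comb <- data.frame(
  biz      = biz$name[idx$b],
  biz_lon  = biz$lon[idx$b],
  biz_lat  = biz$lat[idx$b],
  city     = city$name[idx$c],
  city_lon = city$lon[idx$c],
  city_lat = city$lat[idx$c],
  stringsAsFactors = FALSE
)

# Haversine distance in km (assumption: spherical Earth, radius 6371 km).
rad <- pi / 180
a <- sin((comb$city_lat - comb$biz_lat) * rad / 2)^2 +
  cos(comb$biz_lat * rad) * cos(comb$city_lat * rad) *
  sin((comb$city_lon - comb$biz_lon) * rad / 2)^2
comb$dist <- 2 * 6371 * asin(pmin(1, sqrt(a)))

# Sort by distance, then keep the first (closest) row for each business.
comb <- comb[order(comb$dist), ]
closest <- comb[!duplicated(comb$biz), c("biz", "city", "dist")]

# Restore business order for readability.
closest <- closest[order(closest$biz), ]
rownames(closest) <- NULL
closest
```

Using separate biz and city columns avoids the clash between the two name columns, so duplicated is applied to the business name only.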