Calculating closest distance between coordinates in two dataframes in R

Time:05-06

I have two data frames: one with people's locations (df1) and one with housing locations (df2).


df1 <- structure(list(Persons = c(1, 2, 3, 4),
                      Latitude = c(-23.8, -23.8, -23.9, -23.9),
                      Longitude = c(-49.6, -49.3, -49.4, -49.8),
                      Sex = c("M", "F", "M", "F"),
                      Age = c(22, 44, 32, 86)),
                 class = "data.frame", row.names = c(NA, -4L))

df2 <- structure(list(House = c(1, 2, 3, 4, 5, 6, 7, 8),
                      Latitude = c(-23.4, -23.7, -23.4, -23.8, -23.8, -23.9, -23.2, -23.7),
                      Longitude = c(-49.7, -49.4, -49.6, -49.7, -49.9, -49.7, -49.5, -49.8)),
                 class = "data.frame", row.names = c(NA, -8L))

I simply want to find which house is closest to each person and add the distance to that house, in feet, to df1. I tried distGeo, but it did the opposite (it found which person is closest to each house), and I did not trust binding the result back, since the two data frames have different numbers of rows.
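To illustrate the person-to-house orientation, here is a minimal base-R sketch that needs no packages. It swaps in a plain haversine (spherical) formula with an assumed mean Earth radius of 6,371,008.8 m, so its distances are approximations that can differ slightly from the ellipsoidal values distGeo returns. It recreates only the question's coordinates (Sex and Age omitted), and the helper `haversine_m` and the names `d` and `nearest` are hypothetical. Because the matrix has one row per person, the index found in each row refers directly to df1, so nothing has to be bound back afterwards.

```r
# Hypothetical helper: haversine great-circle distance in metres on a sphere
# with an ASSUMED mean Earth radius; an approximation, not distGeo's ellipsoid.
haversine_m <- function(lat1, lon1, lat2, lon2, r = 6371008.8) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))  # pmin guards against rounding above 1
}

# Coordinates from the question (other columns omitted)
df1 <- data.frame(Persons = 1:4,
                  Latitude = c(-23.8, -23.8, -23.9, -23.9),
                  Longitude = c(-49.6, -49.3, -49.4, -49.8))
df2 <- data.frame(House = 1:8,
                  Latitude = c(-23.4, -23.7, -23.4, -23.8, -23.8, -23.9, -23.2, -23.7),
                  Longitude = c(-49.7, -49.4, -49.6, -49.7, -49.9, -49.7, -49.5, -49.8))

# outer() puts its FIRST argument on the rows: one row per person,
# one column per house, so row i always refers to df1[i, ].
d <- outer(seq_len(nrow(df1)), seq_len(nrow(df2)), function(i, j)
  haversine_m(df1$Latitude[i], df1$Longitude[i],
              df2$Latitude[j], df2$Longitude[j]))

nearest <- apply(d, 1, which.min)             # column index = row of df2
df1$nearest_house <- df2$House[nearest]
# pick each person's minimum distance and convert metres to feet (1 ft = 0.3048 m)
df1$dist_ft <- d[cbind(seq_len(nrow(df1)), nearest)] / 0.3048
df1
```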

Answer:

Try this, using the sf and nngeo packages:

library(sf)
library(nngeo)
# Convert to sf points; coords must be given as c(x, y), i.e. longitude first
df1 <- st_as_sf(df1, coords = c('Longitude', 'Latitude'), crs = 4326)
df2 <- st_as_sf(df2, coords = c('Longitude', 'Latitude'), crs = 4326)
# k = 1 finds the single nearest house for each person (row of df1)
dfjoin <- nngeo::st_nn(df1, df2, k = 1, returnDist = TRUE)

This returns a list of two elements: the first holds, for each person, the index of the nearest house in df2, and the second holds the corresponding distance in meters.

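# Each element is a list with one entry per person; unlist() turns it into a plain vector
df1$house <- unlist(dfjoin[[1]])
df1$dist <- unlist(dfjoin[[2]])       # meters
df1$dist_ft <- df1$dist / 0.3048      # feet (1 ft = 0.3048 m)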
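If you would rather not depend on nngeo, a similar result can probably be obtained with sf alone. This sketch assumes a recent sf version providing `st_nearest_feature()` and `st_distance(..., by_element = TRUE)`; with sf's default s2 engine, distances on longitude/latitude data are computed on a sphere, so they may differ slightly from ellipsoidal values. The object names `people`, `houses`, `idx` and `d` are hypothetical.

```r
library(sf)

# Coordinates from the question (other columns omitted)
people <- data.frame(Persons = 1:4,
                     Latitude = c(-23.8, -23.8, -23.9, -23.9),
                     Longitude = c(-49.6, -49.3, -49.4, -49.8))
houses <- data.frame(House = 1:8,
                     Latitude = c(-23.4, -23.7, -23.4, -23.8, -23.8, -23.9, -23.2, -23.7),
                     Longitude = c(-49.7, -49.4, -49.6, -49.7, -49.9, -49.7, -49.5, -49.8))

# coords are c(x, y), i.e. longitude first; EPSG:4326 = WGS84 lon/lat
people_sf <- st_as_sf(people, coords = c("Longitude", "Latitude"), crs = 4326)
houses_sf <- st_as_sf(houses, coords = c("Longitude", "Latitude"), crs = 4326)

# For each person, the row index of the nearest house in houses_sf
idx <- st_nearest_feature(people_sf, houses_sf)

# Distance between person i and its own nearest house (element-wise), in metres
d <- st_distance(people_sf, houses_sf[idx, ], by_element = TRUE)

people_sf$nearest_house <- houses_sf$House[idx]
people_sf$dist_m <- as.numeric(d)              # drop the units class
people_sf$dist_ft <- people_sf$dist_m / 0.3048 # 1 ft = 0.3048 m
people_sf
```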

Answer:

One naive approach, using only geosphere, is the following (run it on the original data frames, not on sf objects):

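# create the output matrix outside the loops, filled with NA
# (one row per person, one column per house)
pairwise_distance <- matrix(NA_real_, nrow(df1), nrow(df2))

# loop through both data frames
for (i in seq_len(nrow(df1))) {
  for (j in seq_len(nrow(df2))) {
    # distGeo expects each point as c(longitude, latitude); result is in meters
    pairwise_distance[i, j] <- geosphere::distGeo(
      c(df1[i, "Longitude"], df1[i, "Latitude"]),
      c(df2[j, "Longitude"], df2[j, "Latitude"]))
  }
}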
# for each row (person), attach the column index of the minimum distance
# (this corresponds to the house index in df2) to df1 as the new
# variable `nearest_house_id`
df1$nearest_house_id <- apply(pairwise_distance, 1, which.min)
# also attach the corresponding (minimum) distance, in meters
df1$nearest_house_distance <- apply(pairwise_distance, 1, min)
# and the same distance converted to feet (1 ft = 0.3048 m)
df1$nearest_house_distance_ft <- df1$nearest_house_distance / 0.3048
# Hope you want it this way around.

Please treat this answer only as a starting point. It will probably need adjusting for missing values, one-to-one (unique) matching, or similar requirements. It is also quite verbose.
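Since the loop version is verbose, here is a vectorized sketch of the same idea. It assumes `geosphere::distm()`, which builds the whole distance matrix in one call with rows following its first argument (persons) and columns its second (houses), using `distGeo` as the distance function. Inputs are two-column matrices in (longitude, latitude) order; the output column names are illustrative, not part of the original answer.

```r
library(geosphere)

# Coordinates from the question (other columns omitted)
df1 <- data.frame(Persons = 1:4,
                  Latitude = c(-23.8, -23.8, -23.9, -23.9),
                  Longitude = c(-49.6, -49.3, -49.4, -49.8))
df2 <- data.frame(House = 1:8,
                  Latitude = c(-23.4, -23.7, -23.4, -23.8, -23.8, -23.9, -23.2, -23.7),
                  Longitude = c(-49.7, -49.4, -49.6, -49.7, -49.9, -49.7, -49.5, -49.8))

# 4 x 8 matrix of ellipsoidal distances in metres: rows = persons, cols = houses
pairwise_distance <- distm(as.matrix(df1[, c("Longitude", "Latitude")]),
                           as.matrix(df2[, c("Longitude", "Latitude")]),
                           fun = distGeo)

nearest <- apply(pairwise_distance, 1, which.min)   # column index = row of df2
df1$nearest_house_id <- df2$House[nearest]
# each person's minimum distance, converted from metres to feet
df1$nearest_house_distance_ft <-
  pairwise_distance[cbind(seq_len(nrow(df1)), nearest)] / 0.3048
df1
```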
