Find number of close stores, using a matrix of distances

Time:11-28

I have a matrix of stores and their distances to each other, similar to the table below, with 3000 rows and 3000 columns. I want to create a new table that shows, for each store, the number of other stores that are 5 km or closer.

         store 1
store 1  NA
store 2  200 (m)

I have created this distance matrix using a data frame, which has the x coordinates and y coordinates of each store.

dist_matrix <- st_distance(df)
diag(dist_matrix) <- NA

This gives me a matrix containing the distance from each store to every other store, in meters.

I want to find the number of stores that are located within a 5 km radius of each store. I have tried this:


#making the matrix into a data frame
dist_matrix <- data.frame(dist_matrix)
names(dist_matrix) <- df$store_id
rownames(dist_matrix) <- df$store_id

close <- dist_matrix %>%
  mutate(ID = rownames(.)) %>%
  gather('closest', 'dist', -ID) %>%
  filter(!is.na(dist)) %>%
  arrange(dist)

But it does not seem to work. Does anyone have input on how to solve this?

CodePudding user response:

You may find it useful to use the base function colSums(), paired with a logical matrix of distances below whatever threshold you need.

As I don't have access to your data, I will use a distance matrix of my three favorite North Carolina towns (because of the nc.shp that lives in {sf}).

What this example does is:

  • calculates a distance matrix of the three cities, dropping the units to ease calculation (you then have to remember that the values are in meters)
  • names the columns of the distance matrix to make it easier to work with
  • creates a logical matrix logi that is TRUE where the distance is less than 150000 meters
  • calculates column sums over logi, which gives the number of cities within 150000 meters of each city.

Note that each city is automatically included in its own count (its distance to itself is zero, on the diagonal of the original matrix), so you will need to subtract one. Alternatively, as you did in your original code, set the diagonal to NA and call colSums() with na.rm = TRUE. It should not matter which one you do, as long as you do one of the two.

library(sf)
library(dplyr)

cities <- data.frame(name = c("Raleigh", "Greensboro", "Wilmington"),
                     x = c(-78.633333, -79.819444, -77.912222),
                     y = c(35.766667, 36.08, 34.223333)) %>% 
  st_as_sf(coords = c("x", "y"), crs = 4326)


result <- cities %>% 
  st_distance() %>% 
  units::drop_units()

colnames(result) <- cities$name

logi <- result < 150000 # in your case the threshold would be 5000 (5 km in meters)

colSums(logi)
# Raleigh Greensboro Wilmington 
#       2          2          1 
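
The two ways of excluding each store from its own count can be sketched on a small matrix. This is only an illustration: the store ids and distances below are made up, and the 5000 m threshold is taken from the question (using <= because the question says "5km or closer").

```r
# Hypothetical 4-store distance matrix in meters (made-up values, symmetric)
ids <- c("s1", "s2", "s3", "s4")
d <- matrix(c(   0, 1200, 7000, 4800,
              1200,    0, 6500, 5200,
              7000, 6500,    0, 3000,
              4800, 5200, 3000,    0),
            nrow = 4, byrow = TRUE, dimnames = list(ids, ids))

threshold <- 5000  # 5 km, expressed in meters

# Option 1: count everything within the threshold, then subtract the store itself
n_close_a <- colSums(d <= threshold) - 1

# Option 2: set the diagonal to NA and ignore NA while summing
d_na <- d
diag(d_na) <- NA
n_close_b <- colSums(d_na <= threshold, na.rm = TRUE)

# Both options give the same counts; collect them in a per-store table
close_stores <- data.frame(store_id = ids, n_close = unname(n_close_b))
close_stores
```

Because a distance matrix is symmetric, rowSums() would give the same counts as colSums() here.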

CodePudding user response:

Does this work?

dist_matrix |>
  filter_all(all_vars(.<=5000)) #assuming 5000m
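
For comparison, filter_all(all_vars(...)) keeps only the rows in which every column satisfies the condition, so it filters stores rather than counting neighbours. A counting variant in the same dplyr style might look like the sketch below. It assumes a plain numeric data frame with NA on the diagonal; the name dist_df and the values are hypothetical, not the asker's data.

```r
library(dplyr)
library(tidyr)

# Hypothetical distances in meters with NA on the diagonal (made-up values)
ids <- c("s1", "s2", "s3")
dist_df <- data.frame(s1 = c(NA, 1200, 7000),
                      s2 = c(1200, NA, 4500),
                      s3 = c(7000, 4500, NA),
                      row.names = ids)

counts <- dist_df %>%
  mutate(ID = rownames(.)) %>%                      # keep the store id as a column
  pivot_longer(-ID, names_to = "other", values_to = "dist") %>%
  group_by(ID) %>%
  summarise(n_close = sum(dist <= 5000, na.rm = TRUE))  # stores within 5 km

counts
```

With 3000 stores the long format has nine million rows, so the colSums() approach on the matrix is likely to be lighter.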