Home > Software engineering >  Joining points and lines using SF
Joining points and lines using SF

Time:11-30

I am doing a project on bicycle crashes with motorised vehicles in North Carolina. My initial goal is to be able to identify which streets (and intersections) are most prone to accidents.

Using a 2D-Kernel Density Estimator I have managed to identify the most crash prone county in the state as being Mecklenburg County. Now I am trying to narrow down further to individual streets. This is where I need your help and suggestions!

My question is: How do i join point data with the lines?

I have two datasets that I am trying to join:

  1. Bike crashes coordinates:
# Read in data from link
bike_crash <- read_csv("https://raw.githubusercontent.com/sdavidsson90/bike_crash/main/raw_data/bike_crash.csv")

# Filter points to area of interest
bike_crash <- bike_crash %>% filter(County == "Mecklenburg")

# Read as Simple Feature Object
bike_crash <- st_as_sf(bike_crash, coords = c(x = "X", y = "Y"), crs = "NAD83")

  1. And the road network of Mecklenburg County, North Carolina (shape-file downloaded from): https://www2.census.gov/geo/tiger/TIGER2019/ROADS/tl_2019_37119_roads.zip
# Read in shapefile (after downloading and unzipping)
roads <- st_read("......working_directory....../tl_2019_37119_roads/tl_2019_37119_roads.shp")

I have tried the following which yields some results, but the distance threshold doesn't seem to be working properly.

st_intersects(roads, bike_crash)

st_intersects(roads, bike_crash, dist = 10)

Am I even tackling this question right?

Looking forward to your suggestions!

CodePudding user response:

This looks as a fun little project!

To make the answer general - and interesting to future readers: intersection of points and lines is rarely reliable.

Even if it were not for inaccuracies in data entry the simple fact of floating point math will make for many (superficial) misses. The distance may help somewhat, but it may be a good idea to consider other options also.

You have two possibilities, depending on how you structure your problem:

  • starting from your roads object you can make a buffer polygon around your roads as lines, and count the number of points (crashes) within that buffer
  • starting from your crashes object you can find the nearest road object - each crash site is guaranteed to have a single nearest road, even if "nearest" is a relative term.

For a quick & dirty overview consider this piece of code:

library(sf)
library(dplyr)

# Read in data from link
bike_crash <- readr::read_csv("https://raw.githubusercontent.com/sdavidsson90/bike_crash/main/raw_data/bike_crash.csv")

# Filter points to area of interest
bike_crash <- bike_crash %>% 
  filter(County == "Mecklenburg")

# Read as Simple Feature Object
bike_crash <- st_as_sf(bike_crash, coords = c(x = "X", y = "Y"), crs = "NAD83")

# roads / via {tigris} to make it cleaner
roads <- tigris::roads(state = "NC",
                       county = "Mecklenburg")

# find the nearest road / subset fullname by the index of nearest road to crash site
bike_crash$nearest_road <- roads$FULLNAME[st_nearest_feature(bike_crash, roads)]

bike_crash %>% 
  st_drop_geometry() %>%  # no need of geometry in summary stats
  group_by(nearest_road) %>% 
  tally() %>% 
  slice_max(order_by = n, n = 10)
# A tibble: 11 × 2
#    nearest_road         n
#    <chr>            <int>
#  1 S Tryon St          36
#  2 Central Ave         31
#  3 N Tryon St          31
#  4 NA                  24
#  5 Beatties Ford Rd    23
#  6 The Plaza           22
#  7 Albemarle Rd        20
#  8 East Blvd           17
#  9 Monroe Rd           17
# 10 State Hwy 51        16
# 11 State Rd 3687       16
  •  Tags:  
  • r sf
  • Related