I am doing a project on bicycle crashes with motorised vehicles in North Carolina. My initial goal is to be able to identify which streets (and intersections) are most prone to accidents.
Using a 2D-Kernel Density Estimator I have managed to identify the most crash prone county in the state as being Mecklenburg County. Now I am trying to narrow down further to individual streets. This is where I need your help and suggestions!
My question is: How do i join point data with the lines?
I have two datasets that I am trying to join:
- Bike crashes coordinates:
# Read in data from link
bike_crash <- read_csv("https://raw.githubusercontent.com/sdavidsson90/bike_crash/main/raw_data/bike_crash.csv")
# Filter points to area of interest
bike_crash <- bike_crash %>% filter(County == "Mecklenburg")
# Read as Simple Feature Object
bike_crash <- st_as_sf(bike_crash, coords = c(x = "X", y = "Y"), crs = "NAD83")
- And the road network of Mecklenburg County, North Carolina (shape-file downloaded from): https://www2.census.gov/geo/tiger/TIGER2019/ROADS/tl_2019_37119_roads.zip
# Read in shapefile (after downloading and unzipping)
roads <- st_read("......working_directory....../tl_2019_37119_roads/tl_2019_37119_roads.shp")
I have tried the following which yields some results, but the distance threshold doesn't seem to be working properly.
st_intersects(roads, bike_crash)
st_intersects(roads, bike_crash, dist = 10)
Am I even tackling this question right?
Looking forward to your suggestions!
CodePudding user response:
This looks as a fun little project!
To make the answer general - and interesting to future readers: intersection of points and lines is rarely reliable.
Even if it were not for inaccuracies in data entry the simple fact of floating point math will make for many (superficial) misses. The distance may help somewhat, but it may be a good idea to consider other options also.
You have two possibilities, depending on how you structure your problem:
- starting from your roads object you can make a buffer polygon around your roads as lines, and count the number of points (crashes) within that buffer
- starting from your crashes object you can find the nearest road object - each crash site is guaranteed to have a single nearest road, even if "nearest" is a relative term.
For a quick & dirty overview consider this piece of code:
library(sf)
library(dplyr)
# Read in data from link
bike_crash <- readr::read_csv("https://raw.githubusercontent.com/sdavidsson90/bike_crash/main/raw_data/bike_crash.csv")
# Filter points to area of interest
bike_crash <- bike_crash %>%
filter(County == "Mecklenburg")
# Read as Simple Feature Object
bike_crash <- st_as_sf(bike_crash, coords = c(x = "X", y = "Y"), crs = "NAD83")
# roads / via {tigris} to make it cleaner
roads <- tigris::roads(state = "NC",
county = "Mecklenburg")
# find the nearest road / subset fullname by the index of nearest road to crash site
bike_crash$nearest_road <- roads$FULLNAME[st_nearest_feature(bike_crash, roads)]
bike_crash %>%
st_drop_geometry() %>% # no need of geometry in summary stats
group_by(nearest_road) %>%
tally() %>%
slice_max(order_by = n, n = 10)
# A tibble: 11 × 2
# nearest_road n
# <chr> <int>
# 1 S Tryon St 36
# 2 Central Ave 31
# 3 N Tryon St 31
# 4 NA 24
# 5 Beatties Ford Rd 23
# 6 The Plaza 22
# 7 Albemarle Rd 20
# 8 East Blvd 17
# 9 Monroe Rd 17
# 10 State Hwy 51 16
# 11 State Rd 3687 16