I have a set of accidents that have occurred on the highway. I need to determine the stretch of x kilometers that had the most accidents.
In my example, the fictional highway is 100 kilometers long, and I want to find the "most dangerous" 10-kilometer stretch, i.e. the one with the most accidents.
I came up with the following...
library(tidyverse)
df <- tibble(
  id = seq(1, 100, 1),
  km = round(runif(100, 0, 100))
)
df %>%
  mutate(
    interval = cut_interval(km, n = 11)
  ) %>%
  count(interval)
The problem is that this method only analyzes fixed stretches. For example, it does not count the accidents that occurred on the stretch from kilometer 14 to 23.
How can I find out the 10 km range that had the most accidents?
I guess I can iterate and create the overlapping ranges, and then do the count. Anyway, is there any simpler or more direct way? Any existing packages to work with intervals like this?
CodePudding user response:
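Try counting the accidents at each km, arranging by km, and then taking a rolling sum over the previous ten kilometer marks, as below. You can then find the maximum rolling sum; the row it falls on marks the end of the window, so the most dangerous stretch is the ten kilometer marks ending at that km.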
library(tidyverse)
library(RcppRoll)
set.seed(11)
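tibble(
  id = seq(1, 100, 1),
  km = round(runif(100, 0, 100))) %>%
  # count accidents per kilometer mark (one row per accident, so count rows rather than summing ids)
  count(km, name = "accidents") %>%
  # add the kilometer marks that had no accidents, including km 0
  complete(km = 0:100, fill = list(accidents = 0L)) %>%
  arrange(km) %>%
  # right-aligned window: each row holds the total for km - 9 through km
  mutate(rolling_sum = roll_sum(accidents, 10, fill = NA, align = "right")) %>%
  slice(which.max(rolling_sum))
# Returns a single row: km is the last kilometer mark of the busiest window,
# and rolling_sum is the number of accidents in that window.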
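If you would rather avoid the extra dependency, the same rolling count can be sketched in base R with a cumulative sum. This is only an illustration under assumptions: the seed, the variable names (`per_km`, `window_counts`, `start_km`) and the choice to treat a stretch as ten consecutive integer kilometer marks are illustrative and not taken from the answer above.

```r
# Hypothetical example data: 100 accidents at integer kilometer marks 0..100
set.seed(11)
km <- round(runif(100, 0, 100))

# Accidents per kilometer mark; tabulate() counts values 1..nbins,
# so shift by 1 to make room for km 0 (index j holds km j - 1)
per_km <- tabulate(km + 1, nbins = 101)

# Rolling sum over `width` consecutive marks via differences of a cumulative sum
width <- 10
cs <- c(0, cumsum(per_km))
window_counts <- cs[(width + 1):length(cs)] - cs[1:(length(cs) - width)]

# window_counts[i] covers marks (i - 1) .. (i + width - 2)
start_km <- which(window_counts == max(window_counts)) - 1

# All stretches tied for the most accidents
data.frame(
  start_km  = start_km,
  end_km    = start_km + width - 1,
  accidents = max(window_counts)
)
```

Because the cumulative sum is computed once, each window costs a single subtraction, which may matter if the highway or the number of accidents is much larger than in this toy example.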
CodePudding user response:
There are 92 stretches of road of length 10km, from 0:9 to 91:100. It's easy to obtain the number of accidents in each of these 10km stretches using sapply and a simple logical test:
mva <- sapply(0:91, function(x) length(which(df$km > x & df$km < (x + 10))))
mva
#> [1] 13 13 11 10 10 11 11 11 13 11 14 13 14 14 11 9 9 8 9 6 6 5 8
#> [24] 10 14 15 16 15 16 16 16 14 12 9 6 6 8 7 8 11 12 14 13 13 12 13
#> [47] 16 16 14 12 10 11 11 11 8 5 5 3 3 3 2 2 2 2 3 2 2 3 3
#> [70] 3 4 5 5 5 5 6 5 5 8 8 9 9 8 8 8 9 10 8 8 6 6 7
To find the most dangerous stretch(es) of road, we can do:
which(mva == max(mva))
#> [1] 27 29 30 31 47 48
This shows that there were 16 accidents in each of the stretches of road at 27-36km, 29-38km, 30-39km, 31-40km, 47-56km and 48-57km.
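One thing to watch when reading the result of `which()`: R vectors are indexed from 1 while the stretches start at km 0, so index `i` of `mva` belongs to the stretch that starts at `i - 1`. A minimal sketch of mapping the indices back to labelled stretches follows. It assumes inclusive bounds (`>=` and `<=`, so each stretch covers ten integer kilometer marks), which differs from the strict comparisons used above, and the seed and the names `starts` and `worst` are hypothetical.

```r
# Hypothetical example data in the same shape as the question's df
set.seed(42)
df <- data.frame(km = round(runif(100, 0, 100)))

# Candidate start points of every 10-mark stretch: 0..9, 1..10, ..., 91..100
starts <- 0:91

# Accidents per stretch, counting both end points (an assumption, see above)
mva <- sapply(starts, function(x) sum(df$km >= x & df$km <= x + 9))

# Index the start points directly instead of using which(),
# which avoids the off-by-one between positions and kilometers
worst <- starts[mva == max(mva)]

data.frame(from_km = worst, to_km = worst + 9, accidents = max(mva))
```

Subsetting `starts` with the logical vector returns kilometers rather than positions, so every stretch tied for the maximum is reported with its actual start and end.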