I have a large data frame with one column of timestamps and a second column of speed measurements (km/h). Here is a short example of the data:
df <- data.frame(time = as.POSIXct(c("2019-04-01 13:55:18", "2019-04-01 14:03:18",
"2019-04-01 14:14:18", "2019-04-01 14:26:55",
"2019-04-01 14:46:55", "2019-04-01 15:01:55")),
speed = c(4.5, 6, 3.2, 5, 4, 2))
Is there a way to build a new data frame that gives the distance driven in each 20-minute interval from 2019-04-01 14:00:00 to 2019-04-01 15:00:00, assuming that the speed changes linearly between measurements? I tried to find a solution using integrals, but was not sure whether that is the correct approach. Thanks for the help!
CodePudding user response:
Here is a solution using a combination of zoo::na.approx and dplyr functions.
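library(zoo)
library(dplyr)
# One row per second across the whole measurement range
seq <- data.frame(time = seq(min(df$time), max(df$time), by = "secs"))
# Left join: seconds without a measurement get speed = NA
df <- merge(seq, df, all.x = TRUE)
# Fill the NAs by linear interpolation between measurements
df$speed <- na.approx(df$speed)
df %>%
  filter(time >= "2019-04-01 14:00:00" & time < "2019-04-01 15:00:00") %>%
  mutate(km = speed / 3600) %>%  # km covered in one second at `speed` km/h
  group_by(group = cut(time, breaks = "20 min")) %>%
  summarise(distance = sum(km))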
Which gives:
# A tibble: 3 x 2
  group               distance
  <fct>                  <dbl>
1 2019-04-01 14:00:00     1.50
2 2019-04-01 14:20:00     1.54
3 2019-04-01 14:40:00     1.16
Explanation:
The first step is to create a sequence of time points at one-second resolution spanning the measurements (seq). The sequence is then merged with the data frame, and the resulting NAs in speed are filled by linear interpolation using na.approx.
Then, using dplyr verbs, the data frame is filtered to the requested hour, and the 20-minute groups are created using cut. The final distance is the sum of the distances covered during each second of a 20-minute window (speed in km/h divided by 3600).
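Since the question mentions integrals: because the speed is assumed to be linear between measurements, the per-second sum above approximates the area under a piecewise-linear curve, and that area can also be computed exactly with the trapezoid rule. The following is only a minimal sketch in base R, not part of the original answer. The helper name distance_between is hypothetical, and tz = "UTC" is an assumption added so the example does not depend on the local time zone.

```r
# Speed readings from the question (km/h); tz = "UTC" is an assumption
df <- data.frame(
  time = as.POSIXct(c("2019-04-01 13:55:18", "2019-04-01 14:03:18",
                      "2019-04-01 14:14:18", "2019-04-01 14:26:55",
                      "2019-04-01 14:46:55", "2019-04-01 15:01:55"), tz = "UTC"),
  speed = c(4.5, 6, 3.2, 5, 4, 2)
)

# Hypothetical helper: distance in km between instants a and b (given in
# seconds), integrating the piecewise-linear speed curve with trapezoids
distance_between <- function(a, b, time, speed) {
  t <- as.numeric(time)
  # Knots: the window edges plus every measurement strictly inside the window
  knots <- c(a, t[t > a & t < b], b)
  # Linearly interpolated speed at each knot
  v <- approx(t, speed, xout = knots)$y
  # Sum of trapezoid areas (km/h * s), converted to km by dividing by 3600
  sum(diff(knots) * (head(v, -1) + tail(v, -1)) / 2) / 3600
}

# 20-minute window edges from 14:00 to 15:00
breaks <- seq(as.POSIXct("2019-04-01 14:00:00", tz = "UTC"),
              as.POSIXct("2019-04-01 15:00:00", tz = "UTC"), by = "20 min")
b <- as.numeric(breaks)

result <- data.frame(
  start = head(breaks, -1),
  distance = mapply(distance_between, head(b, -1), tail(b, -1),
                    MoreArgs = list(time = df$time, speed = df$speed))
)
print(result)
```

This avoids building a row for every second, which may matter for a large data frame, and for this sample the distances should be close to the ones obtained with the per-second sum.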