Calculating distances in a dataframe


I have a large dataframe with one column of timestamps and a second column of speed measurements (km/h). Here is a short example of the data:

df <- data.frame(time = as.POSIXct(c("2019-04-01 13:55:18", "2019-04-01 14:03:18",
                                     "2019-04-01 14:14:18", "2019-04-01 14:26:55",
                                     "2019-04-01 14:46:55", "2019-04-01 15:01:55")),
                 speed = c(4.5, 6, 3.2, 5, 4, 2))

Is there a way to create a new dataframe with the distance driven in each 20-minute interval from 2019-04-01 14:00:00 to 2019-04-01 15:00:00, assuming that the speed changes linearly between measurements? I was trying to find a solution with integrals, but was not sure whether that is the correct approach. Thanks for the help!

Answer:

Here is a solution using a combination of zoo::na.approx and dplyr functions.


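library(zoo)    # provides na.approx
library(dplyr)  # provides filter, mutate, group_by, summarise
# One row per second, then linearly interpolate the missing speeds
seq <- data.frame(time = seq(min(df$time), max(df$time), by = "secs"))
df <- merge(seq, df, all.x = TRUE)
df$speed <- na.approx(df$speed)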

df %>%
  filter(time >= "2019-04-01 14:00:00" & time < "2019-04-01 15:00:00") %>%
  mutate(km = speed / 3600) %>%                       # km covered in each second
  group_by(group = cut(time, breaks = "20 min")) %>%  # 20-minute bins
  summarise(distance = sum(km))

Which gives:

# A tibble: 3 x 2
  group               distance
  <fct>                  <dbl>
1 2019-04-01 14:00:00     1.50
2 2019-04-01 14:20:00     1.54
3 2019-04-01 14:40:00     1.16

Explanation: The first step is to build a one-second time sequence (seq) spanning the first to the last measurement. Merging it with the data frame leaves NA for every second without a measurement, and na.approx fills those NAs by linear interpolation between the known speeds.
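
To illustrate what the interpolation step does, here is a minimal sketch in base R. It uses approx() as a stand-in for na.approx, since both interpolate linearly between known points. The times and speeds below are made-up toy values, not taken from the question.

```r
# Toy example (made-up numbers): speed is known only at t = 0 s and t = 4 s
known_t     <- c(0, 4)
known_speed <- c(4, 6)

# Linearly interpolate the speed at every whole second in between
filled <- approx(x = known_t, y = known_speed, xout = 0:4)$y
print(filled)
```

Each interpolated value lies on the straight line between the two known speeds, which is the "speed changes are linear" assumption from the question.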

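Then, using dplyr verbs, the data frame is filtered to the requested hour, and the 20-minute bins are created with cut. Because the speed is in km/h, dividing by 3600 gives the distance covered in each second. The distance for a bin is the sum of these one-second distances, which approximates the integral of speed over time.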
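
Since the question asks about integrals: with piecewise-linear speed, the integral over each window can also be computed directly with the trapezoid rule, without expanding to one row per second. The following is only a sketch in base R under some assumptions: distance_between is a hypothetical helper name, the time zone is fixed to UTC for reproducibility, and every window is assumed to lie inside the measured time range.

```r
# Data from the question (time zone fixed to UTC as an assumption)
df <- data.frame(
  time = as.POSIXct(c("2019-04-01 13:55:18", "2019-04-01 14:03:18",
                      "2019-04-01 14:14:18", "2019-04-01 14:26:55",
                      "2019-04-01 14:46:55", "2019-04-01 15:01:55"), tz = "UTC"),
  speed = c(4.5, 6, 3.2, 5, 4, 2)
)

# Hypothetical helper: distance in km driven between `from` and `to`,
# assuming speed (km/h) changes linearly between measurements
distance_between <- function(df, from, to) {
  inner <- df$time[df$time > from & df$time < to]  # measurements inside the window
  t <- as.numeric(c(from, inner, to))              # knots in seconds
  v <- approx(as.numeric(df$time), df$speed, xout = t)$y  # speed at each knot
  dt_h <- diff(t) / 3600                           # segment lengths in hours
  sum(dt_h * (head(v, -1) + tail(v, -1)) / 2)      # trapezoid rule
}

# 20-minute window boundaries from 14:00 to 15:00
breaks <- seq(as.POSIXct("2019-04-01 14:00:00", tz = "UTC"),
              as.POSIXct("2019-04-01 15:00:00", tz = "UTC"), by = "20 min")
starts <- head(breaks, -1)
ends   <- tail(breaks, -1)

result <- data.frame(
  start    = starts,
  distance = vapply(seq_along(starts),
                    function(i) distance_between(df, starts[i], ends[i]),
                    numeric(1))
)
print(result)
```

The distances should come out close to the per-second sums in the table above. The small differences arise because the per-second sum is a Riemann-sum approximation of the same integral.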
