Calculate time delay between time series

Time:12-25

I have two time series of cumulative counts at the entrance and exit of an area. The vertical distance between the series can be interpreted as the instantaneous occupancy of the area, and the horizontal distance as the average "dwell" time. The real data is at 1-minute resolution, with a value for every minute that has observations. (Sample data at bottom.)
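The vertical reading can be made concrete with a minimal sketch on made-up numbers (`t_in`, `n_in`, `t_out` and `n_out` are hypothetical, not taken from the question's data). Because the two series need not share timestamps, the exit curve is first linearly interpolated at the entrance times with base R's `approx()`. The difference is then the number of people inside at each of those times.

```r
# Hypothetical cumulative counts; times are minutes since the start of the day.
t_in  <- c(0, 10, 20, 30)   # entrance timestamps
n_in  <- c(0, 12, 30, 40)   # cumulative entries
t_out <- c(0, 15, 30)       # exit timestamps (not aligned with t_in)
n_out <- c(0, 10, 36)       # cumulative exits

# Linearly interpolate the exit curve at the entrance timestamps.
n_out_at_in <- approx(x = t_out, y = n_out, xout = t_in)$y

# Vertical gap: entries so far minus exits so far = current occupancy.
occupancy <- n_in - n_out_at_in
print(round(occupancy, 2))
```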

What's a simple, readable, efficient way to calculate the horizontal time delay between the two series across the day? For example, at each point in series 2105, how many minutes until the (first) linearly interpolated point on series 2081 with the same y value?

library(ggplot2)

# using df1 defined at bottom
ggplot(df1, aes(datetime, cuml, color = label)) +
  geom_line() +
  geom_point()


One approach I considered was to linearly interpolate (lerp) each series to every minute using tidyr::complete and zoo::na.approx, then join each series to the others and keep the closest match. But that seems inefficient and unwieldy for my use case of ~20 sensors over 200 days at 1-minute resolution. For example:

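library(dplyr); library(tidyr); library(zoo)

# Fill in every minute for each label, then linearly interpolate (lerp) the
# cumulative counts; na.rm = FALSE keeps any leading/trailing NAs so the
# interpolated column has the same length as the group
df1 %>%
  complete(label, datetime = seq.POSIXt(
    min(df1$datetime), max(df1$datetime), by = "min")) %>%
  group_by(label) %>%
  mutate(cuml2 = na.approx(cuml, datetime, na.rm = FALSE)) %>%
  ungroup() -> df2

# Pair every 2105 minute with every minute of the other series (cross join),
# then keep, for each 2105 minute, the row with the closest interpolated count
df2 %>% filter(label == "2105 Line 0-exit") %>%
  cross_join(df2 %>% filter(label != "2105 Line 0-exit")) %>%  # dplyr >= 1.1.0
  mutate(dif = cuml2.x - cuml2.y) %>%
  group_by(datetime.x) %>%
  slice_min(abs(dif), n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  mutate(time_dif = datetime.y - datetime.x)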

Example data

df1 <- structure(list(datetime = structure(c(1670152500, 1670152500, 
  1670154300, 1670156100, 1670156100, 1670157900, 1670157900, 1670159700, 
  1670159700, 1670161500, 1670161500, 1670163300, 1670163300, 1670165100, 
  1670165100, 1670166900, 1670166900, 1670168700, 1670168700, 1670170500, 
  1670170500, 1670172300, 1670172300, 1670174100, 1670174100, 1670175900, 
  1670175900), tzone = "UTC", class = c("POSIXct", "POSIXt")), 
    label = c("2081 Line 0-exit", "2105 Line 0-exit", "2105 Line 0-exit", 
    "2081 Line 0-exit", "2105 Line 0-exit", "2081 Line 0-exit", 
    "2105 Line 0-exit", "2105 Line 0-exit", "2081 Line 0-exit", 
    "2105 Line 0-exit", "2081 Line 0-exit", "2081 Line 0-exit", 
    "2105 Line 0-exit", "2105 Line 0-exit", "2081 Line 0-exit", 
    "2081 Line 0-exit", "2105 Line 0-exit", "2105 Line 0-exit", 
    "2081 Line 0-exit", "2105 Line 0-exit", "2081 Line 0-exit", 
    "2105 Line 0-exit", "2081 Line 0-exit", "2105 Line 0-exit", 
    "2081 Line 0-exit", "2081 Line 0-exit", "2105 Line 0-exit"
    ), cuml = c(8.30121553513193, 96.9773299748111, 244.892247411139, 
    213.756300029647, 418.275958578226, 420.249036466054, 636.719843268962, 
    883.57122865939, 637.118292321376, 1137.27959697733, 891.343018084791, 
    1178.77260598873, 1388.04925832634, 1725.02099076406, 1407.05603320486, 
    1710.05040023718, 2025.74867058494, 2349.00643716765, 2043.13667358435, 
    2668.34592779177, 2346.13104061666, 2935.76826196474, 2649.12540764898, 
    3198.29275118948, 2988.43759264749, 3285.20604802846, 3421.63448082844
    )), row.names = c(NA, -27L), class = c("tbl_df", "tbl", "data.frame"
))

Answer:

Read the data into a zoo object z, splitting on the label column so that each series gets its own column; z then has two columns, one per series. Use approxfun to get the time of the second column as a function of that column's value (that is, invert the series), evaluate it at the first column's values, and subtract the result from the first column's times. Averaging these differences gives the mean delay in seconds. na.rm = TRUE drops the NAs that arise where a series has no observation at a given time or where a value falls outside the other series' range (approxfun returns NA there by default). You may wish to reverse the terms of the difference, depending on the direction you want.

library(zoo)

z <- read.zoo(df1, split = "label")
mean(as.numeric(time(z)) - approxfun(z[, 2], time(z))(z[, 1]), na.rm = TRUE)
## [1] 1712.595
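The answer above averages the delays. If you need the per-point delay in the direction the question asks (from each point of 2105 to series 2081), and want the *first* crossing when a cumulative series has flat stretches, a sketch like the following may help. `approx()`/`approxfun()` collapse tied x values (by default taking their mean, with a warning), so inverting a series that has a plateau can land mid-plateau. This sketch instead uses base R's `findInterval()` to locate the first point at or above each target value and interpolates linearly from there. The helper names `first_crossing_time` and `delay_minutes` are hypothetical, and the sketch assumes each series is sorted by time with nondecreasing counts.

```r
# Hypothetical helper: for each target value v, the first time at which the
# linearly interpolated cumulative series (t, y) reaches v.
# Assumes t is sorted ascending and y is nondecreasing; NA outside y's range.
first_crossing_time <- function(t, y, v) {
  keep <- !is.na(y)
  t <- as.numeric(t[keep])   # POSIXct -> seconds since epoch
  y <- y[keep]
  n <- length(y)
  # left.open = TRUE: i is the first index with y[i] >= v, even across ties
  i <- findInterval(v, y, left.open = TRUE) + 1L
  out <- rep(NA_real_, length(v))
  # v equals the first value: the crossing is at the first timestamp
  at_start <- !is.na(i) & i == 1L & v == y[1]
  out[at_start] <- t[1]
  # y[j - 1] < v <= y[j]: interpolate within that segment
  inside <- !is.na(i) & i >= 2L & i <= n
  j <- i[inside]
  frac <- (v[inside] - y[j - 1L]) / (y[j] - y[j - 1L])
  out[inside] <- t[j - 1L] + frac * (t[j] - t[j - 1L])
  out
}

# Hypothetical helper: delay in minutes from each point of series A until
# series B first reaches the same cumulative value (NA if B never does).
delay_minutes <- function(t_a, y_a, t_b, y_b) {
  (first_crossing_time(t_b, y_b, y_a) - as.numeric(t_a)) / 60
}

# Toy example: series B is flat between minutes 30 and 45.
t0  <- as.POSIXct("2022-12-04 11:15:00", tz = "UTC")
t_a <- t0 + c(0, 30, 60) * 60
y_a <- c(10, 40, 70)
t_b <- t0 + c(0, 30, 45, 60) * 60
y_b <- c(0, 30, 30, 60)
d <- delay_minutes(t_a, y_a, t_b, y_b)
print(d)   # 10 20 NA (B never reaches 70)

# On the question's df1 (assumed to be in the workspace):
# a <- df1[df1$label == "2105 Line 0-exit", ]
# b <- df1[df1$label == "2081 Line 0-exit", ]
# delay_minutes(a$datetime, a$cuml, b$datetime, b$cuml)
```

Because this works on vectors directly, with no joins, the cost per pair of series is roughly a sort-and-search over the points rather than the product of the two series' lengths.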