Home > Software design >  Time series, find how much value in one column decreases after the value in another column every day
Time series, find how much value in one column decreases after the value in another column every day

Time:08-18

I have half-hourly time series with two variables value and state. An example day would look like this:

date                value state
2017-01-10 00:00    17    0
2017-01-10 00:30    17    0
2017-01-10 01:00    16    0
2017-01-10 01:30    16    0
2017-01-10 02:00    16    0
2017-01-10 00:00    16    0
...                 ...   ...
2017-01-10 21:00    23    1
2017-01-10 21:30    23    1
2017-01-10 22:00    20    0
2017-01-10 22:30    18    0
2017-01-10 23:00    17    0
2017-01-10 23:30    16    0

And this goes for many days.

I need to find how much the value decreases in the space of 3 hours after the last state=1 of the day, for each day.

Thank you!

CodePudding user response:

Sample data that covers multiple days:

set.seed(42)
dat <- data.frame(timestamp = seq(as.POSIXct("2017-01-10 00:00:00"), length.out = 48*4, by = "30 mins"))
dat$value <- sample(15:30, size = nrow(dat), replace = TRUE)
dat$state <- rbinom(nrow(dat), 1, 0.1)
head(dat)
#             timestamp value state
# 1 2017-01-10 00:00:00    15     0
# 2 2017-01-10 00:30:00    19     0
# 3 2017-01-10 01:00:00    15     0
# 4 2017-01-10 01:30:00    23     0
# 5 2017-01-10 02:00:00    24     0
# 6 2017-01-10 02:30:00    18     0

Try this:

library(dplyr)
dat %>%
  group_by(grp = cumsum(state == 1)) %>%
  filter(first(state) == 1, timestamp <= (first(timestamp)   3600*3)) %>%
  summarize(
    timestamp_1 = first(timestamp), timestamp_n = last(timestamp),
    value_1 = first(value), value_n = last(value),
    value_max = max(value), value_min = min(value),
    dip = max(first(value) - value)
  ) %>%
  group_by(date = substring(timestamp_1, 1, 10)) %>%
  slice_tail(n = 1) %>%
  ungroup()
# # A tibble: 4 x 9
#     grp timestamp_1         timestamp_n         value_1 value_n value_max value_min   dip date      
#   <int> <dttm>              <dttm>                <int>   <int>     <int>     <int> <int> <chr>     
# 1     3 2017-01-10 21:00:00 2017-01-11 00:00:00      20      16        24        16     4 2017-01-10
# 2     7 2017-01-11 21:30:00 2017-01-12 00:30:00      30      26        30        19    11 2017-01-11
# 3    14 2017-01-12 21:00:00 2017-01-13 00:00:00      21      23        28        15     6 2017-01-12
# 4    20 2017-01-13 19:30:00 2017-01-13 22:30:00      29      24        29        16    13 2017-01-13

I wasn't certain if you meant "value[time 3h] - value[1]" or if you meant the max of "value - value[1]" for each group; the current dip is the max dip, not necessarily value - value[1]. To cover and demonstrate this, I've included the first/last timestamp (for proof of 3h window) and first/last/min/max value for each group.

It calculates these stats for all groups during each day, not just the last group of the day; if we group by day first before summarizing, we will lose those cases where the 3h window goes over midnight.

  • Related