I have half-hourly time series with two variables value and state. An example day would look like this:
date value state
2017-01-10 00:00 17 0
2017-01-10 00:30 17 0
2017-01-10 01:00 16 0
2017-01-10 01:30 16 0
2017-01-10 02:00 16 0
2017-01-10 00:00 16 0
... ... ...
2017-01-10 21:00 23 1
2017-01-10 21:30 23 1
2017-01-10 22:00 20 0
2017-01-10 22:30 18 0
2017-01-10 23:00 17 0
2017-01-10 23:30 16 0
And this goes for many days.
I need to find how much the value decreases in the space of 3 hours after the last state=1 of the day, for each day.
Thank you!
CodePudding user response:
Sample data that covers multiple days:
set.seed(42)
dat <- data.frame(timestamp = seq(as.POSIXct("2017-01-10 00:00:00"), length.out = 48*4, by = "30 mins"))
dat$value <- sample(15:30, size = nrow(dat), replace = TRUE)
dat$state <- rbinom(nrow(dat), 1, 0.1)
head(dat)
# timestamp value state
# 1 2017-01-10 00:00:00 15 0
# 2 2017-01-10 00:30:00 19 0
# 3 2017-01-10 01:00:00 15 0
# 4 2017-01-10 01:30:00 23 0
# 5 2017-01-10 02:00:00 24 0
# 6 2017-01-10 02:30:00 18 0
Try this:
library(dplyr)
dat %>%
group_by(grp = cumsum(state == 1)) %>%
filter(first(state) == 1, timestamp <= (first(timestamp) 3600*3)) %>%
summarize(
timestamp_1 = first(timestamp), timestamp_n = last(timestamp),
value_1 = first(value), value_n = last(value),
value_max = max(value), value_min = min(value),
dip = max(first(value) - value)
) %>%
group_by(date = substring(timestamp_1, 1, 10)) %>%
slice_tail(n = 1) %>%
ungroup()
# # A tibble: 4 x 9
# grp timestamp_1 timestamp_n value_1 value_n value_max value_min dip date
# <int> <dttm> <dttm> <int> <int> <int> <int> <int> <chr>
# 1 3 2017-01-10 21:00:00 2017-01-11 00:00:00 20 16 24 16 4 2017-01-10
# 2 7 2017-01-11 21:30:00 2017-01-12 00:30:00 30 26 30 19 11 2017-01-11
# 3 14 2017-01-12 21:00:00 2017-01-13 00:00:00 21 23 28 15 6 2017-01-12
# 4 20 2017-01-13 19:30:00 2017-01-13 22:30:00 29 24 29 16 13 2017-01-13
I wasn't certain if you meant "value[time 3h] - value[1]" or if you meant the max of "value - value[1]" for each group; the current dip
is the max dip, not necessarily value - value[1]
. To cover and demonstrate this, I've included the first/last timestamp (for proof of 3h window) and first/last/min/max value for each group.
It calculates these stats for all groups during each day, not just the last group of the day; if we group by day first before summarizing, we will lose those cases where the 3h window goes over midnight.