I have dataframe like this:
day device delta
2021-01-01 A 1.3
2021-01-02 A 2.56
2021-01-03 A -1.72
2021-01-04 A 4.09
2021-01-05 A 6.99
2021-01-06 A 3.23
2021-01-07 A 1.12
2021-01-08 A 4.56
2021-01-09 A 1.09
2021-01-10 A 2.14
2021-01-01 B -0.76
2021-01-02 B -1.78
2021-01-03 B 1.54
2021-01-04 B 1.93
2021-01-05 B 5.98
2021-01-06 B 2.49
2021-01-07 B 8.92
2021-01-08 B 8.08
2021-01-09 B 3.12
2021-01-10 B 1.88
I would like to know whether there is an efficient way of finding out how many 7 days (rows) windows for any of the devices include an increase in the delta value from the first row in the window by more than 5 points?
I would also need to list the first delta value for each window and the first increased value which passes the 5 point threshold.
The desired result would look like this:
window_start window_end device delta_first delta_first_threshold_pass
2021-01-01 2021-01-07 A 1.3 6.99
2021-01-03 2021-01-09 A -1.72 6.99
2021-01-01 2021-01-07 B -0.76 5.98
2021-01-02 2021-01-08 B -1.78 5.98
2021-01-03 2021-01-09 B 1.54 8.92
2021-01-04 2021-01-10 B 1.93 8.92
Would something like this be possible? Thank you for any suggestions!
CodePudding user response:
Perhaps you can use rollapplyr
from zoo
package?
After grouping by device
, you can determine the max
delta for a 5 day/row window. With the align
of "left" the index is left-aligned (includes subsequent days after index date).
Then, you can filter
rows where the difference between max
value and delta
is greater than your threshold of 5.
library(zoo)
library(tidyverse)
df %>%
group_by(device) %>%
mutate(max = rollapplyr(delta, width = 5, FUN = max, align = "left", partial = T)) %>%
filter(max - delta > 5)
Output
day device delta max
<chr> <chr> <dbl> <dbl>
1 2021-01-01 A 1.3 6.99
2 2021-01-03 A -1.72 6.99
3 2021-01-01 B -0.76 5.98
4 2021-01-02 B -1.78 5.98
5 2021-01-03 B 1.54 8.92
6 2021-01-04 B 1.93 8.92
7 2021-01-06 B 2.49 8.92