Home > Mobile >  Rolling windows with passing threshold in R
Rolling windows with passing threshold in R

Time:11-24

I have dataframe like this:

day         device      delta
2021-01-01  A           1.3
2021-01-02  A           2.56
2021-01-03  A           -1.72
2021-01-04  A           4.09
2021-01-05  A           6.99
2021-01-06  A           3.23
2021-01-07  A           1.12
2021-01-08  A           4.56
2021-01-09  A           1.09
2021-01-10  A           2.14
2021-01-01  B           -0.76
2021-01-02  B           -1.78
2021-01-03  B           1.54
2021-01-04  B           1.93
2021-01-05  B           5.98
2021-01-06  B           2.49
2021-01-07  B           8.92
2021-01-08  B           8.08
2021-01-09  B           3.12
2021-01-10  B           1.88

I would like to know whether there is an efficient way of finding out how many 7 days (rows) windows for any of the devices include an increase in the delta value from the first row in the window by more than 5 points?

I would also need to list the first delta value for each window and the first increased value which passes the 5 point threshold.

The desired result would look like this:

window_start    window_end    device    delta_first    delta_first_threshold_pass   
2021-01-01      2021-01-07    A         1.3            6.99
2021-01-03      2021-01-09    A         -1.72          6.99
2021-01-01      2021-01-07    B         -0.76          5.98
2021-01-02      2021-01-08    B         -1.78          5.98
2021-01-03      2021-01-09    B         1.54           8.92
2021-01-04      2021-01-10    B         1.93           8.92

Would something like this be possible? Thank you for any suggestions!

CodePudding user response:

Perhaps you can use rollapplyr from zoo package?

After grouping by device, you can determine the max delta for a 5 day/row window. With the align of "left" the index is left-aligned (includes subsequent days after index date).

Then, you can filter rows where the difference between max value and delta is greater than your threshold of 5.

library(zoo)
library(tidyverse)

df %>%
  group_by(device) %>%
  mutate(max = rollapplyr(delta, width = 5, FUN = max, align = "left", partial = T)) %>%
  filter(max - delta > 5)

Output

  day        device delta   max
  <chr>      <chr>  <dbl> <dbl>
1 2021-01-01 A       1.3   6.99
2 2021-01-03 A      -1.72  6.99
3 2021-01-01 B      -0.76  5.98
4 2021-01-02 B      -1.78  5.98
5 2021-01-03 B       1.54  8.92
6 2021-01-04 B       1.93  8.92
7 2021-01-06 B       2.49  8.92
  • Related