How to filter data with starting and ending conditions?

Time:08-10

I'm trying to filter my data based on two conditions dependent on sequential dates.

  • I am looking for values of 2 or less for at least 5 sequential dates,
  • with a "cushion period" of values above 2 and up to 5 for up to 3 sequential days.

It would look something like this (sorry for the terrible excel attempt here):

[Image: spreadsheet sketch of the example, not preserved]

Day 1 to Day 10 would be included and day 11 would not be. Days 6 to 8 would be considered the "cushion period." I hope this makes sense!!

Right now, I can only get the cushion period (see the reprex), but I can't figure out how to add the starting and ending condition: values of 2 or less for 5 sequential dates. (The 5 days could be broken up with the cushion period in between, but I feel like this might complicate things.)

Any help would be GREATLY appreciated!

For my reprex (below), the dates that would be included in the final df are in blue (dates from 1/1/2000 to 1/9/2000, and 1/22/2000 to 1/30/2000) and the dates in grey would not be.

[Image: the reprex data with included dates in blue and excluded dates in grey, not preserved]

Reprex:

library("dplyr")
#Goal: include all values with values of 2 or less for 5 consecutive days and allow for a "cushion" period of values of 2 to 5 for up to 3 days
data <- data.frame(
  Date = c("2000-01-01", "2000-01-02", "2000-01-03", "2000-01-04", "2000-01-05",
           "2000-01-06", "2000-01-07", "2000-01-08", "2000-01-09", "2000-01-10",
           "2000-01-11", "2000-01-12", "2000-01-13", "2000-01-14", "2000-01-15",
           "2000-01-16", "2000-01-17", "2000-01-18", "2000-01-19", "2000-01-20",
           "2000-01-21", "2000-01-22", "2000-01-23", "2000-01-24", "2000-01-25",
           "2000-01-26", "2000-01-27", "2000-01-28", "2000-01-29", "2000-01-30"),
  Value = c(2,3,4,5,2,2,1,0,1,8,7,9,4,5,2,3,4,5,7,2,6,0,2,1,2,0,3,4,0,1)
)

head(data)                   
#Goal: values should include dates from 1/1/2000 to 1/9/2000, and 1/22/2000 to 1/30/2000

#I am able to subset the "cushion period" but I'm not sure how to add the starting and ending conditions for it
attempt1 <- data %>% 
  group_by(group_id = as.integer(gl(n(),3,n()))) %>% 
  filter(Value <= 5 & Value >=3) %>% 
  ungroup() %>% 
  select(-group_id)

head(attempt1)

CodePudding user response:

If I understand correctly, you need to keep each group of consecutive values that are below or equal to 5, provided the group contains at least 5 consecutive values below or equal to 2. Here's a way to do that, with some explanation:

library(dplyr)
# data.table must be installed for data.table::rleid()

result <- data %>% 
  mutate(under_three = Value <= 2) %>% 
  # under_three = TRUE if Value is below or equal to 2

  group_by(rl_two = data.table::rleid(under_three)) %>%
  # Group by runs of consecutive rows that share the same under_three value

  mutate(big = n() >= 5 & all(under_three)) %>%
  # big = TRUE if the run has 5 or more consecutive values that are below or equal to 2

  group_by(rl_five = data.table::rleid(Value <= 5)) %>% 
  # Replace the rl_two grouping with rl_five, i.e. runs of consecutive values that are below or equal to 5

  filter(any(big))
  # Keep the rl_five groups that contain at least one big = TRUE; drop the other groups.

Output:

result %>%
  ungroup() %>% 
  select(Date, Value)

   Date       Value
 1 2000-01-01     2
 2 2000-01-02     3
 3 2000-01-03     4
 4 2000-01-04     5
 5 2000-01-05     2
 6 2000-01-06     2
 7 2000-01-07     1
 8 2000-01-08     0
 9 2000-01-09     1
10 2000-01-22     0
11 2000-01-23     2
12 2000-01-24     1
13 2000-01-25     2
14 2000-01-26     0
15 2000-01-27     3
16 2000-01-28     4
17 2000-01-29     0
18 2000-01-30     1
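
If you would rather avoid the data.table dependency, the same run-based idea can be sketched in base R. This is only a rough equivalent of the approach above, not a tested replacement: `run_id` is a hypothetical helper name (it mimics what `data.table::rleid` does, using `rle()`), and `Date` is built as a Date column here for brevity. Like the pipeline above, it does not enforce the "up to 3 days" cap on the cushion period; it only requires that values stay at or below 5.

```r
# Values from the question's reprex; Date is stored as Date here (an assumption).
data <- data.frame(
  Date  = as.Date("2000-01-01") + 0:29,
  Value = c(2,3,4,5,2,2,1,0,1,8,7,9,4,5,2,3,4,5,7,2,6,0,2,1,2,0,3,4,0,1)
)

# Hypothetical helper: label each run of identical values 1, 2, 3, ...
run_id <- function(x) {
  lens <- rle(x)$lengths
  rep(seq_along(lens), times = lens)
}

low_run <- run_id(data$Value <= 2)  # runs of "2 or less" vs. "above 2"
ok_run  <- run_id(data$Value <= 5)  # runs of "5 or less" vs. "above 5"

# Length of the low/not-low run that each row belongs to.
run_len <- ave(seq_along(low_run), low_run, FUN = length)

# big = TRUE for rows inside a run of at least 5 consecutive values of 2 or less.
big <- data$Value <= 2 & run_len >= 5

# Keep every "5 or less" run that contains at least one big row.
keep   <- as.logical(ave(big, ok_run, FUN = any))
result <- data[keep, ]
```

On the reprex data this should keep the same 18 rows as the dplyr version (2000-01-01 to 2000-01-09 and 2000-01-22 to 2000-01-30). If you are on dplyr 1.1.0 or later, `dplyr::consecutive_id()` is another way to get run ids without data.table.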