I'm trying to find periods of 3 days or more where the values are the same. As an example, if January 1st, 2nd, and 3rd all have a value of 2, they should be included - but if January 2nd has a value of 3, then none of them should be.
I've tried a few ways so far but no luck! Any help would be greatly appreciated!
Reprex:
library("dplyr")
#Goal: include all values with values of 2 or less for 5 consecutive days and allow for a "cushion" period of values of 2 to 5 for up to 3 days
data <- data.frame(Date = c("2000-01-01", "2000-01-02", "2000-01-03", "2000-01-04", "2000-01-05", "2000-01-06", "2000-01-07", "2000-01-08", "2000-01-09", "2000-01-10", "2000-01-11", "2000-01-12", "2000-01-13", "2000-01-14", "2000-01-15", "2000-01-16", "2000-01-17", "2000-01-18", "2000-01-19", "2000-01-20", "2000-01-21", "2000-01-22", "2000-01-23", "2000-01-24", "2000-01-25", "2000-01-26", "2000-01-27", "2000-01-28", "2000-01-29", "2000-01-30"),
Value = c(2,2,2,5,2,2,1,0,1,8,7,7,7,5,2,3,4,5,7,2,6,6,6,6,2,0,3,4,0,1))
head(data)
#Goal: values should include dates from 2000-01-01 to 2000-01-03, 2000-01-11 to 2000-01-13, and 2000-01-21 to 2000-01-24
#My attempt so far but it doesn't work
attempt1 <- data %>%
group_by(group_id = as.integer(gl(n(),3,n()))) %>% #3 day chunks
filter(Value == Value) %>% #looking for the values being the same inbetween, but this doesn't work for that
ungroup() %>%
select(-group_id)
head(attempt1)
CodePudding user response:
With rle
:
rl <- rle(data$Value)
data[rep(rl$lengths>=3,rl$lengths),]
Date Value
1 2000-01-01 2
2 2000-01-02 2
3 2000-01-03 2
11 2000-01-11 7
12 2000-01-12 7
13 2000-01-13 7
21 2000-01-21 6
22 2000-01-22 6
23 2000-01-23 6
24 2000-01-24 6
or with dplyr
:
library(dplyr)
data %>% filter(rep(rle(Value)$length>=3,rle(Value)$length))
Date Value
1 2000-01-01 2
2 2000-01-02 2
3 2000-01-03 2
4 2000-01-11 7
5 2000-01-12 7
6 2000-01-13 7
7 2000-01-21 6
8 2000-01-22 6
9 2000-01-23 6
10 2000-01-24 6
CodePudding user response:
You can create a temporary variable using rleid
from the data.table
package.
data %>%
group_by(data.table::rleid(Value)) %>%
filter(n() >= 3) %>%
ungroup() %>%
select(Date, Value)
#> # A tibble: 10 x 2
#> Date Value
#> <chr> <dbl>
#> 1 2000-01-01 2
#> 2 2000-01-02 2
#> 3 2000-01-03 2
#> 4 2000-01-11 7
#> 5 2000-01-12 7
#> 6 2000-01-13 7
#> 7 2000-01-21 6
#> 8 2000-01-22 6
#> 9 2000-01-23 6
#> 10 2000-01-24 6
Or, if you want to avoid using another package, you could equivalently do
data %>%
group_by(temp = cumsum(c(1, diff(Value) != 0))) %>%
filter(n() > 2) %>%
ungroup() %>%
select(-temp)
#> # A tibble: 10 x 2
#> Date Value
#> <chr> <dbl>
#> 1 2000-01-01 2
#> 2 2000-01-02 2
#> 3 2000-01-03 2
#> 4 2000-01-11 7
#> 5 2000-01-12 7
#> 6 2000-01-13 7
#> 7 2000-01-21 6
#> 8 2000-01-22 6
#> 9 2000-01-23 6
#> 10 2000-01-24 6
Created on 2022-09-12 with reprex v2.0.2