Home > Back-end >  How do I find three or more consecutive dates with the same value in R?
How do I find three or more consecutive dates with the same value in R?

Time:09-13

I'm trying to find periods of 3 days or more where the values are the same. As an example, if January 1st, 2nd, and 3rd all have a value of 2, they should be included - but if January 2nd has a value of 3, then none of them should be.

I've tried a few ways so far but no luck! Any help would be greatly appreciated!

Reprex:

library("dplyr")
#Goal: include all values with values of 2 or less for 5 consecutive days and allow for a "cushion" period of values of 2 to 5 for up to 3 days
data <- data.frame(Date = c("2000-01-01", "2000-01-02", "2000-01-03", "2000-01-04", "2000-01-05", "2000-01-06", "2000-01-07", "2000-01-08", "2000-01-09", "2000-01-10", "2000-01-11", "2000-01-12", "2000-01-13", "2000-01-14", "2000-01-15", "2000-01-16", "2000-01-17", "2000-01-18", "2000-01-19", "2000-01-20", "2000-01-21", "2000-01-22", "2000-01-23", "2000-01-24", "2000-01-25", "2000-01-26", "2000-01-27", "2000-01-28", "2000-01-29", "2000-01-30"),
                   Value = c(2,2,2,5,2,2,1,0,1,8,7,7,7,5,2,3,4,5,7,2,6,6,6,6,2,0,3,4,0,1))

head(data)                   
#Goal: values should include dates from 2000-01-01 to 2000-01-03, 2000-01-11 to 2000-01-13, and 2000-01-21 to 2000-01-24

#My attempt so far but it doesn't work 
attempt1 <- data %>% 
  group_by(group_id = as.integer(gl(n(),3,n()))) %>% #3 day chunks
  filter(Value == Value) %>%  #looking for the values being the same inbetween, but this doesn't work for that
  ungroup() %>% 
  select(-group_id)

head(attempt1)

CodePudding user response:

With rle:

rl <- rle(data$Value)
data[rep(rl$lengths>=3,rl$lengths),]

         Date Value
1  2000-01-01     2
2  2000-01-02     2
3  2000-01-03     2
11 2000-01-11     7
12 2000-01-12     7
13 2000-01-13     7
21 2000-01-21     6
22 2000-01-22     6
23 2000-01-23     6
24 2000-01-24     6

or with dplyr:

library(dplyr)
data %>% filter(rep(rle(Value)$length>=3,rle(Value)$length))

         Date Value
1  2000-01-01     2
2  2000-01-02     2
3  2000-01-03     2
4  2000-01-11     7
5  2000-01-12     7
6  2000-01-13     7
7  2000-01-21     6
8  2000-01-22     6
9  2000-01-23     6
10 2000-01-24     6

CodePudding user response:

You can create a temporary variable using rleid from the data.table package.

data %>%
  group_by(data.table::rleid(Value)) %>%
  filter(n() >= 3) %>%
  ungroup() %>%
  select(Date, Value)
#> # A tibble: 10 x 2
#>    Date       Value
#>    <chr>      <dbl>
#>  1 2000-01-01     2
#>  2 2000-01-02     2
#>  3 2000-01-03     2
#>  4 2000-01-11     7
#>  5 2000-01-12     7
#>  6 2000-01-13     7
#>  7 2000-01-21     6
#>  8 2000-01-22     6
#>  9 2000-01-23     6
#> 10 2000-01-24     6

Or, if you want to avoid using another package, you could equivalently do

data %>%
  group_by(temp = cumsum(c(1, diff(Value) != 0))) %>%
  filter(n() > 2) %>%
  ungroup() %>%
  select(-temp)
#> # A tibble: 10 x 2
#>    Date       Value
#>    <chr>      <dbl>
#>  1 2000-01-01     2
#>  2 2000-01-02     2
#>  3 2000-01-03     2
#>  4 2000-01-11     7
#>  5 2000-01-12     7
#>  6 2000-01-13     7
#>  7 2000-01-21     6
#>  8 2000-01-22     6
#>  9 2000-01-23     6
#> 10 2000-01-24     6

Created on 2022-09-12 with reprex v2.0.2

  • Related