Home > database >  Filter after two consecutive values are observed by group
Filter after two consecutive values are observed by group

Time:10-01

I want to filter group id after two consecutive "yes" values observed only.

DT<-data.frame(id=c(1,1,1,2,2,2,2,3,3,3,3,3,3,3),
type=c("yes","yes","no","no","yes","yes","no","no","yes","no","yes","yes","no","yes"))

Desired output:

   id type
   1  yes
   1  yes
   1   no
   2  yes
   2  yes
   2   no
   3  yes
   3  yes
   3   no
   3  yes

I tried it as:

DT<-DT %>% group_by(id) %>% 
  mutate(Tlag = lag(type), ident = type == 'yes' & Tlag == 'yes' ) %>%
  filter(any(ident)) %>% filter(type == 'yes' | (cumsum(type == 'yes') == 1)) %>% select(-ident)

CodePudding user response:

Using dplyr and kind of an indicator:

DT %>% 
  group_by(id) %>% 
  mutate(indic = ifelse(lead(type) == "yes" & type == "yes", 1, NA)) %>%
  fill(indic, .direction = 'down') %>% 
  filter(!is.na(indic)) %>% 
  select(-indic)

Output:

# A tibble: 10 x 2
# Groups:   id [3]
      id type  
   <dbl> <chr>
 1     1 yes  
 2     1 yes  
 3     1 no   
 4     2 yes  
 5     2 yes  
 6     2 no   
 7     3 yes  
 8     3 yes  
 9     3 no   
10     3 yes  

CodePudding user response:

You almost had it, i.e. use lead instead of lag and adjust your cumsum threshold a bit:

library(tidyverse)
DT %>%
  group_by(id) %>% 
  mutate(Tlead = lead(type),
         cumsum_yes = cumsum(type == "yes" & Tlead == "yes"),
         cumsum_yes = if_else(is.na(cumsum_yes), lag(cumsum_yes), cumsum_yes)) %>%
  filter(cumsum_yes >=1) %>%
  select(-cumsum_yes, -Tlead)

which gives:

# A tibble: 10 x 2
# Groups:   id [3]
      id type 
   <dbl> <chr>
 1     1 yes  
 2     1 yes  
 3     1 no   
 4     2 yes  
 5     2 yes  
 6     2 no   
 7     3 yes  
 8     3 yes  
 9     3 no   
10     3 yes 

Here's the result without deleting the temporary columns so that you can better see what's going on:

# A tibble: 10 x 4
# Groups:   id [3]
      id type  Tlead cumsum_yes
   <dbl> <chr> <chr>      <int>
 1     1 yes   yes            1
 2     1 yes   no             1
 3     1 no    NA             1
 4     2 yes   yes            1
 5     2 yes   no             1
 6     2 no    NA             1
 7     3 yes   yes            1
 8     3 yes   no             1
 9     3 no    yes            1
10     3 yes   NA             1

CodePudding user response:

Using rollapply from zoo you can look for any number of 'yes' value.

library(dplyr)
library(zoo)

n <- 2

DT %>%
  group_by(id) %>%
  filter(row_number() >= match(TRUE, 
           rollapply(type == 'yes', n, all, align = 'left', fill = NA))) %>%
  ungroup

#      id type 
#   <dbl> <chr>
# 1     1 yes  
# 2     1 yes  
# 3     1 no   
# 4     2 yes  
# 5     2 yes  
# 6     2 no   
# 7     3 yes  
# 8     3 yes  
# 9     3 no   
#10     3 yes  
  • Related