Filter out two consecutive values in group id

Time:10-04

I have the following data. I need to keep only the groups (by id) that contain at least one 'Yes' but never two consecutive 'Yes' values.

data <- data.frame(id=c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4, 5,5,5,5),
type=c('No','Yes','No',NA,'Yes','No','Yes','Yes',NA,'Yes','No','Yes','No','Yes',
                          NA,'No','Yes','Yes','No','No','No','No'))

Expected output:

   id type
1   1   No
2   1  Yes
3   1   No
4   1   NA
5   1  Yes
6   3   No
7   3  Yes
8   3   No
9   3  Yes
10  3   NA

I tried the following:

library(dplyr)
data1 <- data %>% group_by(id) %>% 
  filter(any(type %in% 'Yes', na.rm = TRUE)) %>% 
  mutate(tlag = any(type == 'Yes' & lag(type == 'Yes'))) %>%
  filter(!any(tlag == TRUE)) %>% select(-tlag) %>%
  ungroup()

CodePudding user response:

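You can use two filtering conditions: the first keeps groups that contain at least one "Yes", and the second drops groups in which a "Yes" is immediately preceded by another "Yes":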

library(dplyr)
data %>% 
  group_by(id) %>% 
  filter(any(type == "Yes"),
         !any(type == "Yes" & lag(type, default = "No") == "Yes", na.rm = TRUE))

output

      id type 
   <dbl> <chr>
 1     1 No   
 2     1 Yes  
 3     1 No   
 4     1 NA   
 5     1 Yes  
 6     3 No   
 7     3 Yes  
 8     3 No   
 9     3 Yes  
10     3 NA   
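
To see why na.rm matters in the second condition, here is a small sketch that applies it to two of the groups as plain vectors. The helper name has_consecutive_yes is made up for illustration and is not part of dplyr; the vectors are copied from ids 1 and 2 in the question.

```r
library(dplyr)

# Hypothetical helper: TRUE if any "Yes" directly follows another "Yes".
has_consecutive_yes <- function(type) {
  any(type == "Yes" & lag(type, default = "No") == "Yes", na.rm = TRUE)
}

g1 <- c("No", "Yes", "No", NA, "Yes")   # id 1: "Yes" values are never adjacent
g2 <- c("No", "Yes", "Yes", NA, "Yes")  # id 2: rows 2 and 3 are both "Yes"

has_consecutive_yes(g1)  # FALSE
has_consecutive_yes(g2)  # TRUE

# Without na.rm, the last "Yes" of id 1 is compared with the NA before it,
# so the result is NA and filter() would drop the group.
any(g1 == "Yes" & lag(g1, default = "No") == "Yes")  # NA
```

The same NA propagation is a likely reason the attempt in the question loses id 1: its tlag column becomes NA for that group.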

CodePudding user response:

In base R, we could process each group with by() and use rle() to check whether any run of "Yes" is longer than one.

by(data, data$id, \(x) 
   if (all(is.na(x$type)) || all(na.omit(x$type) == 'No') || 
       any(na.omit(with(rle(x$type), lengths[values == "Yes"])) > 1)) NULL
   else x) |> 
  do.call(what=rbind)
#      id type
# 1.1   1   No
# 1.2   1  Yes
# 1.3   1   No
# 1.4   1 <NA>
# 1.5   1  Yes
# 3.11  3   No
# 3.12  3  Yes
# 3.13  3   No
# 3.14  3  Yes
# 3.15  3 <NA>
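
As a rough illustration of the rle() idea, the sketch below computes the longest run of "Yes" per id and keeps the ids where that run is exactly 1. It is a variant under assumptions, not the code from the answer above: max_yes_run is a hypothetical helper, and %in% is used instead of == so that NA values never match.

```r
data <- data.frame(
  id   = c(1,1,1,1,1, 2,2,2,2,2, 3,3,3,3,3, 4,4,4, 5,5,5,5),
  type = c('No','Yes','No',NA,'Yes', 'No','Yes','Yes',NA,'Yes',
           'No','Yes','No','Yes',NA, 'No','Yes','Yes', 'No','No','No','No')
)

# Hypothetical helper: length of the longest run of "Yes" (0 if there is none).
# rle() puts each NA in its own run, so an NA always breaks a run of "Yes".
max_yes_run <- function(type) {
  r <- rle(type)
  yes <- r$lengths[r$values %in% "Yes"]
  if (length(yes) == 0) 0L else max(yes)
}

max_yes_run(c("No", "Yes", "Yes", NA, "Yes"))  # 2

# Longest "Yes" run per id; a value of exactly 1 means
# "at least one Yes, but never two in a row".
run_by_id <- tapply(data$type, data$id, max_yes_run)
keep_ids  <- as.numeric(names(run_by_id)[run_by_id == 1])

result <- data[data$id %in% keep_ids, ]
unique(result$id)  # 1 3
```

Unlike the by() version, this keeps the original row names (1 to 5 and 11 to 15) because the rows are selected by subsetting rather than rebuilt with rbind.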