Filter up to observing first specific value

I am trying to keep, within each id, only the rows up to and including the first observed type == "y".

df<-data.frame(id=c(1,1,1,1,2,2,2,2,2,3,3,3,3,3,3,3,3,3),
type=c("x","x","y","x","x","x","x","y","y","x","x","x","x","y","x","y","x","x"))

Desired output:

I tried it with this code:

library(dplyr)
df %>% filter(type == "x"|sum(type == "y") == 1) %>%
  ungroup

CodePudding user response：

You can use match inside slice:

library(dplyr)
df %>% group_by(id) %>% slice(1:match('y', type)) %>% ungroup

#     id type 
#   <dbl> <chr>
# 1     1 x    
# 2     1 x    
# 3     1 y    
# 4     2 x    
# 5     2 x    
# 6     2 x    
# 7     2 y    
# 8     3 x    
# 9     3 x    
#10     3 x    
#11     3 x    
#12     3 y

match returns the position of the first 'y' in each group. However, this fails if an id has no 'y' in the type column, because match then returns NA. In that case you can use filter as below, which returns all the rows of a group that contains no 'y'.

df %>%
  group_by(id) %>%
  # keep rows while no 'y' has been seen in any earlier row of the group
  filter(lag(cumsum(type == 'y'), default = 0) < 1) %>%
  ungroup()
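To see how the filter version behaves when a group has no 'y', here is a minimal sketch. The data frame df2 and its id 4 are made up for illustration and are not part of the question's data.

```r
library(dplyr)

# Hypothetical data: id 4 contains no "y" at all
df2 <- data.frame(id   = c(1, 1, 1, 4, 4),
                  type = c("x", "y", "x", "x", "x"))

# match() gives NA when the value is absent, so 1:NA inside slice()
# would raise an error for id 4.
match("y", c("x", "x"))

# cumsum() counts the "y" seen so far; lag() shifts that count down one row,
# so a row is kept as long as no "y" occurred strictly before it.
res <- df2 %>%
  group_by(id) %>%
  filter(lag(cumsum(type == "y"), default = 0) < 1) %>%
  ungroup()

res
```

With this input, id 1 should be cut after its first 'y' and id 4 should be kept whole.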

CodePudding user response：

If you meant to keep only the first observation with type == "y" for each id, the base duplicated function can do the job (the filter step still needs dplyr loaded).

df1 <- df[!duplicated(df[, c("id", "type")]), ] %>% filter(type == "y")

CodePudding user response：

Here is a one-liner using by and which.max (the \(x) lambda shorthand requires R 4.1 or later).

do.call(rbind, by(df, df$id, \(x) x[1:which.max(x$type == 'y'), ]))
#      id type
# 1.1   1    x
# 1.2   1    x
# 1.3   1    y
# 2.5   2    x
# 2.6   2    x
# 2.7   2    x
# 2.8   2    y
# 3.10  3    x
# 3.11  3    x
# 3.12  3    x
# 3.13  3    x
# 3.14  3    y
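One caveat with this approach: which.max returns 1 for an all-FALSE vector, so a group without any 'y' would be silently reduced to its first row. Below is a sketch of a guarded base R variant. The helper name up_to_first_y and the data frame df2 are hypothetical, introduced only for this example.

```r
# Hypothetical helper: keep rows up to and including the first "y";
# if the group has no "y", return the group unchanged.
up_to_first_y <- function(x) {
  pos <- match("y", x$type)  # NA when there is no "y"
  if (is.na(pos)) x else x[seq_len(pos), , drop = FALSE]
}

# Hypothetical data: id 4 contains no "y"
df2 <- data.frame(id   = c(1, 1, 1, 4, 4),
                  type = c("x", "y", "x", "x", "x"))

# which.max() on all-FALSE input still returns the first position
which.max(c(FALSE, FALSE))

out <- do.call(rbind, by(df2, df2$id, up_to_first_y))
out
```

Using match with an explicit NA check makes the "no 'y'" case a deliberate choice rather than a side effect of which.max.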