I am trying to keep, for each id, only the rows up to and including the first observed type == "y".
df<-data.frame(id=c(1,1,1,1,2,2,2,2,2,3,3,3,3,3,3,3,3,3),
type=c("x","x","y","x","x","x","x","y","y","x","x","x","x","y","x","y","x","x"))
Desired output:
id type
1 x
1 x
1 y
2 x
2 x
2 x
2 y
3 x
3 x
3 x
3 x
3 y
I tried the following code:
library(dplyr)
df %>% filter(type == "x"|sum(type == "y") == 1) %>%
ungroup
CodePudding user response:
You can use match() inside slice():
library(dplyr)
df %>% group_by(id) %>% slice(1:match('y', type)) %>% ungroup
# id type
# <dbl> <chr>
# 1 1 x
# 2 1 x
# 3 1 y
# 4 2 x
# 5 2 x
# 6 2 x
# 7 2 y
# 8 3 x
# 9 3 x
#10 3 x
#11 3 x
#12 3 y
match() returns the position of the first 'y' in each group. However, this fails if an id has no 'y' in the type column, because match() then returns NA. In that case, you can use filter() as below, which returns all the rows of a group that has no 'y' in it.
# keep a row as long as no 'y' has been seen before it within its id
df %>%
  group_by(id) %>%
  filter(lag(cumsum(type == 'y'), default = 0) < 1) %>%
  ungroup()
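To check the no-'y' case, here is a minimal sketch on a small made-up data frame. The name df2 and its values are assumptions for illustration, not data from the question. In it, id 2 has no 'y', so match() returns NA for that group, while the filter() version keeps both of its rows.

```r
library(dplyr)

# Hypothetical data: id 1 has a 'y', id 2 has none
df2 <- data.frame(id   = c(1, 1, 1, 2, 2),
                  type = c("x", "y", "x", "x", "x"))

# match() finds no 'y' for id 2, so this is NA
pos_id2 <- match("y", df2$type[df2$id == 2])

# cumsum() counts the 'y' values seen so far; lag() shifts that count by
# one row so the first 'y' itself is still kept
res <- df2 %>%
  group_by(id) %>%
  filter(lag(cumsum(type == "y"), default = 0) < 1) %>%
  ungroup()

print(res)
```

For id 1 the row after the first 'y' is dropped. For id 2 nothing is dropped.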
CodePudding user response:
If you only want the first row with type == "y" for each id, you can combine base R's duplicated() function with filter():
df1 <- df[!duplicated(df[, c("id", "type")]), ] %>% filter(type == "y")
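For comparison, the same idea can be sketched in base R only, without the pipe. The name first_y is an assumption made for illustration. On the question's data this returns just the first 'y' row of each id, not the preceding 'x' rows, so it does not reproduce the desired output shown in the question.

```r
df <- data.frame(id = c(1,1,1,1,2,2,2,2,2,3,3,3,3,3,3,3,3,3),
                 type = c("x","x","y","x","x","x","x","y","y",
                          "x","x","x","x","y","x","y","x","x"))

# A row is kept when it is a 'y' and its (id, type) pair has not occurred earlier
first_y <- df[df$type == "y" & !duplicated(df[, c("id", "type")]), ]

print(first_y)
```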
CodePudding user response:
Here is a one-liner using by() and which.max() (the \(x) shorthand requires R 4.1 or later).
do.call(rbind, by(df, df$id, \(x) x[1:which.max(x$type == 'y'), ]))
# id type
# 1.1 1 x
# 1.2 1 x
# 1.3 1 y
# 2.5 2 x
# 2.6 2 x
# 2.7 2 x
# 2.8 2 y
# 3.10 3 x
# 3.11 3 x
# 3.12 3 x
# 3.13 3 x
# 3.14 3 y
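One caveat: which.max() returns 1 when every comparison is FALSE, so for an id with no 'y' the one-liner would keep only that group's first row. If such groups should be kept whole, a guarded variant might look like the sketch below. The helper name keep_until_first_y and the data frame df2 are assumptions for illustration.

```r
# Hypothetical data: id 2 has no 'y'
df2 <- data.frame(id   = c(1, 1, 1, 2, 2),
                  type = c("x", "y", "x", "x", "x"))

# which.max() on an all-FALSE vector returns the first index
no_y_pos <- which.max(c(FALSE, FALSE))

# Keep rows up to the first 'y'; keep the whole group if there is none
keep_until_first_y <- function(x) {
  pos <- match("y", x$type)  # NA when the group has no 'y'
  if (is.na(pos)) x else x[seq_len(pos), , drop = FALSE]
}

res <- do.call(rbind, by(df2, df2$id, keep_until_first_y))
print(res)
```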