I have a simple data frame such as
df <- data.frame(x=c(1,1,1,1,2,2,2,3,3,3),
y=c('a','b','a','c','e','d','e','a','f','c'))
I want to group by x, then if the first row of each x-groups has y == 'a', then get only rows that have y == 'a' | y == 'c'
So I expect the outcome would have row 1, 3, 4, 8, 10
Thank you very much.
CodePudding user response:
After grouping by 'x', create an &
condition - 1) check whether the first
value of 'y' is 'a', 2) condition that checks for values 'a', 'c' in the column
library(dplyr)
df %>%
group_by(x) %>%
filter('a' == first(y), y %in% c('a', 'c')) %>%
ungroup
-output
# A tibble: 5 × 2
x y
<dbl> <chr>
1 1 a
2 1 a
3 1 c
4 3 a
5 3 c
If we have additional rules, create a named list
where the names will be expected first values of 'y' and the vector of values to be filtered, then extract the list
element based on the first
value of the 'y' and use that vector in the logical expression with %in%
df %>%
group_by(x) %>%
filter(y %in% list(a = c('a', 'c'), e = 'e')[[first(y)]]) %>%
ungroup
-output
# A tibble: 7 × 2
x y
<dbl> <chr>
1 1 a
2 1 a
3 1 c
4 2 e
5 2 e
6 3 a
7 3 c
CodePudding user response:
Here is another dplyr
option
> df %>%
filter(y %in% c("a", "c") & ave(y == "a", x, FUN = first))
x y
1 1 a
2 1 a
3 1 c
4 3 a
5 3 c