Delete observations after a match on two column values

I have the data frame df. Within each id, I want to delete the observations that come after the first row where the two columns match, i.e. cate == "Yes" and value == 1.

df <- data.frame(id = c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,5,5,6,6,6,6,7,7,7,7,7),
                 cate = c('No','Yes','Yes','No','Yes','No','Yes','Yes','Yes','No',
                          'No','No','Yes','Yes','No','No','Yes','Yes','No',NA,
                          'No','Yes','Yes','Yes','No','Yes','Yes','Yes','Yes'),
                 value = c(0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,1,0,0))
df
   id cate value
1   1   No     0
2   1  Yes     0
3   1  Yes     0
4   1   No     0
5   1  Yes     0
6   2   No     0
7   2  Yes     1
8   2  Yes     0
9   2  Yes     0
10  2   No     0
11  3   No     0
12  3   No     0
13  3  Yes     0
14  3  Yes     0
15  3   No     0
16  4   No     0
17  4  Yes     0
18  4  Yes     0
19  5   No     0
20  5  Yes     0
21  6   No     0
22  6  Yes     1
23  6  Yes     0
24  6  Yes     0
25  7   No     0
26  7  Yes     1
27  7  Yes     1
28  7  Yes     0
29  7  Yes     0

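In other words, for each id I want to keep the rows up to and including the first row with cate = Yes and value = 1, and delete every row after it. An id with no such row should be kept in full.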
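Before any grouping, it helps to see which rows satisfy the condition. A minimal base R sketch, assuming the df from the question is rebuilt here so the block runs on its own; the name is_match is illustrative only:

```r
# Rebuild the example data from the question so this block is self-contained
df <- data.frame(
  id    = c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,5,5,6,6,6,6,7,7,7,7,7),
  cate  = c('No','Yes','Yes','No','Yes','No','Yes','Yes','Yes','No',
            'No','No','Yes','Yes','No','No','Yes','Yes','No',NA,
            'No','Yes','Yes','Yes','No','Yes','Yes','Yes','Yes'),
  value = c(0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,1,0,0)
)

# TRUE where both conditions hold; for row 20, NA & FALSE evaluates to FALSE,
# so the missing cate value does not produce an NA flag
is_match <- df$cate == "Yes" & df$value == 1
which(is_match)
# [1]  7 22 26 27
```

Only ids 2, 6 and 7 contain a match, and id 7 contains two; everything after the first match in each of those ids is what has to go.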

The expected output is:

   id cate value
1   1   No     0
2   1  Yes     0
3   1  Yes     0
4   1   No     0
5   1  Yes     0
6   2   No     0
7   2  Yes     1
8   3   No     0
9   3   No     0
10  3  Yes     0
11  3  Yes     0
12  3   No     0
13  4   No     0
14  4  Yes     0
15  4  Yes     0
16  5   No     0
17  5  Yes     0
18  6   No     0
19  6  Yes     1
20  7   No     0
21  7  Yes     1

Answer:

We can use slice to keep rows 1 through the first matching row in each id. When a group has no match, which(...)[1] returns NA, so we wrap it in coalesce() with n() to fall back to keeping every row of that group.

library(dplyr)

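df |>
  group_by(id) |>
  # keep rows 1..first match; which()[1] is NA when a group has no match,
  # so coalesce() falls back to n() and the whole group is kept
  slice(1:coalesce(which(cate == "Yes" & value == 1)[1], n()))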

Output

# A tibble: 21 × 3
# Groups:   id [7]
      id cate  value
   <dbl> <chr> <dbl>
 1     1 No        0
 2     1 Yes       0
 3     1 Yes       0
 4     1 No        0
 5     1 Yes       0
 6     2 No        0
 7     2 Yes       1
 8     3 No        0
 9     3 No        0
10     3 Yes       0
# … with 11 more rows
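The cut point that slice() receives for each id can be reproduced without dplyr. A minimal base R sketch, assuming the same df; the helper cut_row is hypothetical, and it uses an is.na() check in place of coalesce() and split()/Map() in place of group_by()/slice():

```r
df <- data.frame(
  id    = c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,5,5,6,6,6,6,7,7,7,7,7),
  cate  = c('No','Yes','Yes','No','Yes','No','Yes','Yes','Yes','No',
            'No','No','Yes','Yes','No','No','Yes','Yes','No',NA,
            'No','Yes','Yes','Yes','No','Yes','Yes','Yes','Yes'),
  value = c(0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,1,0,0)
)

# Hypothetical helper: position of the first match within a group,
# or the group size when there is no match (the coalesce(..., n()) fallback)
cut_row <- function(cate, value) {
  first <- which(cate == "Yes" & value == 1)[1]  # NA when there is no match
  if (is.na(first)) length(cate) else first
}

# One cut point per id
groups <- split(df, df$id)
cuts <- sapply(groups, function(g) cut_row(g$cate, g$value))
cuts
# 1 2 3 4 5 6 7
# 5 2 5 3 2 2 2

# Keep rows 1..cut in every group and stack the pieces back together
kept <- do.call(rbind, Map(function(g, k) g[seq_len(k), ], groups, cuts))
nrow(kept)
# [1] 21
```

The cut points add up to the 21 rows reported in the tibble header above.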

Answer:

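We could group by 'id', take the cumulative sum (cumsum) of the logical expression, take the cumsum of that again, and then keep the rows where the result is less than 2, i.e. <= 1. The inner cumsum counts the matches seen so far, so the outer cumsum is 0 before the first match, 1 on the first matching row, and at least 2 on every row after it. This keeps every row for an 'id' that has no match, and only the rows up to and including the first match otherwise.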

library(dplyr)
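df %>%
  group_by(id) %>%
  # inner cumsum: matches seen so far; outer cumsum: 0 before the first
  # match, 1 on it, and >= 2 on every later row of the same id
  filter(cumsum(cumsum(cate == 'Yes' & value == 1)) <= 1) %>%
  ungroup()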
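To see why the threshold is <= 1, the two cumsum() calls can be traced on the condition for id 7 (rows No/0, Yes/1, Yes/1, Yes/0, Yes/0). A minimal base R sketch; the second half applies the same double cumsum per id with ave(), which is a base R stand-in for group_by() + filter() rather than the answer's own code:

```r
# Condition for id 7, taken from the question's data
cond <- c(FALSE, TRUE, TRUE, FALSE, FALSE)

once  <- cumsum(cond)   # matches seen so far: 0 1 2 2 2
twice <- cumsum(once)   # 0 before the first match, 1 on it, >= 2 afterwards
twice
# [1] 0 1 3 5 7
twice <= 1
# [1]  TRUE  TRUE FALSE FALSE FALSE

# Same filter on the full data, computed within each id via ave()
df <- data.frame(
  id    = c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,5,5,6,6,6,6,7,7,7,7,7),
  cate  = c('No','Yes','Yes','No','Yes','No','Yes','Yes','Yes','No',
            'No','No','Yes','Yes','No','No','Yes','Yes','No',NA,
            'No','Yes','Yes','Yes','No','Yes','Yes','Yes','Yes'),
  value = c(0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,1,0,0)
)
is_match <- df$cate == "Yes" & df$value == 1
score <- ave(as.numeric(is_match), df$id, FUN = function(x) cumsum(cumsum(x)))
res <- df[score <= 1, ]
nrow(res)
# [1] 21
```

The second match in id 7 (row 27) pushes the outer cumsum to 3, which is why a single cumsum compared against 1 would not be enough: it would keep that row too.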