Remove row and its subsequent rows of a group after first occurrence of a value in a column (using d

I found this: Remove subsequent rows of a group after first occurence of 0 in a column

and this: Remove rows after first occurrence of a certain value

But it's not quite what I am looking for. Here's a reprex:

df <- data.frame(
  id = c(1001, 1001, 1002, 1002, 1002, 1005, 1005, 1005, 1005, 1005),
  name = c("monkey", "gorilla", "chimp", "monkey", "giraffe", "tarzan", "whale", "princess", "phone", "kindle"),
  char = c(0, 1, 0, 1, 0, 0, 0, 1, 0, 0))
df
#>      id     name char
#> 1  1001   monkey    0
#> 2  1001  gorilla    1
#> 3  1002    chimp    0
#> 4  1002   monkey    1
#> 5  1002  giraffe    0
#> 6  1005   tarzan    0
#> 7  1005    whale    0
#> 8  1005 princess    1
#> 9  1005    phone    0
#> 10 1005   kindle    0

df_desired <- data.frame(
  id = c(1001, 1002, 1005, 1005),
  name = c("monkey", "chimp", "tarzan", "whale"),
  char = c(0, 0, 0, 0))
df_desired
#>     id     name char
#> 1 1001   monkey    0
#> 3 1002    chimp    0
#> 6 1005   tarzan    0
#> 7 1005    whale    0

Created on 2022-08-10 by the reprex package (v2.0.1)

Within each id (with rows arranged by name), I'm trying to remove the first row where char hits 1, together with every row that follows it.

CodePudding user response:

Thanks for updating the details in your question @taimishu; if I've understood you correctly, here is a potential solution:

library(tidyverse)
df <- data.frame(
  id = c(1001, 1001, 1002, 1002, 1002, 1005, 1005, 1005, 1005, 1005),
  name = c("monkey", "gorilla", "chimp", "monkey", "giraffe", "tarzan", "whale", "princess", "phone", "kindle"),
  char = c(0, 1, 0, 1, 0, 0, 0, 1, 0, 0))
df
#>      id     name char
#> 1  1001   monkey    0
#> 2  1001  gorilla    1
#> 3  1002    chimp    0
#> 4  1002   monkey    1
#> 5  1002  giraffe    0
#> 6  1005   tarzan    0
#> 7  1005    whale    0
#> 8  1005 princess    1
#> 9  1005    phone    0
#> 10 1005   kindle    0

df_desired <- data.frame(
  id = c(1001, 1002, 1005, 1005),
  name = c("monkey", "chimp", "tarzan", "whale"),
  char = c(0, 0, 0, 0))
df_desired
#>     id   name char
#> 1 1001 monkey    0
#> 2 1002  chimp    0
#> 3 1005 tarzan    0
#> 4 1005  whale    0

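# Within each id, keep rows only while the running maximum of char is still 0,
# i.e. drop the first row where char == 1 and every row after it
df_filtered <- df %>%
  group_by(id) %>%
  filter(cummax(char) < 1)
df_filtered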
#> # A tibble: 4 × 3
#> # Groups:   id [3]
#>      id name    char
#>   <dbl> <chr>  <dbl>
#> 1  1001 monkey     0
#> 2  1002 chimp      0
#> 3  1005 tarzan     0
#> 4  1005 whale      0

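# Check the result against the desired output
# (note: all_equal() was deprecated in dplyr 1.1.0, after this reprex was created)
all_equal(df_desired, df_filtered)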
#> [1] TRUE

Created on 2022-08-10 by the reprex package (v2.0.1)
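
To see why cummax(char) < 1 drops the first 1 and everything after it, it may help to look at a single group on its own. The following is a minimal base R sketch, assuming char only ever holds 0 and 1; the vector is made up for illustration and mirrors the pattern of id 1005, and running_max and keep are names chosen here, not part of the solution above.

```r
# Hypothetical single-group vector, same pattern as id 1005 in the question
char <- c(0, 0, 1, 0, 0)

# Running maximum: stays 0 until the first 1, then stays 1 for the rest
running_max <- cummax(char)
running_max
#> [1] 0 0 1 1 1

# Rows the filter would keep: only those before the first 1
keep <- running_max < 1
keep
#> [1]  TRUE  TRUE FALSE FALSE FALSE
```

If the column can take values other than 0 and 1, a test on the specific value, such as cumsum(char == 1) == 0, is probably the safer form.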

CodePudding user response:

You could use dplyr's cumany() or cumall(), or base R's cumsum():

library(dplyr)

df %>%
  group_by(id) %>%
  filter( ... ) %>%
  ungroup()

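The ... part can be filled with any of the following:

!cumany(char == 1)   # drop rows once any 1 has been seen in the group
cumall(char != 1)    # keep rows while every value so far differs from 1
!cumsum(char == 1)   # a running count of 0 negates to TRUE, anything else to FALSE

All three give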

# A tibble: 4 × 3
     id name    char
  <dbl> <chr>  <dbl>
1  1001 monkey     0
2  1002 chimp      0
3  1005 tarzan     0
4  1005 whale      0
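
For comparison, the same grouped filter can be sketched without dplyr. This is only an illustrative base R variant, not something the answers above propose: it rebuilds df from the question and uses ave() to run cumsum() within each id. The names hit_count and df_base are chosen here for illustration.

```r
df <- data.frame(
  id = c(1001, 1001, 1002, 1002, 1002, 1005, 1005, 1005, 1005, 1005),
  name = c("monkey", "gorilla", "chimp", "monkey", "giraffe", "tarzan", "whale", "princess", "phone", "kindle"),
  char = c(0, 1, 0, 1, 0, 0, 0, 1, 0, 0))

# Within each id, count how many 1s have been seen so far (current row included)
hit_count <- ave(as.integer(df$char == 1), df$id, FUN = cumsum)

# Keep only the rows that come before the first 1 in their group
df_base <- df[hit_count == 0, ]
df_base
#>     id   name char
#> 1 1001 monkey    0
#> 3 1002  chimp    0
#> 6 1005 tarzan    0
#> 7 1005  whale    0
```

Unlike the dplyr versions, this keeps the original row names (1, 3, 6, 7), which can be handy for checking which rows survived.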