remove row and its subsequent rows of a group after first occurrence of a value in a column (using d

Time:08-10

I found this: Remove subsequent rows of a group after first occurence of 0 in a column

and this: Remove rows after first occurrence of a certain value

But neither is quite what I'm looking for. Here's a reprex:

df <- data.frame(
  id = c(1001, 1001, 1002, 1002, 1002, 1005, 1005, 1005, 1005, 1005),
  name = c("monkey", "gorilla", "chimp", "monkey", "giraffe", "tarzan", "whale", "princess", "phone", "kindle"),
  char = c(0, 1, 0, 1, 0, 0, 0, 1, 0, 0))
df
#>      id     name char
#> 1  1001   monkey    0
#> 2  1001  gorilla    1
#> 3  1002    chimp    0
#> 4  1002   monkey    1
#> 5  1002  giraffe    0
#> 6  1005   tarzan    0
#> 7  1005    whale    0
#> 8  1005 princess    1
#> 9  1005    phone    0
#> 10 1005   kindle    0

df_desired <- data.frame(
  id = c(1001, 1002, 1005, 1005),
  name = c("monkey", "chimp", "tarzan", "whale"),
  char = c(0, 0, 0, 0))
df_desired
#>     id   name char
#> 1 1001 monkey    0
#> 2 1002  chimp    0
#> 3 1005 tarzan    0
#> 4 1005  whale    0

Created on 2022-08-10 by the reprex package (v2.0.1)

Within each id group (rows arranged by name), I'm trying to remove the first row where char hits 1 together with every row after it.

CodePudding user response:

Thanks for updating the details in your question @taimishu; if I've understood you correctly, here is a potential solution:

library(tidyverse)
df <- data.frame(
  id = c(1001, 1001, 1002, 1002, 1002, 1005, 1005, 1005, 1005, 1005),
  name = c("monkey", "gorilla", "chimp", "monkey", "giraffe", "tarzan", "whale", "princess", "phone", "kindle"),
  char = c(0, 1, 0, 1, 0, 0, 0, 1, 0, 0))
df
#>      id     name char
#> 1  1001   monkey    0
#> 2  1001  gorilla    1
#> 3  1002    chimp    0
#> 4  1002   monkey    1
#> 5  1002  giraffe    0
#> 6  1005   tarzan    0
#> 7  1005    whale    0
#> 8  1005 princess    1
#> 9  1005    phone    0
#> 10 1005   kindle    0

df_desired <- data.frame(
  id = c(1001, 1002, 1005, 1005),
  name = c("monkey", "chimp", "tarzan", "whale"),
  char = c(0, 0, 0, 0))
df_desired
#>     id   name char
#> 1 1001 monkey    0
#> 2 1002  chimp    0
#> 3 1005 tarzan    0
#> 4 1005  whale    0

df_filtered <- df %>%
  group_by(id) %>%
  filter(cummax(char) < 1)
df_filtered
#> # A tibble: 4 × 3
#> # Groups:   id [3]
#>      id name    char
#>   <dbl> <chr>  <dbl>
#> 1  1001 monkey     0
#> 2  1002 chimp      0
#> 3  1005 tarzan     0
#> 4  1005 whale      0

# Note: all_equal() is deprecated as of dplyr 1.1.0; all.equal() is the suggested replacement.
all_equal(df_desired, df_filtered)
#> [1] TRUE

Created on 2022-08-10 by the reprex package (v2.0.1)
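
To see why the cummax() filter above works, it may help to look at the running maximum itself. The sketch below rebuilds the same df and stores the per-group running maximum in a helper column; the column name seen_one is made up for illustration and is not part of the original answer. Once char reaches 1 inside a group, the running maximum stays at 1 for the rest of that group, so filtering on cummax(char) < 1 drops that row and everything after it.

```r
library(dplyr)

# Same example data as in the question.
df <- data.frame(
  id = c(1001, 1001, 1002, 1002, 1002, 1005, 1005, 1005, 1005, 1005),
  name = c("monkey", "gorilla", "chimp", "monkey", "giraffe",
           "tarzan", "whale", "princess", "phone", "kindle"),
  char = c(0, 1, 0, 1, 0, 0, 0, 1, 0, 0))

# seen_one is a hypothetical helper column: the running maximum of char
# within each id. It is 0 until the first 1 appears, then 1 from there on.
df_flagged <- df %>%
  group_by(id) %>%
  mutate(seen_one = cummax(char)) %>%
  ungroup()

df_flagged$seen_one
#> [1] 0 1 0 1 1 0 0 1 1 1

# Keeping only the rows where the running maximum is still 0
# gives the same rows as filter(cummax(char) < 1).
df_kept <- df_flagged %>%
  filter(seen_one < 1) %>%
  select(-seen_one)
```

This assumes char only ever holds 0 and 1 with no missing values, as in the example data.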

CodePudding user response:

You could use cumany / cumall / cumsum:

library(dplyr)

df %>%
  group_by(id) %>%
  filter( ... ) %>%
  ungroup()

The ... part can be filled with

  • !cumany(char == 1)
  • cumall(char != 1)
  • !cumsum(char == 1)

All give

# A tibble: 4 × 3
     id name    char
  <dbl> <chr>  <dbl>
1  1001 monkey     0
2  1002 chimp      0
3  1005 tarzan     0
4  1005 whale      0
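
As a quick check of the claim that all three conditions give the same result, here is a minimal sketch that runs each variant on the example data and compares the rows that survive. The result names (res_cumany, res_cumall, res_cumsum) are invented for this illustration, and the comparison assumes char contains only 0 and 1 with no NA values.

```r
library(dplyr)

# Same example data as in the question.
df <- data.frame(
  id = c(1001, 1001, 1002, 1002, 1002, 1005, 1005, 1005, 1005, 1005),
  name = c("monkey", "gorilla", "chimp", "monkey", "giraffe",
           "tarzan", "whale", "princess", "phone", "kindle"),
  char = c(0, 1, 0, 1, 0, 0, 0, 1, 0, 0))

# Variant 1: drop rows once any char == 1 has been seen in the group.
res_cumany <- df %>%
  group_by(id) %>%
  filter(!cumany(char == 1)) %>%
  ungroup()

# Variant 2: keep rows only while every char so far differs from 1.
res_cumall <- df %>%
  group_by(id) %>%
  filter(cumall(char != 1)) %>%
  ungroup()

# Variant 3: the running count of 1s is 0 (negated to TRUE) until the first 1.
res_cumsum <- df %>%
  group_by(id) %>%
  filter(!cumsum(char == 1)) %>%
  ungroup()

# Compare the surviving rows by name.
same_rows <- identical(res_cumany$name, res_cumall$name) &&
  identical(res_cumany$name, res_cumsum$name)
same_rows
```

If same_rows comes back TRUE, the three conditions are interchangeable for this data; which one to use is then mostly a matter of readability.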