Remove non-last rows with certain condition per group

Time:11-29

I have the following dataframe called df (dput below):

  group indicator value
1     A     FALSE     2
2     A     FALSE     1
3     A     FALSE     2
4     A      TRUE     4
5     B     FALSE     5
6     B     FALSE     1
7     B      TRUE     3

I would like to remove every row with indicator == FALSE except the last one in each group. In df, rows 1, 2 and 5 should be removed because they are not the last FALSE row of their group. Here is the desired output:

  group indicator value
1     A     FALSE     2
2     A      TRUE     4
3     B     FALSE     1
4     B      TRUE     3

Does anyone know how to remove the non-last rows that meet a certain condition per group in R?


dput of df:

df <- structure(list(group = c("A", "A", "A", "A", "B", "B", "B"), 
    indicator = c(FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE
    ), value = c(2, 1, 2, 4, 5, 1, 3)), class = "data.frame", row.names = c(NA, 
-7L))

CodePudding user response:

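You can do this with lead(), which looks at the next row's indicator within each group: a FALSE row is dropped when the row after it is also FALSE. This keeps the last FALSE of each consecutive run of FALSE rows, which is the last FALSE of the group in the example data.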

library(tidyverse)
df <- structure(list(group = c("A", "A", "A", "A", "B", "B", "B"), 
                     indicator = c(FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE
                     ), value = c(2, 1, 2, 4, 5, 1, 3)), class = "data.frame", row.names = c(NA, 
                                                                                             -7L))
df |> 
  group_by(group) |> 
  # keep every TRUE row, plus any FALSE row whose next row in the group is TRUE;
  # default = TRUE also keeps a FALSE row that is the last row of its group
  filter(indicator | lead(indicator, default = TRUE))
#> # A tibble: 4 × 3
#> # Groups:   group [2]
#>   group indicator value
#>   <chr> <lgl>     <dbl>
#> 1 A     FALSE         2
#> 2 A     TRUE          4
#> 3 B     FALSE         1
#> 4 B     TRUE          3
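
For comparison, here is a minimal base R sketch of the same rule that needs no packages. It locates the last FALSE in each group by position instead of looking at the next row. The helper name is_last_false is hypothetical (it is not part of any package), and the sketch assumes indicator contains no NA values.

```r
# Example data from the question
df <- data.frame(
  group     = c("A", "A", "A", "A", "B", "B", "B"),
  indicator = c(FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE),
  value     = c(2, 1, 2, 4, 5, 1, 3)
)

# Hypothetical helper: TRUE only at the position of the last FALSE in x.
# The leading 0L makes max() return 0 when x has no FALSE at all,
# so no position matches and no warning is raised.
is_last_false <- function(x) {
  seq_along(x) == max(c(0L, which(!x)))
}

# ave() applies the helper within each group and returns a vector
# aligned with the rows of df
last_false <- as.logical(ave(df$indicator, df$group, FUN = is_last_false))

# Keep all TRUE rows plus the last FALSE row of each group
result <- df[df$indicator | last_false, ]
rownames(result) <- NULL
result
```

On the example data this should keep the rows with values 2, 4, 1 and 3, matching the desired output.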

CodePudding user response:

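This may help. Note that it relies on the last FALSE row being the second-to-last row of each group, as it is in the example data: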

library(dplyr)

df %>% 
  group_by(group) %>% 
  filter(indicator | ((row_number() == n() - 1) & !indicator))

# A tibble: 4 × 3
# Groups:   group [2]
  group indicator value
  <chr> <lgl>     <dbl>
1 A     FALSE         2
2 A     TRUE          4
3 B     FALSE         1
4 B     TRUE          3
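
If the last FALSE row is not guaranteed to sit at a fixed position, one possible variation is to find it with duplicated(fromLast = TRUE), which marks every occurrence of a value except the last one. This is only a sketch, not the answerer's code, and it assumes indicator has no NA values; result is just an illustrative variable name.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  group     = c("A", "A", "A", "A", "B", "B", "B"),
  indicator = c(FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE),
  value     = c(2, 1, 2, 4, 5, 1, 3)
)

result <- df %>%
  group_by(group) %>%
  # !duplicated(..., fromLast = TRUE) is TRUE for the last FALSE and the
  # last TRUE of each group; OR-ing with indicator keeps all TRUE rows
  # and only the final FALSE row
  filter(indicator | !duplicated(indicator, fromLast = TRUE)) %>%
  ungroup()

result
```

Because the condition depends on the values rather than on row positions, it should also work when a group ends with FALSE or contains no FALSE rows at all.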