Remove sequence of rows conditional on value in single cell in group-first position


In this type of data:

df <- data.frame(
  Sequ = c(1,1,2,2,2,3,3,3),
  G = c("A", "B", "*", "B", "A", "A", "*", "B")
)

I need to remove all rows of a Sequ group if, and only if, the first G value in that group is "*". A "*" later in the group (as in Sequ 3) should not trigger removal. I can do it like so, but I was wondering whether there is a more direct and elegant way in dplyr:

df %>% 
  group_by(Sequ) %>%
  mutate(check = ifelse(first(G)=="*", 1, 0)) %>%
  filter(check != 1)
# A tibble: 5 × 3
# Groups:   Sequ [2]
   Sequ G     check
  <dbl> <chr> <dbl>
1     1 A         0
2     1 B         0
3     3 A         0
4     3 *         0
5     3 B         0

CodePudding user response:

We can try the following base R code using subset and ave:

subset(
  df,
  !ave(G == "*", Sequ, FUN = function(x) head(x, 1))
)

which gives

  Sequ G
1    1 A
2    1 B
6    3 A
7    3 *
8    3 B
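
To see why this works, it may help to look at what ave returns before the negation. The following is only a sketch that reuses the df from the question; the names first_is_star and result are made up for illustration and are not part of the answer above.

```r
df <- data.frame(
  Sequ = c(1, 1, 2, 2, 2, 3, 3, 3),
  G = c("A", "B", "*", "B", "A", "A", "*", "B")
)

# Within each Sequ group, ave() overwrites every element with the group's
# first value of (G == "*"), so the flag is repeated across the whole group.
first_is_star <- with(df, ave(G == "*", Sequ, FUN = function(x) head(x, 1)))

# Keep only the rows whose group does not start with "*".
result <- df[!first_is_star, ]
print(result)
```

Because the flag is spread over the whole group, negating it drops the group as a unit, while a "*" further down (row 7) leaves its group untouched.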

CodePudding user response:

Another base R option, using duplicated to pick out the first row of each group:

subset(df, !Sequ %in% Sequ[G == "*" & !duplicated(Sequ)])
  Sequ G
1    1 A
2    1 B
6    3 A
7    3 *
8    3 B
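
The one-liner can be unpacked into steps, which may make the logic easier to follow. This is a sketch under the same df as in the question; is_first, bad_groups and result are illustrative names only.

```r
df <- data.frame(
  Sequ = c(1, 1, 2, 2, 2, 3, 3, 3),
  G = c("A", "B", "*", "B", "A", "A", "*", "B")
)

# TRUE on the first row of each Sequ value, FALSE on later repeats.
is_first <- !duplicated(df$Sequ)

# Sequ values whose first row holds "*".
bad_groups <- df$Sequ[df$G == "*" & is_first]

# Drop every row that belongs to one of those groups.
result <- df[!df$Sequ %in% bad_groups, ]
print(result)
```

Note that duplicated marks the first occurrence of each Sequ value over the whole vector, so this reading of "first row of the group" does not require the rows of a group to be contiguous.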

CodePudding user response:

Here is a direct dplyr way:


df %>%
  group_by(Sequ) %>%
  filter(!first(G == "*"))
   Sequ G    
  <dbl> <chr>
1     1 A    
2     1 B    
3     3 A    
4     3 *    
5     3 B    