How to keep or remove *all* rows by group if a condition is met anywhere in the group

I have a data frame with column A (Sample_id), column B (Prey), and column C (Count). Column B has three levels: P1, P2, and P3. I want to remove every Sample_id in which P3 does not occur (that is, where the Count for P3 is 0). In other words, I want to retain only the Sample_ids in which P3 does occur, keeping all of their rows.

Essentially, I want to go from this original dataset:

df_orig <- data.frame(Sample_id = rep(c('S1', 'S2', 'S3'), each = 3),
                      Prey = rep(c('P1', 'P2', 'P3'), times = 3),
                      Count = c(10, 16, 0, 5, 0, 0, 6, 2, 9))

to this reduced dataset:

df_red <- data.frame(Sample_id = rep('S3', times = 3),
                     Prey = c('P1', 'P2', 'P3'),
                     Count = c(6, 2, 9))

I think I should be able to achieve this with dplyr's filter() somehow, but my attempt (see below) removes the rows for prey P1 and P2. Instead, I need to keep every row of a Sample_id whenever a condition is met somewhere within that Sample_id (i.e. retain all rows of each Sample_id in which P3 occurs).

How do I do this?

library(dplyr)
df_red <- df_orig %>%
  group_by(Sample_id) %>%
  filter(Prey == "P3") %>%
  ungroup()
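For reference, here is a sketch of what that attempt returns with the `df_orig` above. A row-level condition such as `Prey == "P3"` is tested on each row individually, so `group_by()` does not change which rows survive, and only the three P3 rows remain (the name `attempt` is just for illustration).

```r
library(dplyr)

df_orig <- data.frame(Sample_id = rep(c('S1', 'S2', 'S3'), each = 3),
                      Prey = rep(c('P1', 'P2', 'P3'), times = 3),
                      Count = c(10, 16, 0, 5, 0, 0, 6, 2, 9))

# Row-level condition: each row is kept or dropped on its own,
# so grouping by Sample_id has no effect on the result here
attempt <- df_orig %>%
  group_by(Sample_id) %>%
  filter(Prey == "P3") %>%
  ungroup()

# One row per Sample_id, all with Prey == "P3"; P1 and P2 rows are gone
attempt
```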

Answer:

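You need to include Count in your condition: every Sample_id has a P3 row, so testing Prey alone cannot tell them apart, and P3 only "occurs" where its Count is greater than 0. To keep all rows for a given Sample_id when the condition is met in any row of that Sample_id, wrap the condition in any(), which returns a single TRUE or FALSE for the whole group.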

library(dplyr)

df_red <- df_orig %>%
  group_by(Sample_id) %>%
  filter(any(Prey == "P3" & Count > 0)) %>%
  ungroup()
  
df_red

# A tibble: 3 × 3
  Sample_id Prey  Count
  <chr>     <chr> <dbl>
1 S3        P1        6
2 S3        P2        2
3 S3        P3        9
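To see why this works, it can help to compute the same condition with mutate() instead of filter(): any() collapses each Sample_id group to one TRUE or FALSE, and that value is repeated on every row of the group. Negating it with !any() drops the matching groups instead, which covers the "remove" half of the title. This is a sketch using the question's `df_orig`; the column name `has_p3` and the object names `flagged`, `df_keep`, and `df_drop` are only illustrative.

```r
library(dplyr)

df_orig <- data.frame(Sample_id = rep(c('S1', 'S2', 'S3'), each = 3),
                      Prey = rep(c('P1', 'P2', 'P3'), times = 3),
                      Count = c(10, 16, 0, 5, 0, 0, 6, 2, 9))

# Inspect the per-group flag: any() yields one value per Sample_id,
# which mutate() repeats on every row of that group
flagged <- df_orig %>%
  group_by(Sample_id) %>%
  mutate(has_p3 = any(Prey == "P3" & Count > 0)) %>%
  ungroup()

# Keep whole groups where the condition holds in at least one row
df_keep <- df_orig %>%
  group_by(Sample_id) %>%
  filter(any(Prey == "P3" & Count > 0)) %>%
  ungroup()

# Remove whole groups where the condition holds in at least one row
df_drop <- df_orig %>%
  group_by(Sample_id) %>%
  filter(!any(Prey == "P3" & Count > 0)) %>%
  ungroup()
```

Here `df_keep` matches the answer's result (only S3), while `df_drop` keeps all six rows of S1 and S2.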

Answer:

The following works to get from your original to your desired dataset:

df_orig %>%
   filter(Sample_id == "S3")

Is this all you are trying to do, or is there something else you're trying to achieve?
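Hard-coding "S3" reproduces this particular example but will not adapt to other data. As a sketch of the same group-wise idea in base R (no dplyr), the qualifying Sample_ids can be computed from the data first and then matched with %in%; the names `keep_ids` and `df_rest` are hypothetical.

```r
df_orig <- data.frame(Sample_id = rep(c('S1', 'S2', 'S3'), each = 3),
                      Prey = rep(c('P1', 'P2', 'P3'), times = 3),
                      Count = c(10, 16, 0, 5, 0, 0, 6, 2, 9))

# Sample_ids in which P3 was actually counted (Count > 0)
keep_ids <- unique(df_orig$Sample_id[df_orig$Prey == "P3" & df_orig$Count > 0])

# Keep every row belonging to one of those samples
df_red <- df_orig[df_orig$Sample_id %in% keep_ids, ]

# Or drop those samples instead (%in% binds tighter than !)
df_rest <- df_orig[!df_orig$Sample_id %in% keep_ids, ]
```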