I have a data frame with column A (Sample_id), column B (Prey), and column C (Count). Column B is a factor with 3 levels: P1, P2, P3. I want to remove every Sample_id where P3 does not occur; in other words, retain only the Sample_ids where P3 does occur.
Essentially, I want to go from this original dataset:
df_orig <- data.frame(Sample_id = rep(c('S1', 'S2', 'S3'), each = 3),
                      Prey = rep(c('P1', 'P2', 'P3'), times = 3),
                      Count = c(10, 16, 0, 5, 0, 0, 6, 2, 9))
to this reduced dataset:
df_red <- data.frame(Sample_id = rep(c('S3'), each = 3),
                     Prey = rep(c('P1', 'P2', 'P3'), times = 1),
                     Count = c(6, 2, 9))
I think I should be able to achieve this with dplyr's filter() somehow, but my attempt (see below) also removes the rows for prey groups P1 and P2. Instead, I need to keep all rows of a Sample_id when a condition is met (i.e. retain the Sample_id where P3 occurs).
How do I do this?
library(dplyr)
df_red <- df_orig %>%
  group_by(Sample_id) %>%
  filter(Prey == "P3") %>%
  ungroup()
CodePudding user response:
You need to incorporate Count into your condition. And to keep all rows for a given Sample_id if the condition is met in any row for that Sample_id, wrap the condition in any().
library(dplyr)
df_red <- df_orig %>%
  group_by(Sample_id) %>%
  filter(any(Prey == "P3" & Count > 0)) %>%
  ungroup()
df_red
# A tibble: 3 × 3
  Sample_id Prey  Count
  <chr>     <chr> <dbl>
1 S3        P1        6
2 S3        P2        2
3 S3        P3        9
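If it helps to see the logic in two explicit steps, here is a minimal sketch of the same idea using a semi-join instead of a grouped filter. It assumes, as above, that "P3 occurs" means a P3 row with Count > 0; the name keep_ids is just an illustrative helper, not something from the question.

```r
library(dplyr)

# Example data from the question
df_orig <- data.frame(Sample_id = rep(c('S1', 'S2', 'S3'), each = 3),
                      Prey = rep(c('P1', 'P2', 'P3'), times = 3),
                      Count = c(10, 16, 0, 5, 0, 0, 6, 2, 9))

# Step 1: find the samples in which P3 was actually counted
# (assumption: "occurs" means Count > 0)
keep_ids <- df_orig %>%
  filter(Prey == "P3", Count > 0) %>%
  distinct(Sample_id)

# Step 2: keep every row of df_orig whose Sample_id is in keep_ids
df_red <- df_orig %>%
  semi_join(keep_ids, by = "Sample_id")
```

For this example data, df_red should contain the same three S3 rows as the grouped filter. The two-step form can be easier to inspect, since keep_ids shows exactly which samples passed the condition.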
CodePudding user response:
The following works to get from your original to your desired dataset:
df_orig %>%
  filter(Sample_id == "S3")
Is this all you are trying to do or is there something else you're trying to achieve?