I have a dataset with groups ("A", "B", "C", and "A & B") at two time points ("before" and "after"). I only want to keep "A & B" if any of the sample sizes for A or B at either time point falls below 10 people. Otherwise, I want to drop the "A & B" group. How do I tell R to drop this group only when that condition is not met?
Here are two sample datasets: one where it should filter out group A & B and one where it should retain it:
library(dplyr)
#This should not filter out anything
should_not_drop_group <- tibble(group = rep(c("A", "B", "C", "A & B"), 2),
                                time = c(rep(c("Before"), 4), rep(c("After"), 4)),
                                sample_size = c(5, 100, 132, 105, 250, 50, 224, 300))
#This dataset should drop group A&B
should_drop_group <- tibble(group = rep(c("A", "B", "C", "A & B"), 2),
                            time = c(rep(c("Before"), 4), rep(c("After"), 4)),
                            sample_size = c(500, 100, 132, 600, 250, 50, 224, 300))
And here's what I tried, to no avail:
library(dplyr)
should_drop_group %>%
filter_if(~any(sample_size[group %in% c("A", "B")] < 10), group != "A & B" )
CodePudding user response:
Maybe the condition in filter would be: subset the group where the sample_size is less than 10, check whether any values of 'A' or 'B' are in that subset, and negate (!). Then create the second expression where group is "A & B", join the two with &, and negate (!) the whole expression to filter out those cases.
library(dplyr)
should_not_drop_group %>%
  filter(!(!any(c("A", "B") %in% group[sample_size < 10]) & group == "A & B"))
# or can be written as
#filter(!(!any(group %in% c("A", "B") & sample_size < 10) & group == "A & B"))
Output:
# A tibble: 8 × 3
group time sample_size
<chr> <chr> <dbl>
1 A Before 5
2 B Before 100
3 C Before 132
4 A & B Before 105
5 A After 250
6 B After 50
7 C After 224
8 A & B After 300
And for the second case:
should_drop_group %>%
  filter(!(!any(c("A", "B") %in% group[sample_size < 10]) & group == "A & B"))
# A tibble: 6 × 3
group time sample_size
<chr> <chr> <dbl>
1 A Before 500
2 B Before 100
3 C Before 132
4 A After 250
5 B After 50
6 C After 224
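The double negation above can be hard to read. By De Morgan's law, !(!P & Q) is the same as P | !Q, so an equivalent condition is "keep a row if it is not the combined group, or if any A/B sample size is below 10". A minimal sketch of that rewrite, assuming the same column names as in the question and no missing values in sample_size:

```r
library(dplyr)

# Same data as `should_drop_group` in the question
should_drop_group <- tibble(
  group = rep(c("A", "B", "C", "A & B"), 2),
  time = rep(c("Before", "After"), each = 4),
  sample_size = c(500, 100, 132, 600, 250, 50, 224, 300)
)

# Keep every row that is not "A & B"; keep "A & B" rows only when
# at least one A or B sample size is below 10.
result <- should_drop_group %>%
  filter(group != "A & B" | any(sample_size[group %in% c("A", "B")] < 10))
```

Here any() returns a single TRUE/FALSE for the whole dataset, which R recycles against the row-wise comparison on group.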
If we want to reuse it on several datasets, we can wrap the condition in a function:
> f1 <- function(x, sample_size)
!(!any(c("A", "B") %in% x[sample_size < 10]) & x == "A & B")
> should_not_drop_group %>%
filter(if_any(group, f1, sample_size = sample_size))
# A tibble: 8 × 3
group time sample_size
<chr> <chr> <dbl>
1 A Before 5
2 B Before 100
3 C Before 132
4 A & B Before 105
5 A After 250
6 B After 50
7 C After 224
8 A & B After 300
> should_drop_group %>%
filter(if_any(group, f1, sample_size = sample_size))
# A tibble: 6 × 3
group time sample_size
<chr> <chr> <dbl>
1 A Before 500
2 B Before 100
3 C Before 132
4 A After 250
5 B After 50
6 C After 224
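Since the condition is really a property of the whole dataset rather than of individual rows, another option is a helper that takes the data frame itself and decides with a plain if/else. This is only a sketch: the function name drop_combined_unless_small and its arguments (combined, parts, threshold) are hypothetical and not part of dplyr.

```r
library(dplyr)

# Hypothetical helper: the name and the default arguments are illustrative only.
# Returns `data` unchanged if any of the `parts` groups has a sample size
# below `threshold`; otherwise drops the rows of the `combined` group.
drop_combined_unless_small <- function(data, combined = "A & B",
                                       parts = c("A", "B"), threshold = 10) {
  part_sizes <- data$sample_size[data$group %in% parts]
  if (any(part_sizes < threshold)) {
    data
  } else {
    filter(data, group != combined)
  }
}

# Same data as `should_not_drop_group` in the question
should_not_drop_group <- tibble(
  group = rep(c("A", "B", "C", "A & B"), 2),
  time = rep(c("Before", "After"), each = 4),
  sample_size = c(5, 100, 132, 105, 250, 50, 224, 300)
)

# A has a sample size of 5, which is below 10, so nothing is dropped
kept_default <- drop_combined_unless_small(should_not_drop_group)

# With a stricter threshold of 5, no A/B size is below it, so "A & B" is dropped
kept_strict <- drop_combined_unless_small(should_not_drop_group, threshold = 5)
```

Making the threshold and group labels arguments keeps the helper usable if the cutoff or the group names change later.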
CodePudding user response:
Here is a solution with an ifelse statement and a helper column x. The helper flags the "A & B" rows only when no sample size for A or B falls below 10, and the flagged rows are then dropped:
library(dplyr)
should_drop_group %>%
#should_not_drop_group %>%
  mutate(x = ifelse(!any(sample_size[group %in% c("A", "B")] < 10) & group == "A & B", 1, 0)) %>%
  filter(x != 1) %>%
  select(-x)
for should_drop_group:
group time sample_size
<chr> <chr> <dbl>
1 A Before 500
2 B Before 100
3 C Before 132
4 A After 250
5 B After 50
6 C After 224
for should_not_drop_group:
group time sample_size
<chr> <chr> <dbl>
1 A Before 5
2 B Before 100
3 C Before 132
4 A & B Before 105
5 A After 250
6 B After 50
7 C After 224
8 A & B After 300
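For comparison, the same rule can also be sketched in base R without dplyr. This assumes a plain data.frame with the same columns as in the question; the variable names any_small and kept are just for illustration.

```r
# Same values as `should_not_drop_group` in the question, as a base data.frame
should_not_drop_group <- data.frame(
  group = rep(c("A", "B", "C", "A & B"), 2),
  time = rep(c("Before", "After"), each = 4),
  sample_size = c(5, 100, 132, 105, 250, 50, 224, 300)
)

# TRUE if any A or B sample size, at either time point, is below 10
any_small <- with(should_not_drop_group,
                  any(sample_size[group %in% c("A", "B")] < 10))

# Keep "A & B" rows only when any_small is TRUE; always keep the other groups
kept <- subset(should_not_drop_group, any_small | group != "A & B")
```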