Using dplyr, I would like to summarise groups within a dataset using a conditional statement where the presence of two conditions within a group triggers a TRUE value and all other combinations trigger a FALSE. It's best illustrated with an example. Say we have a dataset with several observations of a categorical variable within each id number:
df <- data.frame(id = factor(c(1, 2, 2, 3, 3, 4, 4)),
l = factor(c("a", "a", "b", "a", "c", "b", "d")))
df
# id l
# 1 1 a
# 2 2 a
# 3 2 b
# 4 3 a
# 5 3 c
# 6 4 b
# 7 4 d
Now say I want a TRUE to occur only when an id group has BOTH a AND c within it.
I can create a conditional that returns TRUE if the id group has a OR c using the any() function inside summarise():
library(dplyr)
df %>%
group_by(id) %>%
summarise(ab = any(l %in% c("a", "c")))
# id ab
# <fct> <lgl>
# 1 1 TRUE
# 2 2 TRUE
# 3 3 TRUE
# 4 4 FALSE
The documentation for any() says that all() does the opposite, so I tried that:
library(dplyr)
df %>%
group_by(id) %>%
summarise(ab = all(l %in% c("a", "c")))
# id ab
# <fct> <lgl>
# 1 1 TRUE
# 2 2 FALSE
# 3 3 TRUE
# 4 4 FALSE
This is close but not quite right: id number 1 comes back TRUE even though it has only one observation and therefore cannot meet both conditions.
Can anyone suggest a solution?
Answer:
Reverse the %in% statement. You want to know if "all" of c("a", "c") are in the group, not whether all of the group are in c("a", "c"):
df %>%
group_by(id) %>%
summarise(ab = all(c("a", "c") %in% l))
#> # A tibble: 4 x 2
#> id ab
#> <fct> <lgl>
#> 1 1 FALSE
#> 2 2 FALSE
#> 3 3 TRUE
#> 4 4 FALSE
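To see why the direction of %in% matters, it may help to run both versions on plain vectors outside of dplyr. This is only a minimal base R sketch; the names required, group1 and group3 are illustrative and not from the original post, and the vectors stand in for the l values of id 1 and id 3.

```r
# Illustrative names (not from the original post)
required <- c("a", "c")      # values that must all be present
group1   <- c("a")           # l values for id 1 in the question
group3   <- c("a", "c")      # l values for id 3 in the question

# Original direction: one result per element of the group
group1 %in% required                  # TRUE
orig_g1 <- all(group1 %in% required)  # TRUE, although "c" is absent

# Reversed direction: one result per required value
required %in% group1                  # TRUE FALSE
rev_g1 <- all(required %in% group1)   # FALSE, because "c" is missing
rev_g3 <- all(required %in% group3)   # TRUE, both values are present
```

The left-hand side of %in% determines the length of the result, so putting the required values on the left means all() checks one TRUE/FALSE per required value rather than one per row of the group.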